Image Denoising Using Nonlocal Regularized Deep Image Prior

Abstract: Deep neural networks have shown great potential in various low-level vision tasks, leading to several state-of-the-art image denoising techniques. Training a deep neural network in a supervised fashion usually requires the collection of a great number of examples and the consumption of a significant amount of time. However, the collection of training samples is very difficult for some application scenarios, such as the fully sampled data of magnetic resonance imaging and the data of satellite remote sensing imaging. In this paper, we overcome the problem of a lack of training data by using an unsupervised deep-learning-based method. Specifically, we propose a deep-learning-based method based on the deep image prior (DIP) method, which only requires a noisy image as training data, without any clean data. It infers the natural image from a random input and the corrupted observation with the help of correction performed by a convolutional network. We improve the original DIP method as follows: Firstly, the original optimization objective function is modified by adding nonlocal regularizers, consisting of a spatial filter and a frequency domain filter, to promote the gradient sparsity of the solution. Secondly, we solve the optimization problem with the alternating direction method of multipliers (ADMM) framework, resulting in two separate optimization problems, including a symmetric U-Net training step and a plug-and-play proximal denoising step. As such, the proposed method exploits the powerful denoising ability of both deep neural networks and nonlocal regularizations. Experiments validate the effectiveness of leveraging a combination of DIP and nonlocal regularizers, and demonstrate the superior performance of the proposed method both quantitatively and visually compared with the original DIP method.


Introduction
Image denoising [1] is an image processing task with a long history, and has a wide range of application scenarios because noise contamination is inevitable in any image sensing and transmission process. It aims to recover a clean and clear image from a corrupted observation polluted by the noise of various distributions. Despite decades of development, image denoising remains a challenging task due to the need to preserve fine details while suppressing as much noise as possible.
Plenty of sophisticated denoising algorithms have been proposed to infer the original image content based on signal estimation theory. These methods can be roughly classified into two main categories, i.e., model-based methods [2,3] and learning-based methods [4,5]. Model-based methods generally establish optimization objective functions consisting of an observation model and an image prior model, which can be constructed and solved from a Bayesian perspective, using the maximum a posteriori (MAP) or the minimum mean square error (MMSE) estimators. Published works have designed a large number of elaborate prior models. At first, researchers believed that images are generally sparse in the gradient domain and transform domain, and proposed the well-known total variation (TV) regularizer [6] and transform domain sparsity [7]; however, it was soon found that these regularizers could not describe the local features of images well. Hence, patch-based sparse representation models [8] then appeared to express more complex local edges and textures in a patch-wise manner. However, they still ignore the relationships among patches. In order to exploit the dependence between image patches, researchers have proposed various types of structural sparsity-based models, including tree-structured wavelet sparsity, block-structured sparsity, and nonlocal sparsity [9,10]. Among them, nonlocal sparse models that exploit spatial self-similarity within the image itself have shown the most benefit for image denoising. Buades et al. proposed the non-local means (NLM) method [11], which performs denoising by averaging similar patches in the image. Dabov et al. then proposed the block-matching and 3D filtering (BM3D) method [12], which takes advantage of both the space and frequency domains.
BM3D first groups similar 2D image blocks into 3D data arrays, then performs a 3D wavelet transform on these arrays, and finally applies collaborative filtering to the 3D wavelet coefficients for denoising. After that, Wiener filtering is applied to denoise again and obtain the final estimate. Besides these two classic methods, nonlocal image denoising methods also include the low-rank approach [13], which exploits the low-rank property of matrices formed by grouping 2D image blocks; the weighted nuclear norm model and its variants [14,15], which extend the low-rank approach by assigning different weights to the coefficients; and the Bayesian modeling method, i.e., the simultaneous sparse coding with Gaussian scale mixture (SSC-GSM) method [16].
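The hard-thresholding stage described above can be sketched as follows. This is only a toy illustration, not BM3D itself: the block grouping is assumed already done, a 3D FFT stands in for the 3D wavelet transform, and the threshold value is arbitrary.

```python
import numpy as np

def hard_threshold_denoise(patch_stack, thresh):
    """Denoise a stack of similar patches by hard-thresholding its
    3D Fourier coefficients (a simplified stand-in for the 3D wavelet
    transform used by BM3D's first stage)."""
    coeffs = np.fft.fftn(patch_stack)        # 3D transform of the group
    coeffs[np.abs(coeffs) < thresh] = 0      # kill small (noise-dominated) coefficients
    return np.real(np.fft.ifftn(coeffs))     # back to the pixel domain

# toy usage: a group of 8 similar 8x8 patches corrupted by noise
rng = np.random.default_rng(0)
clean = np.ones((8, 8, 8))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = hard_threshold_denoise(noisy, thresh=10.0)
```

Since the patches are nearly identical, their joint transform concentrates the signal in a few large coefficients, so hard thresholding suppresses the noise while preserving the common content.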
In contrast to model-based methods that enforce a solution to obey some well-designed prior distributions based on statistics, learning-based methods directly learn mapping functions or sparse transform bases to estimate the missing high-frequency details from the observed noisy image or a large number of external samples. They can be divided into two categories, learning either sparse representations or deep networks. The well-known K-SVD dictionary learning algorithm [17] belongs to the first category. It trains an offline dictionary on a large external set of image patches, or an online dictionary using the noisy patches of the image itself, in order to obtain a universally good representation for all test images. The nonlocally centralized sparse representation (NCSR) [18] and external patch prior guided internal clustering (EPPGIC) [19] methods have extended dictionary learning to nonlocal domains via group sparse coding and Gaussian mixture model learning, respectively. Trainable nonlinear reaction diffusion (TNRD) [20], which turns nonlinear diffusion models into a learnable deep neural network, the denoising CNN (DnCNN) [21], which introduces the idea of residual learning, and the memory network (MemNet) [22], which designs memory blocks, can be classified into the second category. In addition, to obtain the true noise level, a noise estimation subnetwork is used in the convolutional blind denoising network (CBDNet) [23]. Recently, the popular attention mechanism has been introduced in an attention-guided network (ADNet) [24] for more accurate denoising. In order to extend deep neural networks to more general image restoration problems and obtain better flexibility, combining optimization-based methods and deep-learning-based image denoising methods has also been proposed [25,26].
Image restoration with deep CNN denoiser prior (IRCNN) [26] integrates a set of pre-trained deep neural networks into the half-quadratic splitting framework for various types of image restoration tasks, including image denoising. The denoising-prior-driven deep neural network [27] also plugs a CNN denoiser into the half-quadratic splitting method as an image prior; it is similar to IRCNN but more adaptive, as both the CNN denoisers and the back-projection modules can be jointly optimized. In addition, an autoencoder denoiser is plugged into the objective function of image restoration in [28], and the resulting autoencoder error is then back-propagated using gradient descent.
Despite great achievements, the above deep-learning-based methods rely heavily on a large number of clean samples and a significant amount of training time. To address this issue, unsupervised deep-learning-based methods [29,30] can achieve precise recovery without clean data. While Stein's unbiased risk estimate (SURE) is used to train a deep neural network in an unsupervised fashion in [29], generative adversarial networks (GANs) trained with noisy images are adopted to yield noise-free images in [30]. Nevertheless, they still require training samples, even if only noisy ones. Noise2Noise [31] and deep image prior (DIP) [32] are the two most representative algorithms that demand only a small number of training samples. In contrast to the Noise2Noise method, which requires two independent observations of the corrupted scene, the DIP method only requires the current noisy image and is thus more practical. In order to improve the performance of DIP, researchers have proposed modifying its objective function by either using SURE [33] or adding a TV sparsity term [34] for more stable reconstruction. In addition, integrating DIP into optimization-based methods, such as the alternating direction method of multipliers (ADMM) framework, has been proven to be an effective means of improvement in [35,36]. While the TV sparsity term is utilized by ADMM-DIPTV [35], requiring an l2-norm proximity operator to complete one step of the ADMM, the regularization by denoising (RED) technique, which turns an existing denoiser into a regularizer, has been merged with DIP in DeepRED [36]. In general, the performance of unsupervised deep-learning-based methods is worse than that of supervised methods.
In this paper, we focus on enhancing the performance of an unsupervised deep-learning-based method, i.e., the DIP method. Our contributions can be summarized as follows: (1) we utilize the plug-and-play prior technique to exploit the power of existing denoisers, i.e., NLM and BM3D, for regularizing the solution of DIP; (2) we propose a novel objective function that is a linear combination of a data fitting term, which enforces the output of DIP to be close to the observations, and a prior term corresponding to two nonlocal sparsity-based methods, i.e., NLM and BM3D; (3) we adopt the ADMM method to separate the original complex problem into two simple subproblems via the variable splitting technique for solving the proposed objective function. We can then independently solve the network training problem based on the DIP process and the image restoration problem via a proximal denoising operation. Thanks to the iterative optimization and nonlocal regularization, our method has better adaptability and flexibility, and thus achieves better performance than the original DIP method.

Image Denoising Problem and Iterative Method
Image restoration, which includes denoising, attempts to restore images that have been degraded in sensing and transmission processes. It is a common preprocessing task in computer vision, which directly affects the accuracy of subsequent analysis. Mathematically, image restoration can be considered a linear inverse problem:

y = Hx + w, (1)

where x ∈ R^N is the image to be restored, H ∈ R^{M×N} is a degradation matrix, y ∈ R^M is the measurement of x, and w ∈ R^M corresponds to the measurement noise, which is usually assumed to be additive white Gaussian noise (AWGN) of variance σ^2. For the problem of image denoising, H is an identity matrix. Given the degraded measurement, y, one can recover x by least-squares estimation:

x̂ = argmin_x ||y − Hx||_2^2 / 2. (2)

Considering that practical inverse problems are often ill-posed or seriously deteriorated, the regularized least-squares estimation is more favorable:

x̂ = argmin_x ||y − Hx||_2^2 / 2 + λϕ(x), (3)

where ϕ(x) is the sparse regularization term based on some prior knowledge. The regularization parameter λ plays the role of balancing the trade-off between data fidelity and the prior constraint. By differentiating with respect to x and setting the derivative equal to 0, we have the closed-form solution of Equation (2), i.e., x = (H^T H)^{−1} H^T y, where H^T is the transpose of H; alternatively, we can solve Equation (2) with gradient-descent-based methods. As for Equation (3), it takes an extra step, i.e., a proximal denoising step, when using gradient-descent-based methods, e.g., the iterative shrinkage-thresholding (IST) method, the fast iterative shrinkage-thresholding algorithm (FISTA), and Nesterov's algorithm (NESTA) [37]. The alternating expressions in the IST algorithm, also known as the proximal gradient method, are:

r^k = x^k − β∇f(x^k), (4)

x^{k+1} = prox_{β(λϕ)}(r^k), (5)

where x^k is the estimate of x at iteration k, and r^k is the result of the gradient step on f(x) = ||y − Hx||_2^2 / 2. f is a smooth convex function with a Lipschitz constant, L_f. Equation (4) is the gradient search step, wherein ∇f(x^k) denotes the gradient of the function f at the point x^k, and the step size is β = 1/L_f.
Equation (5) is the proximal mapping associated with the function ϕ, which is defined as

prox_{β(λϕ)}(r^k) := argmin_x ||x − r^k||_2^2 / 2 + βλϕ(x).

We consider the simplest case, i.e., ϕ(x) = ||Ψx||_1, where Ψx denotes the wavelet transform of x. Then, the proximal mapping is equal to performing wavelet denoising on r^k with a threshold of βλ.
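The IST iteration of Equations (4) and (5) can be sketched as follows; as a simplification, the transform is taken to be the identity (so the proximal step is plain soft thresholding), and the matrix H, the problem sizes, and the value of λ are all illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 for the identity transform."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(y, H, lam, n_iter=200):
    """Proximal-gradient (IST) solver for min_x ||y - Hx||^2/2 + lam*||x||_1.
    The step size is beta = 1/L_f, with L_f the Lipschitz constant of the
    gradient, i.e., the largest eigenvalue of H^T H."""
    L_f = np.linalg.norm(H, 2) ** 2
    beta = 1.0 / L_f
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        r = x - beta * (H.T @ (H @ x - y))   # gradient step, Eq. (4)
        x = soft_threshold(r, beta * lam)    # proximal step, Eq. (5)
    return x

# toy usage: recover a sparse signal from a noisy random projection
rng = np.random.default_rng(1)
H = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[3, 50, 97]] = [2.0, -1.5, 1.0]
y = H @ x_true + 0.01 * rng.standard_normal(40)
x_hat = ista(y, H, lam=0.05)
```

With a wavelet transform in place of the identity, the proximal step becomes wavelet-domain thresholding, exactly as described above.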

Deep Image Prior
The supervised deep-learning-based methods attempt to solve the image restoration problem by making use of deep neural networks that learn the mapping from degraded images to clean counterparts with a set of example pairs. The learning of deep networks is accomplished by back-propagating the error between degraded images and target images, which can be defined by a mean square error (MSE)-based loss function:

L(Θ) = (1/G) Σ_{i=1}^{G} ||F_Θ(y_i) − x_i||_2^2, (6)

where y_i and x_i denote the i-th pair of degraded and original image patches in a large image set with the total number G, respectively, and F_Θ(y_i) denotes the image patch reconstructed by the network with parameter set Θ. We can solve Equation (6) by means of standard stochastic optimization algorithms, e.g., stochastic gradient descent (SGD) or adaptive moment estimation (ADAM) [38]. Once the network training is finished, the solution for a given degraded image y is x* = F_Θ(y). The success of supervised deep-learning-based methods relies heavily on a large training set, whose size can reach tens of thousands. In some practical sensing applications, such as medical imaging and remote sensing, it is hard to collect a large training set; therefore, we focus on unsupervised deep-learning-based methods. The deep image prior (DIP) method introduced by Ulyanov et al. in [32] has been shown to be the most favorable, due to the fact that it only requires the noisy observation itself without any training samples. It solves the following minimization problem:

min_Θ ||F_Θ(z) − y||_2^2, (7)

and presents F_Θ(z) as the recovered image. It is pleasantly surprising to find that, with the help of a convolutional neural network (CNN) architecture, one can approximate the clean image using only a fixed random vector, z, and the degraded image, y. Thus, the deep network can be regarded as an implicit image prior, exploiting image self-similarity to suppress noise.
Since the DIP method has no training samples, it corresponds to an online learning method that solves Equation (7) with the ADAM method.
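A minimal sketch of the DIP training loop of Equation (7) is given below. The tiny three-layer network, input sizes, learning rate, and iteration count are placeholders standing in for the symmetric U-Net and settings described later in the paper; the point is only to show that the sole "training data" is the single noisy observation y and a fixed random input z.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Deliberately small stand-in for the DIP U-Net.
net = nn.Sequential(
    nn.Conv2d(8, 16, 3, padding=1), nn.LeakyReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.LeakyReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

z = torch.randn(1, 8, 32, 32)   # fixed random input
y = torch.rand(1, 1, 32, 32)    # the (noisy) observation -- the only "data"
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

losses = []
for _ in range(100):                    # early stopping acts as regularization
    opt.zero_grad()
    loss = ((net(z) - y) ** 2).mean()   # the objective of Eq. (7)
    loss.backward()
    opt.step()
    losses.append(loss.item())

x_hat = net(z).detach()                 # the recovered image F_Theta(z)
```

In practice, the CNN fits the natural-image content before it fits the noise, which is why stopping early yields a denoised output.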

The Proposed Model
Although DIP has been demonstrated to be quite effective for image denoising, its results still fall short when compared to supervised deep-learning-based methods that have been shown to be state of the art. In order to improve the performance of DIP, we propose to boost it using an extra prior to perform regularization. Since the performance of nonlocal sparsity-based methods has been shown to be close to that of DIP, the explicit prior chosen by us is a nonlocal constraint that attempts to jointly explore the power of dual filtering in the spatial and frequency domains by combining nonlocal means and the BM3D denoising scheme. Mathematically, we try to solve the following constrained minimization problem:

min_Θ ||F_Θ(z) − y||_2^2 + λR(x), s.t. x = F_Θ(z), (8)

where:

R(x) = ω_1 Ω(x) + ω_2 Φ(x). (9)

In Equation (9), Ω(x) and Φ(x) denote the prior constraints corresponding to the nonlocal means [11] and BM3D denoising [12] schemes, respectively. ω_1 and ω_2 are two regularization parameters. Through the use of the plug-and-play prior, we do not need explicit expressions for Ω(x) and Φ(x), and can treat them as black boxes that take the noisy observation as input and return the denoised image as output. From Equations (8) and (9), we can see that a combination of complementary regularizers is exploited in our restoration model: a deep-neural-network-based prior, i.e., DIP, and a nonlocal dual-filtering-based prior.

Plug-and-Play ADMM Method
The proposed model, i.e., Equations (8) and (9), cannot be solved by directly applying the DIP method, since this would involve computing the derivatives of the denoisers during back-propagation. For most denoisers, the denoising operators do not have an explicit input-output relation, so it is not easy to compute their derivatives. To remedy this problem, we resort to the alternating direction method of multipliers (ADMM) [39], a well-known variable splitting technique, which is able to separate the x in the nonlocal prior term R(x) from the constraint of our minimization problem, i.e., x = F_Θ(z). With the help of the augmented Lagrangian (AL), we can turn the constraint x = F_Θ(z) into a penalty term and merge it into the objective function as follows:

L(Θ, x, u) = ||F_Θ(z) − y||_2^2 + λR(x) + (μ/2)||x − F_Θ(z)||_2^2 − u^T (x − F_Θ(z)), (10)

where μ is the penalty parameter, a positive scalar, and the vector u contains the Lagrange multipliers associated with the constraint x = F_Θ(z). Merging the last two terms, we have the scaled form of the AL:

L(Θ, x, u) = ||F_Θ(z) − y||_2^2 + λR(x) + (μ/2)||x − F_Θ(z) − u||_2^2, (11)

where u now denotes the scaled multiplier. Using alternating optimization, we can simultaneously learn the parameters of the deep neural network, Θ, and recover the image. Firstly, we estimate the parameter set, Θ, for fixed x and u:

min_Θ ||F_Θ(z) − y||_2^2 + (μ/2)||F_Θ(z) − (x − u)||_2^2. (12)

This loss function is very close in spirit to the one solved in the original DIP method. Compared to the original DIP loss, a second l2-norm term is added; however, it can still be solved by back-propagation via a gradient-based method, e.g., ADAM. In particular, while the output of the network, F_Θ(z), is forced to be close to the noisy observation, y, in the original DIP method, our modified method makes it approximate both y and the intermediate result, x − u. Hence, the loss error to be back-propagated is a linear combination of the l2 distance between F_Θ(z) and y and that between F_Θ(z) and x − u.
Secondly, given the parameter set, Θ, and u, we need to recover the image, x. At this point, Equation (11) becomes:

min_x λR(x) + (μ/2)||x − (F_Θ(z) + u)||_2^2. (13)

This is exactly a proximal denoising operation. By substituting R(x) with Equation (9), we have:

min_x ω_1 Ω(x) + ω_2 Φ(x) + (μ/2)||x − (F_Θ(z) + u)||_2^2. (14)

Note that the regularization parameter, λ, has been merged into ω_1 and ω_2. The resulting composite sparse problem can easily be solved by the composite splitting algorithm (CSA) [40], based on the techniques of variable splitting and operator splitting. The CSA decomposes the difficult composite regularization problem, i.e., Equation (14), into two simpler subproblems, and then solves each of them separately. According to the process of the CSA, we first obtain the following subproblems:

x_1 = argmin_x ω_1 Ω(x) + (μ/2)||x − (F_Θ(z) + u)||_2^2, (15)

x_2 = argmin_x ω_2 Φ(x) + (μ/2)||x − (F_Θ(z) + u)||_2^2. (16)

Secondly, the solution of Equation (14) is obtained as a linear combination of x_1 and x_2 with the weights a_1 and a_2, as follows:

x = a_1 x_1 + a_2 x_2. (17)

Finally, for fixed Θ and x, the multiplier vector u is updated by:

u^{k+1} = u^k + F_{Θ^{k+1}}(z) − x^{k+1}. (18)

The solutions of Equations (15) and (16), x_1 and x_2, correspond to the denoised results of NLM and BM3D denoising, respectively. For NLM, given a collection of similar patches, a nonlocal mean filter is adopted to estimate the weighted mean of these patches, as follows:

x̂ = Σ_i w_i x_i, (19)

where x_i denotes the i-th image patch, and the weight w_i is computed by:

w_i = (1/C) exp(−||x_i − x_e||_2^2 / δ^2). (20)

In Equation (20), x_e denotes the exemplar patch, C is a normalizing constant, and δ is the parameter representing the variance; δ = 9 in our experiments. For BM3D denoising, we only consider the initial hard-thresholding step of the BM3D method, which is given as follows:

min_x ω_2 Σ_i ||Γ X_i||_0 + (μ/2)||x − (F_Θ(z) + u)||_2^2, (21)

as we have observed that the secondary Wiener filtering step is of little use in our scheme. In Equation (21), X_i and Γ denote the 3D data arrays and the 3D wavelet transform, respectively. The l0 norm in Equation (21) means that the 3D wavelet coefficients should be sparse, which leads to a hard-thresholding step. In practice, the regularization parameters ω_1 and ω_2, which have been proven to be closely connected with the denoising thresholds, do not need to be set.
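The NLM weighted-average step with the weight formula of Equation (20) can be sketched as follows; the patch collection here is synthetic, and the normalizing constant is made explicit by dividing by the sum of the weights.

```python
import numpy as np

def nlm_patch_estimate(patches, exemplar, delta=9.0):
    """Nonlocal-means estimate of an exemplar patch: a weighted average of
    similar patches, with weights exp(-||x_i - x_e||^2 / delta^2)
    normalized to sum to one (delta = 9, as in the paper's experiments)."""
    d2 = np.array([np.sum((p - exemplar) ** 2) for p in patches])
    w = np.exp(-d2 / delta ** 2)
    w /= w.sum()                              # the normalizing constant C
    return np.tensordot(w, patches, axes=1)   # weighted average of patches

# toy usage: 20 noisy copies of the same 7x7 patch
rng = np.random.default_rng(2)
exemplar = np.full((7, 7), 0.5)
patches = exemplar + 0.1 * rng.standard_normal((20, 7, 7))
estimate = nlm_patch_estimate(patches, exemplar)
```

Because the patches are genuinely similar, the weights are nearly uniform and the averaging suppresses the noise roughly by the square root of the number of patches.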
Instead of setting ω_1 and ω_2, we determine their thresholds directly. While the NLM method does not have a threshold, the threshold of BM3D denoising is determined by the noise variance, σ^2, which can be obtained by maximum likelihood estimation from the residual:

σ̂^2 = ||y − F_Θ(z)||_2^2 / N. (22)

In general, the proposed algorithm is summarized in Algorithm 1, named "Nonlocal Regularized Deep Image Prior" (NR-DIP). Exploring the power of complementary priors and ADMM iterations is implemented in our method for image denoising. The flow chart and the symmetric CNN architecture of our method are shown in Figure 1, from which it can be seen that our network architecture is consistent with the one in [32]: it is based on the classical U-Net structure, adopting an encoder-decoder model. While "Convolution + Down Sample + Batch Normalization + Leaky ReLU + Convolution + Batch Normalization + Leaky ReLU" blocks carry out feature extraction in the encoder units, "Batch Normalization + Convolution + Batch Normalization + Leaky ReLU + Convolution + Batch Normalization + Leaky ReLU + Up Sample" blocks perform image restoration in the decoder units. Skip connections are added to capture image structures at different characteristic scales. The parameters of the network are listed in Table 1. From Figure 1 and the pseudo code of the proposed algorithm, we can see that our method is based on the ADMM method. In each ADMM iteration, we first perform the network learning via the ADAM method to obtain the network parameters, Θ; we then estimate the noise standard deviation, σ. Finally, we recover the image, x, by using the network output, F_Θ(z), and the multiplier, u, of the previous iteration, and then update the multiplier.
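The outer loop of Algorithm 1 can be outlined as below. This is a structural sketch only: the network-training step is replaced by a placeholder (the current estimate itself), and a box filter and FFT hard thresholding stand in for the NLM and BM3D black boxes; weights a_1 = 0.4, a_2 = 0.6 follow the paper, while the threshold rule is an illustrative heuristic.

```python
import numpy as np

def mean_filter(img):
    """3x3 box filter: a cheap stand-in for the NLM black box."""
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def fft_hard_threshold(img, thresh):
    """Transform-domain hard thresholding: a cheap stand-in for BM3D's first stage."""
    c = np.fft.fft2(img)
    c[np.abs(c) < thresh] = 0
    return np.real(np.fft.ifft2(c))

def nr_dip_admm_sketch(y, n_outer=5, a1=0.4, a2=0.6):
    """Skeleton of the NR-DIP ADMM loop: network step (placeholder),
    noise-variance estimate, two-branch proximal denoising, multiplier update."""
    x = y.copy()
    u = np.zeros_like(y)
    for _ in range(n_outer):
        f_z = x                              # placeholder: here DIP would train F_Theta(z)
        sigma2 = np.mean((y - f_z) ** 2)     # residual-based noise variance estimate
        r = f_z + u                          # input to the proximal denoising step
        x1 = mean_filter(r)                                  # "NLM" branch
        x2 = fft_hard_threshold(r, 4.0 * np.sqrt(sigma2 * r.size))  # "BM3D" branch
        x = a1 * x1 + a2 * x2                # weighted combination of the branches
        u = u + f_z - x                      # scaled multiplier update
    return x

rng = np.random.default_rng(4)
y = 0.5 + 0.1 * rng.standard_normal((32, 32))
x_out = nr_dip_admm_sketch(y)
```

The real algorithm differs in the two essential components (an actual DIP training step and actual NLM/BM3D calls), but the data flow between the steps is as shown.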
[Algorithm 1, excerpt: perform denoising on F_{Θ^{k+1}}(z) + u^k with the threshold set according to σ^2, and combine the results as x = a_1 x_1 + a_2 x_2.]

Experiments
In order to verify the performance of the proposed NR-DIP for image denoising, we compare our method with five image restoration algorithms, including two nonlocal sparsity-based methods, i.e., NLM [11] and CBM3D [12] ("C" denotes color image), two supervised deep-learning-based methods, i.e., FFDNet [41] and IRCNN [26], which have been shown to be superior to the well-known benchmark method, i.e., DnCNN [21], and an unsupervised deep-learning-based method, i.e., DIP [32]. Among these comparison methods, NLM, CBM3D, and DIP are the three methods that form the foundation of our method. Through the comparisons between NLM, CBM3D, DIP, and NR-DIP, one can validate the benefit of complementary priors. CBM3D can be considered the most efficient nonlocal sparsity-based method. While FFDNet represents the class of deep-learning-based methods that train a direct mapping from degraded images to clean images, IRCNN represents those that merge deep neural networks into optimization-based methods. In fact, we also considered including the ADMM-DIPTV [35] method in our comparisons, but we found that it suffers from severe performance degradation in the later stage of iteration. Both ADMM-DIPTV and our method are based on ADMM and DIP; however, ADMM-DIPTV only combines the TV constraint into the framework of DIP, due to its convexity. Introducing the plug-and-play prior scheme makes the adoption of existing powerful denoising algorithms available and flexible, greatly extending the recovery ability of DIP. Figure 2 shows the eight test natural images of various sizes used in our experiments. We generated noisy measurements by adding varying amounts of additive white Gaussian noise to the test images. The standard deviation of the Gaussian noise ranges over 25, 30, 35, 40, 50, 60, and 75. Then, we applied the comparison methods to perform denoising.
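The measurement generation used in these experiments amounts to the following; the image here is synthetic, standing in for the test images of Figure 2.

```python
import numpy as np

def add_awgn(img, sigma, seed=0):
    """Corrupt an image (pixel range 0-255) with additive white Gaussian
    noise of standard deviation sigma, as in the experimental setup."""
    rng = np.random.default_rng(seed)
    return img + sigma * rng.standard_normal(img.shape)

clean = np.full((64, 64), 128.0)
for sigma in (25, 30, 35, 40, 50, 60, 75):   # the noise levels used in the paper
    noisy = add_awgn(clean, sigma)
```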
The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [42,43], shown in Equations (23) and (24):

PSNR = 10 log_10 (255^2 / MSE), MSE = (1/(mn)) Σ_{i=1}^{m} Σ_{j=1}^{n} (x_{ij} − y_{ij})^2, (23)

SSIM = ((2μ_x μ_y + c_1)(2σ_xy + c_2)) / ((μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)), (24)

are used to quantitatively evaluate the quality of the reconstruction results. In Equation (23), m and n denote the image size. In Equation (24), μ_x, μ_y, σ_x^2, σ_y^2, and σ_xy are the average intensities, variances, and cross-covariance of a clean image, y, and an evaluated image, x, respectively, and c_1 and c_2 are small stabilizing constants. For fair comparisons, we downloaded the codes from the authors' websites and adopted the default experimental settings. The main parameters of the proposed NR-DIP algorithm include: (1) the weights of the linear combination, i.e., a_1 = 0.4 and a_2 = 0.6; (2) the penalty parameter, μ, in Equation (14); and (3) the learning rate of training, which is set to 0.008. Due to the long running time of BM3D, we apply it once every 100 ADMM iterations to save time. The DIP-based methods were implemented in Python with the PyTorch framework and run on an NVIDIA GTX 3090 GPU. NLM and CBM3D were implemented in Python but run without a GPU. FFDNet and IRCNN were implemented in MATLAB and also run without a GPU because they are already fast. The experimental results, including objective quality, subjective quality, and runtime, are presented below. We first present the experimental results in terms of objective quality.
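The two metrics can be computed as below. Note that the SSIM here is the single-window (global) form of the formula, whereas standard implementations average it over local windows; the constants c_1 and c_2 follow the commonly used defaults (0.01 and 0.03 times the peak, squared), which is an assumption.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio for images with the given peak value."""
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=255.0):
    """Single-window (global) SSIM; library implementations instead
    average this quantity over local sliding windows."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# toy usage: score a noisy image against its clean counterpart
rng = np.random.default_rng(3)
clean = 128.0 + 20.0 * rng.standard_normal((32, 32))
noisy = clean + 25.0 * rng.standard_normal((32, 32))
```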
While the PSNR results of the comparison methods on the set of test images shown in Figure 2 are given in Table 2, the corresponding SSIM results are provided in Table 3. Since FFDNet and IRCNN do not provide learned models for noise standard deviations of 60 and 75, we have not given the corresponding comparison results. It can be seen from these tables that the supervised deep-learning-based methods, i.e., FFDNet and IRCNN, outperform the others at both high and low noise levels. Of the two, FFDNet performs slightly better. Apart from them, the proposed NR-DIP method achieves highly competitive denoising performance compared to the other leading algorithms, consisting of the unsupervised deep-learning-based method, i.e., DIP, and the nonlocal sparsity-based methods, i.e., NLM and CBM3D. The proposed NR-DIP method outperforms the original DIP by up to 0.5 dB on average, which verifies the effectiveness of our modifications. By employing NLM and BM3D to regularize the solution of DIP, the NR-DIP method combines the power of nonlocal denoising and deep-learning-based denoising. This leads to a better solution, enforced by complementary priors and optimized with the ADMM method, when compared to the original DIP. Among the nonlocal sparsity-based methods, the CBM3D method is clearly better than NLM. On average, NR-DIP falls behind FFDNet by less than 1.1 dB.
(1) Compared to the supervised deep-learning-based methods FFDNet and IRCNN, our method is generally worse. This is reasonable: FFDNet and IRCNN require thousands of samples to train the network, acquiring good generalization ability, whereas our method only feeds the noisy observation into the network without requiring any clean images, and is thus prone to over-fitting. Nevertheless, our method is quite useful for applications that lack sample data. (2) Compared to the nonlocal sparsity-based methods NLM and CBM3D, our method usually performs better due to the joint use of NLM, CBM3D, and DIP. (3) Compared to the original DIP method, our method usually performs better.
This is not only because of combining the power of nonlocal denoising and deep-learning-based denoising, but also because our method benefits from the ADMM iterations, which lead to a better solution by alternating optimization. However, using NLM and CBM3D sometimes slightly reduces the performance of the DIP method for images with complex textures, e.g., the Baboon image, and in lower-noise environments. While the output of the network, F_Θ(z), is forced to be close to the noisy observation, y, in the original DIP method, our modified method makes it approximate both y and the intermediate result, x − u. Therefore, when y and x − u contain similar information, our algorithm does not improve by much. With regard to SSIM, as shown in Table 3, the results of the original DIP are worse than those of CBM3D, which is inconsistent with the PSNR results. However, our method does not follow the original DIP in getting worse in terms of SSIM. This demonstrates that the nonlocal regularized scheme helps to maintain structural information while removing noise.
In order to show the comparison results more vividly, Figure 3 provides the average PSNR and SSIM values of the reconstructions by the different methods. The order of performance in terms of PSNR, from best to worst, is FFDNet, IRCNN, NR-DIP, DIP, CBM3D, and NLM, while in terms of SSIM it is FFDNet, IRCNN, NR-DIP, CBM3D, DIP, and NLM. The average PSNR results of NLM, CBM3D, FFDNet, IRCNN, DIP, and NR-DIP are 27.4 dB, 28.9 dB, 30.7 dB, 30.5 dB, 29.1 dB, and 29.6 dB, respectively, for noise standard deviations ranging from 25 to 50, while the average PSNR results of NLM, CBM3D, DIP, and NR-DIP are 23.1 dB, 24.9 dB, 24.0 dB, and 25.3 dB, respectively, for noise standard deviations of 60 and 75. For higher noise levels, the differences between the algorithms increase, but the order remains unchanged. Note that the NR-DIP method is better than the original DIP, NLM, and CBM3D methods, which verifies the effectiveness of the proposed combination scheme. Although the results of NR-DIP still fall short of the supervised state-of-the-art methods, NR-DIP has been shown to be quite effective on the problem of image denoising.
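For reference, the PSNR figures quoted above follow the standard definition, PSNR = 10 log10(peak² / MSE). A minimal implementation is shown below; the function name and the default peak value of 255 (for 8-bit images) are our own choices.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error of 16 gray levels on an 8-bit image (MSE = 256) gives roughly 24.05 dB.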

Subjective Quality Evaluation
Visual comparisons between the restoration results of the competing methods are provided in Figures 4–7, with noise standard deviations of σ = 25, 35, 50, and 75 representing performance under various noise environments. From these figures, one can clearly see that the best denoising results are still achieved by a supervised deep-learning-based method, i.e., FFDNet, with IRCNN following closely and performing comparably. Among the remaining algorithms, which do not rely on large amounts of training data, the proposed NR-DIP algorithm performs better than the others and enjoys great advantages in producing clearer images, e.g., around edges and fine textures. The NLM method over-smooths and lacks image detail, indicating that its denoising ability is relatively poor, while the CBM3D method produces ringing artifacts due to the truncation operation in the frequency domain. The restoration results of DIP suffer from noticeable artifacts, especially at higher noise levels, compared with those of NR-DIP, which delivers excellent image contrast and clear details owing to its capability of achieving better spatial adaptation through complementary priors. At higher noise levels, the denoised images of all competing algorithms are seriously degraded, but the proposed NR-DIP method still shows its advantage in recovering a more detailed image compared with the competing methods, except for FFDNet and IRCNN. These results again verify that the combination of CBM3D, NLM, and DIP is reasonable.

Runtime Comparison
In order to evaluate the running time and convergence of the proposed method, we also traced the runtime and PSNR at each iteration for the original DIP method as well as our method. The results are illustrated in Figure 8, which presents the iteration number vs. PSNR curves of the House and Butterfly images, respectively. In particular, the performance curves in Figure 8 correspond to the experiments in Figures 5 and 6. As shown in Figure 8, the NR-DIP method achieves better PSNR than the original DIP after around 1000 iterations. These curves validate that the proposed NR-DIP method converges to a good denoised result in a reasonable number of iterations.

The average runtimes to recover an image of size 512 × 512 by FFDNet, IRCNN, NLM, CBM3D, DIP, and NR-DIP are about 1.23 s, 2.26 s, 2.68 s, 24.80 s, 148.09 s, and 244.16 s, respectively; these average runtimes are listed in Table 4. The supervised deep learning methods, FFDNet and IRCNN, are fast and take only a few seconds to remove the noise from an image, but they require a great deal of time to train their deep neural networks. The unsupervised deep learning methods use an online learning scheme, resulting in slow speed. Merging nonlocal regularization into the DIP framework in our method inevitably increases the running time, but it remains within a reasonable range.

Conclusions
We have proposed an effective iterative algorithm equipped with the deep image prior and plug-and-play nonlocal priors for image denoising. Our work contributes the following: First, guided by information theory, the use of complementary constraints was introduced to construct the nonlocal regularized deep image prior model for unsupervised deep-learning-based image denoising. It can substantially enhance the performance of the original DIP by jointly utilizing nonlocal information in the spatial and frequency domains. Second, an effective ADMM-based algorithm with excellent denoising ability is proposed in this paper, making the model much easier to solve through the technique of variable splitting. Finally, our experiments on several natural images demonstrate the superiority of the proposed algorithm over two nonlocal sparsity-based denoising algorithms and the original DIP method. While this work has designed a flexible combination of the unsupervised deep-learning-based DIP method and plug-and-play priors within the ADMM framework, and has demonstrated promising performance, in future work we plan to combine DIP with supervised deep-learning-based methods to further boost denoising performance.