Image Denoising via Improved Dictionary Learning with Global Structure and Local Similarity Preservations

: We proposed a new efﬁcient image denoising scheme, which leads to four important contributions. The ﬁrst is to integrate both reconstruction and learning based approaches into a single model so that we are able to beneﬁt advantages from both approaches simultaneously. The second is to handle both multiplicative and additive noise removal problems. The third is that the proposed approach introduces a sparse term to reduce non-Gaussian outliers from multiplicative noise and uses a Laplacian Schatten norm to capture the global structure information. In addition, the image is represented by preserving the intrinsic local similarity via a sparse coding method, which allows our model to incorporate both global and local information from the image. Finally, we propose a new method that combines Method of Optimal Directions (MOD) with Approximate K-SVD (AK-SVD) for dictionary learning. Extensive experimental results show that the proposed scheme is competitive against some of the state-of-the-art denoising algorithms.


Introduction
While images are widely used in various fields, they are usually contaminated by noise during acquisition, transmission and compression.Consequently, real-life images are often degraded with noise and there is often a need for image denoising techniques.Image denoising is known to be ill-posed problem in image processing and computer vision.Theoretically, it is hard to guarantee the recovery of a distored image since image denoising is a highly under-constrained problem.For instance, medical images are usually affected by a combination of impulsive, additive or multiplicative noise [1] and it is hard to identify the type and model the noise in real world problems [2].Images with high resolutions are desirable in many applications, e.g., object recognition [3], face clustering [4,5], and image segmentation in medical and biological science [6].Hence, denoising is a critical step for improving the visual quality of images [7].Denoising methods developed so far have focused one of the two forms of noise, additive and multiplicative.Though a plethora of noise removal techniques have appeared in recent years, image denoising for real-life noise still remains an important challenge [8].
A number of denoising techniques have been developed to address this problem.For example, pixel level filtering methods and patch based filtering methods such as Gaussian filtering, total variation (TV) [9], non-local means (NLM) [7], block-matching 3D filtering (BM3D) [10], and low-rank regularization [11] have provided improved image quality with image details well recovered.Among them, the classic TV method makes use of Laplacian or hyper-Laplacian models for image filtering, where they assume that natural image gradients usually exhibit heavy-tailed distributions [12][13][14].For instance, the Hessian-Schatten approach has been proposed in [15], which maintains the advantages of TV while eliminating the staircase effect by not penalizing first-order derivatives.The patch-based filtering methods group similar image patches together and then recover their common structures.For instance, BM3D usually requires expensive pair-wise patch comparisons.Its basic idea is to get a sparse representation in the transformed domain.It first groups similar 2D patches of the image into 3D data arrays.A highly sparse representation is obtained through 3D transformation and shrinkage.Through this procedure, the finest details shared by grouped patches are captured while the essential, unique features of each individual patch are preserved.This algorithm obtains outstanding denoising performance; however, it requires many implementation tricks [16].Though observed effective for slightly noisy image, the performance of the above-mentioned methods is far from satisfying by the over-smooth effect, due to the reason of significantly degraded accuracy in patch matching.Besides such single-image based methods, learning based methods have been developed by integrating natural image priors, such as neural network training [16], maximizing expected patch log-likehood (EPLL) [17], and fields of experts [18].
Sparse and redundant representation modeling [19] has recently received extensive research attention and found quite successful applications in signal and image processing.The most common framework for image denoising is formulated with an energy to minimize the following [20]: where ψ : R N → R is a regularization function, and the quadratic "data-fitting" term ensures that the estimated x is close to the noisy observation y.In general, it is difficult to find a good regularization function ψ, and, in fact, it is probably one of the most important research topics in image processing nowadays [21].Sparse signal representation has been shown successful [20].It describes that a signal can be approximated as a linear combination of as few as possible atoms from a given dictionary.More precisely, a target signal y ∈ R N can be described as y ≈ Φω, where Φ ∈ R N×M is an overcomplete dictionary if M > N and ω is a vector containing the representation coefficient of y.We are interested in seeking the sparsest solution ω, i.e., the one with the fewest nonzero entries.The solution can be obtained by solving the following problem: Here, a typical choice for the regularization term ψ(ω) might be the L 0 -norm of ω that counts the number of nonzero elements of ω.Exact determination of sparsest representations is known to be an non-deterministic polynomial (NP)-hard problem.Thus, a number of algorithms have been proposed to provide the sparsest approximation of a signal, including Orthogonal Matching Pursuit (OMP) and Basis Pursuit (BP) [21].BP relaxes the L 0 -penalty by replacing it with L 1 -penalty [21].Dictionary design employed for sparse decomposition of a signal is also an important problem.Basically, the dictionaries can be classified into two categories: non-adaptive dictionaries and adaptive dictionaries [21].
Although current methods have been shown successful, they are often designed for a specific type of noise removel problem.Unfortunately, it is usually hard to have a perfect knowledge of the noise in real world problems.To address this problem, in this paper, we propose a new image denoising method that is capable of removing both additive and multiplicative noise.Moreover, our new method integrates both learning-based and recosntruction-based parts, allowing them to mutually enhance along with the optimization procedure, leading to a powerful denoising capability.For optimization, we use a two-stage optimization strategy, which divides our objective function into convex and non-convex parts.Specifically, for the non-convex part, we embark from the framework of K-SVD denoising and improved it based on [22]; for the convex part, we use an alternating optimization and gradient descent method similar to the one used in [23].
We summarize the contributions of this paper as follows: • We developed an image denoising approach that processes advantages of both reconstruction-based and learning-based methods.A practical two-stage optimization solution is proposed for the implementation.

•
We introduced a sparse term to reduce the multiplicative noise approximately to additive noise.Consequently, our method is capable of removing both additive and multiplicative noise from a noisy image.

•
We used the Laplacian Schatten norm to capture the edge information and preserve small details that may be potentially ignored by learning based methods.Hence, both global and local information can be preserved in our model for image denoising.

•
We established a new method that combines Method of Optimal Directions (MOD) with Approximate K-SVD (AK-SVD) for dictionary learning.
The rest of the paper is organized as follows.In Section 2, we introduce our proposed method and develop a two-stage optimization procedure.We conduct extensive experiments and show the results in Section 3 to verify the effectiveness of the proposed method.Finally, we conclude our work in Section 4.

Proposed Method
In this section, we discuss the proposed method.First, we present the formulation of the proposed model.Then, we develop a practical yet effective optimization strategy for the proposed method.

Formulation
A visually meaningful image usually contains global structures such as edges, contours, textures and smooth regions.These structures constitute the visual contents and can be captured with an aid of a high-pass filter.At the same time, local image patches usually have high self similarities.Any image patch could be sparsely represented as a linear combination of the others.The local similarity is revealed via a learned dictionary.In the global structure part, it contains a fidelity term, a low rank term, and a sparse term while, in the local similarity part, it contains a patch based term and constraint.We use one formulation, which consists of two parts: one part is designed for reconstruction of global structure while the other one for preservation of local similarity.

1.
Global Structure Reconstruction: High pass filter emphasizes fine details of an image by effectively enhancing contents that are of high intensity gradient in the image.After high pass filtering, clean image contains the high frequency contents that represent global structures while low frequency contents are eliminated, making the filtered image of low rank.However, since noise usually has high-frequency components too, it may still remain together with the structural information after high-pass filtering.For each pixel, noise usually does not depend on neighboring pixels while the pixels on the global structure such as edges and textures have correlations with their neighbouring pixels.To differentiate noise and structural pixels, we consider minimizing the rank of high-pass filtered image.As Schatten norm can effectively approximate the rank [24], we use the Schatten norm of high-pass filtered image to capture the underlying structures.
Let X ∈ R n 1 ×n 2 be a matrix with singular value decomoposition (SVD) X = UΣV T where U ∈ R n 1 ×n 1 and V ∈ R n 2 ×n 2 are unitary matrices consisting of singular vectors of X, and Σ ∈ R n 1 ×n 2 is a rectangular diagonal matrix consisting of singular values of X.Then, the Schatten p-norm (S p norm) of X is defined as where p ≥ 1 is the order of Schatten norm and σ k is the kth singular value of X.The family of Schatten norms include three common matrix norms, including the nuclear norm (p = 1), the Frobenius norm (p = 2) and the spectral norm (p = ∞).
In this paper, to high-pass filter the image, we adopt an 8-neighborhoods Laplacian operator defined as This Laplacian filter captures 8-directional connectedness of each pixel and thus the structures of the image as well.By filtering the image with such Laplacian filter, we can obtain a low-rank filtered image LX containing the global structures of the image.Hence, it is desireable to minimize the rank of LX to ensure the low-rankness of the global structures.To achieve this goal, we propose to adopt the above defined Hessian Schatten-p norm as rank approximation, and by minimizing LS p = LX S p , the global structures of the image can be well preserved.
Because multiplicative noise is image content dependent, it may remain mixed with the clean image after minimizing Laplacian Schatten norm of the noisy image.To alleviate the effect of multiplicative noise, we introduce a sparse matrix S that may as well capture the outliers in the case of additive noise.In a now-standard way, we minimize the 1-norm of S to obtain the sparsity.In summary, our model is as follows: where Y is the noisy image, X is the clean image, S denotes a matrix containing globally sparse noise, and E is the remaining noise matrix.For convenience of optimization, we use Frobenius norm as a loss function to measure the strength of E. Combining them together, we formulate an objective function to preserve the global structure as follows: where • 1 represents the matrix l 1 norm, and λ 1 , λ 2 are balancing parameters.

2.
Local Similarity Preservation: We define the local similarity of an image using its patches with a size of √ n × √ n pixels.We define an operator R i that extracts the ith patch from X and orders it as a column vector, i.e., x i = R i X ∈ R n×1 .To preserve the local similarity of image patches we exploit the dictionary learning.Define a dictionary Φ ∈ R n×k , where k > n is the number of dictionary basis.Each column of Φ is a basis, i.e., Φ = [ϕ 1 , . . ., ϕ n ] and the dictionary is redundant.The local similarity suggests that every patch x in the clean image may be sparsely represented over this dictionary.The sparse representation vector is obtained by solving the following constrained minimization problem: or alternatively by MAP estimator where ω * is the sparse representation vector of patch ϕ, • 0 represents the l 0 norm and and T are parameters that control the error of the sparse coding and the sparsity of representation.
For reconstruction of global structures and preservation of local similarities, the unified image denoising needs to solve the following optimization problem: It is seen that the above model has incorporated both global and local information in the first three terms and the last term, respectively, to recover the original image.We will develop an effective optimization scheme in the remainder of this section.

Practical Solution for Optimization
In the following, for the ease of notation, we define Ω to be the matrix of which each column is ω It is noted that the first three terms of Equation ( 7) are pixel based while the last and the constraint are patch based.Due to this fact, it is difficult to directly optimize the overall objective function.Inspired by [24], we use a two-stage approach to find a local optimal solution in which we do optimization over pixel and patch based terms separately.
We decompose the objective function into two parts, each of which contains only pixel-wise operation or patch-wise operation: and Basically, the updating rules of the two-stage strategy is given by alternatively applying Equations ( 10) and ( 11) It is important to notice that the terms µ X − X t F are critical because they represent the connection between the two stages.For simpler notation, we define two modified functions:

Global Structure Reconstruction Stage
For the first-stage optimization, we use an alternating strategy to optimize the function with respect to X and S by fixing one and updating another.For the initialization of X 0 2 , there are two choices: (1) for the very first iteration, we only do optimization over function G instead of Ḡ and later do optimization over the modified one; and (2) we initialize X 0 2 = Y.The second choice is potentially more computationally expensive because it forces X 0 2 to be the noisy image Y.In our experiments, we use the first approach.Since at each alternating step the objective function is convex, we make use of gradient descent method for the optimization.By the fact that Z p S p = Tr Z T Z p/2 , the Laplacian Schatten norm term can be reformulated as When p = 1, the Schatten norm is non-smooth.In addition, the sparse term is non-smooth.For the two non-smooth terms, we use two different approaches to obtain the gradient.On one hand, we introduce a small smoothing parameter δ in the above equation to get a smoothed approximation: where I ∈ R N 2 ×N 2 is the identity matrix and δ.Thus, G could be reformulated by replacing the Schatten norm with a smoothed trace norm in Equation ( 8): Using the alternating optimization strategy, we get the updating rule of X and S in the first stage as follows: where t denotes the iteration of the outer optimization, s represents the iteration of the inner alternating optimization in the first stage, and X (s) t and S (s) t denote the values of X and S at the tth outer and sth inner iteration, respectively.
Notice that Ḡ(X, S) is not differentiable with respect to S at zeros.On the other hand, we adopt a sub-gradient when taking a derivative with respect to S, i.e., where ∂ S S 1 is the sub-differential matrix defined as: The Laplacian Schatten norm term has been smoothed by using δ, so it is straightforward to take the derivative of G with respect to X: Now, using gradient descent method, S and X are updated alternatively until convergence, i.e., S by the follows: where r denotes the iteration of gradient descent optimization.
The proof of the proposition is in the Appendix A.

Dictionary Learning Stage
In the second stage, we optimize the following function: There are three variables in this objective function: the underlying clean image X , the sparse coefficient matrix Ω and the underlying dictionary Φ.The underlying dictionary Φ is initialized with a 2D separable Discrete Cosine Transform(DCT) dictionary of size L × K. First, we produce a 1D-DCT matrix Φ 1D of size √ L × √ K.Each atom of the matrix Φ 1D can be obtained by Then, we use a Kronecker-product Φ = Φ 1D ⊗ Φ 1D to initialize the dictionary Φ.
We adopt also the alternating optimization strategy for this stage.First, by fixing Φ and X, we update Ω.Then, by fixing Ω and X, we update Φ.Finally, by fixing Φ and Ω, we update X.We repeat these three steps a given number of times.In the first step, we aim to find optimal ω i by solving: This problem is equivalent to the following with a proper value of the penalty parameter τ: The main task of this stage is thus to solve a set of l 0 minimization problems.The exact solution of l 0 minimization is very difficult and has been proven to be NP hard.Because of this fact, matching pursuit algorithms such as basis pursuit (BP) [25], matching pursuit (MP) [26], orthogonal matching pursuit (OMP) [27], and the focal underdetermined system solver (FOCUSS) [28] are widely considered to obtain the approximate solutions of sparse representation [21].MP and OMP greedily select the dictionary atoms sequentially.BP suggests an approximation of the sparse representation by replacing l 0 -norm with l 1 -norm.FOCUSS is similar to BP, which replaces l 0 -norm by l p -norm with 0 < p < 1 instead of l 1 -norm.This generalization measure approximates the true sparsity better when p < 1, but the overall problem becomes nonconvex.However, convergence is not always guaranteed using the above methods.Besides the matching pursuit methods, the message passing algorithm (MPA) [19] is able to directly solve l 0 minimization problem.MPA is designed to solve l p problem with p ≥ 0. When the problem is convex, i.e., p ≥ 1, MPA gives global optimum and when the problem is nonconvex, i.e., 0 ≤ p < 1, MPA finds a local minimum.Besides K-SVD, the idea of obtaining a sparse representation for a set of training image patches by learning a dictionary has been studied in a series of works during the recent years.Although we have convergence guarantee from MPA, in this paper, we adopt an OMP method as in [22] because MPA takes more time than OMP, which is effective enough in our problem.After the sparse representation matrix Ω is fixed, we adopt AK-SVD algorithm in [22] to update the dictionary Φ column by column.The AK-SVD proposes an iterative algorithm that handle the task effectively.Finally, given the coefficient matrix Ω, we then update X by solving: For this quadratic function, there is a close-form solution, which can be obtained by setting its first-order derivative to zero: leading to 1 Hence, it is clear to see the closed-from solution of X: This expression implies that the clean image is obtained by averaging the denoised patches with some relaxation by averaging the patches of X t 1 , regarded as an original noised image input in the learning based stage.By introducing the additional term in Equation ( 13), we can directly use AK-SVD in [22] to solve Equation (21).It is flexible and can work well with OMP.Now, following [22], we use X as an initial value to update the dictionary, which is to solve MOD is an appealing dictionary training algorithm [21].A significant advantage of this method is its simple way for updating the dictionary.The MOD algorithm involves two stages described above.Assume that we fix Φ and aim to find the representations coefficient vectors ω i to build the matrix Ω by using OMP.We define the errors e i = x i − Φω i and evaluate the overall representation mean square error using a Frobenius norm, which is given by Once the sparse coding task is done, we fix X and search for an update to Φ to minimize the above error, which is (using pseudo-inverse) The K-SVD algorithm [21,29] takes a different update rule for the dictionary, in which the atoms in Φ are updated sequentially.Moreover, the K-SVD updates each atom along with the coefficients in Φ that multiply it using singular value decomposition (SVD) [30].As described above, this problem leads to a matrix rank-l approximation [30] given by min where ϕ j denotes the j-th atom in Φ, ω j denotes the j-th coefficients row in Ω, E j 0 = X − ∑ j =j 0 ϕ j ω j is a known pre-computed error matrix without the j 0 -th atom.ϕ j 0 is the updated atom, and ω j 0 is the new coefficients row in Φ.The optimal solution can be directly obtained via performing an SVD operation.
In practice, it is difficult to obtain the exact solution of Label (31), as performing SVD for atom updating leads to its computational burden, especially in high dimensions.Therefore, Rubinstein [31] proposed a new algorithm to provide an approximate solution rather than the exact one.The resulting algorithm is known as the Approximate K-SVD (AK-SVD).The AK-SVD perform a single iteration of alternate optimization over the atom ϕ j 0 and the coeffcients row ω j 0 , which is given by This process is simple and also quite intuitive.It is important that this process not only finally converges to the optimum, but also provides an approximate solution, which effectively minimizes the error as defined in Label (31).The main contribution of the AK-SVD method is that it avoids the use of the SVD to find alternative d j 0 and ω j 0 .
Smith [32] puts forward an idea that applying multiple dictionary update cycles via the MOD or K-SVD approach can effectively minimize the representation error.In this paper, following [22], we derive a new method for dictionary learning based on multiple dictionary update cycles.We call this method as MOD-AK-SVD to distinguish it from the above reported algorithms.
Our objective is to find an update of Φ and X such that the supports in Φ remain intact.To achieve this, the dictionary update stage is divided into two optimization process.Minimizing ∑ 2 over Φ with a fixed X and getting the results of formula (30).Next, we minimize 2 over Φ and X keeping the support in Φ intact.By defining t j = {i : ω j(i) = 0}, ω j(t j ) denotes the non-zeros coefficients in ω j.Our problem becomes min ϕ j 0 ,ω j 0 Applying alternating minimization, formula (33) leads to the following solutions: At the first stage, we update ϕ j 0 with a fixed ω j 0 (t j 0 ), and, at the second stage, we allow only the existing non-zeros coefficients ω j 0 (t j 0 ) to update using the previously updated ϕ j 0 .
Performing a few alternations between Label (30) and Label (34) can better approximate the overall solution of Label (21).
The detailed parameter setting is described in next Section.In summary, the above practical solution is listed in Algorithm 1.We name our algorithm the Laplacian Schatten p-norm and Learning Algorithm (LSLA-p).In our work, we consider the cases when p = 1 and p = 2, namely, LSLA-1 and LSLA-2.The empirical value of parameters in Algorithm 1 was shown in Table 1.

Experiments
In this section, we conduct extensive experiments to verify the effectiveness of the proposed method.Particularly, we present the parameter settings in the first subsection and discuss the experimental results in the second subsection.

Parameter Setting
We tune the parameters in two parts: the reconstruction based part and the learning based part, which are described independently.
For the reconstruction part, usually λ 1 and λ 2 are selected from a set of values with λ 1 ∈ {2, 3, 5, 8, 10, 15, 20, 25, 30} and λ 2 ∈ {0.01, 0.02, 0.05, 0.08, 0.1}.Large values such as {100, 150, 200, 250, 300, 400} for λ 1 and {0.5, 1, 2, 3, 5} for λ 2 are also used for a small number of images.As a common stragety for unsupervised learning methods [4], in the experiments, we use all combinations of parameters from the above sets and report the best performance.As will be clearer in the later section, our method has comparable performance with a broad range of parameters values.Parameter µ appears in both of the two stages and the setting will be mentioned later.The smoothing parameter δ is set to be 0 for LSLA-2, since the L 2 S 2 = LX 2 S 2 is smooth.For LSLA-1, we test a set of values of δ.It reveals that very small δ would possibly cause numerical issues and leads to poor performance in both Peak Signal to Noise Ratio(PSNR) and Structural SIMilarity index(SSIM).Empirically, a good choice for δ is around 0.1 and we set δ to be 0.1 or 0.12, i.e., δ 2 to be 0.01 or 0.014, depending on the image with the purpose of high PSNR and SSIM.
For the dictionary learning-based part, in our work, the required parameters for this algorithm are the penalty parameter 1/µ, the noise level σ, the patch size of dictionary L and the number of atoms in the dictionary K.The number of atoms is set to be K = 4L 2 , where 4 is a redundancy parameter.The patch sizes in our experiments are 8, 10, 12 and 14.Usually, based on our experience, small patch size would cause an over smooth effect, and a larger patch size would increase the basis, which leads to more computation.Based on our results, finally, we use 12 × 12 patches to balance the two effects.The penalty parameter τ, i.e., 1/µ is related to the noise level σ.In fact, it has been revealed that empirically a good relationship that leads to the best results is τ = 1/µ = 30/σ, i.e., µ = σ/30.Here, it is natural to assume that, with the iteration number of two stages increasing, the level of the remaining noise would decrease.This implies that each time when we apply K-SVD algorithm, the input σ should change to meet the variation of noise level, and thus the best parameter µ should also change with it.We initialize an estimate of noise level σ as input of K-SVD and reduce it by multiplying 0.9 every time after the first stage process and keep µ constant naturally.By our experience, although we do not always satisfy µ = σ/30, the influence is not noticeable.The reasons for this may include the following: (1) the remaining noise is small and the result is not sensitive to parameters of a given set of values in our experiments; (2) the number of alternating steps between the first and second stages is small and thus it avoids the "bad" effects to accumulate to a remarkable extent.Depending on the noise levels and types, we assign σ to be 3, 4, 5 for additive noise with noise level from low to high and 4, 6, 8 for multiplicative noise with level from low to high respectively, which empirically shows to be reasonable.
In our practical optimization method, by introducing the additional terms in Equations ( 12) and ( 13), at each stage, we make use of information from another.Based on this, it is not necessary to process the first and second stages with the same times.Learning based stage would potentially have over smoothing effects and ignore fine details in the resulting image, and, as mentioned before, the first stage would recapture the lost details to ensure the image fine details.Thus, in our experiments, we start with the first stage and end with the first stage.

Performance and Analysis
We use six standard test images, including Face, Kids, Wall, Abdomen, Nimes, and Fields with different noise types and levels to evaluate the performance of our proposed method.Face, Kids, Wall and Abdomen images are used for additive noise experiments; Fields and Nimes are used for multiplicative noise experiments.Face, Kids, and Wall images are of size 255 × 255.Among the rest images, Nimes and Fields are of size 512 × 512 and Abdomen is of size 360 × 540.In our work, we evaluate our performance with two criterion: peak signal-to-noise ratio (PSNR) and structure similarity (SSIM).PSNR is defined as 10 log 10 MSE , where M denotes the maximum intensity of the underlying image and j=1 (X i,j − Xi,j ) 2 is the mean squared error between the denoised image X and the noiseless image X. SSIM is defined as (2µ where µ X , µ X , σ 2 X , σ 2 X , and σ X, X denote the average of X, the average of X, the variance of X and the variance of X, respectively.c 1 and c 2 are two variables to stabilize the division with weak denominator.In Table 2, we provide the additive noise image restoration results in comparison with Block-matching three dimension (BM3D) [10], Hessian Schatten-norm (HS 1 ) [33], Expected Patch Log Likelihood (EPLL) [17], and Total Variation (TV) [34] for a set of four images with different noise levels.In Table 3, we list the results of our proposed method and two other methods including (multiplicative image denoising by augmented Lagrangian (MIDAL) [35] and AA [36] for a set of two images degraded by different levels of multiplicative noise.From Table 2, it is noted that our method has the best performance in PSNR in all cases and 11 out of 12 cases in SSIM with significant improvements.In terms of PSNR, our proposed method usually outperforms BM3D by around 1-2 dB.In addition, it is observed that our methods have improved SSIM by around 0.05 on Wall and Abdomen images.Overall, LSLA-1 and LSLA-2 produce the best results although Figure 1 shows that LSLA-1 has some dark peaks in the uniform regions (liver and kidney).To visually evaluate the performance, we show some visual results.Figures 1-4 show some examples of the resulting images by different methods.Visually, BM3D results in very clean images, but many fine details are eliminated.To better show visual effects of the methods, we enlarge some local patches in Figures 3 and 4. It is observed that indeed BM3D has an over smoothing effect and the major details are missing, whereas our methods keep such details.Moreover, it is observed that the brightness of different regions has changed in the images produced by BM3D, whereas our method shows similar brightness to the original ones.Our method shows similar results to TV and HS 1 with more smoothing effects in the smooth region.For multiplicative noise removal, it is seen that the proposed method has the best performance in all tests, except in one case when compared with MIDAL in SSIM.This observation, again, verifies that the proposed method is indeed effective for both additive and multiplicative noise removal.To visually evalute the results, we show some examples of the resulting images in Figures 5 and 6.It is observed that the proposed method captures the edges well from the images, while MIDAL fails when the noise is strong.In the smooth regions, our method keeps the fine details such as the gradual change well, while MIDAL makes such gradual change difficult to distinguish.In addition, the intensities shown in the denoised images by our method appears to be much closer to the clean ones than AA and MIDAL because the contrast between the darkness and brightness is more like that in the clean images.Such observations have confirmed the effectiveness of the proposed method.At the same time, it can be seen that multiplicative noise is a much harder problem.

Parameter Sensitivity
For image denoising problem, it is hard to learn or theoretically analyze or the optimal parameters.To better investigate the performance of the proposed method, in this subsection, we present how the parameters affect the denoising performance.We have used the combination of parameters {λ 1 , λ 2 } selected from the set {2, 5, 10, 15, 20} × {0.01, 0.05, 0.1, 0.2, 0.3}.Without loss of generality, we test our method on two degraded images and report the results in Figure 7 where all combinations of parameters are used from the above set.It is seen that our method has comparablely high performance with a broad range of parameter values on both images, which implies the insensitivity of our method to the parameters.This observation indeed ensures the potential of our method in real world applications.

Time Comparison and Analysis
In this subsection, we test the time needed for the methods in comparison.Our simulations were performed in a MATLAB R2010b environment (MathWorks, Natick, Massachusetts, USA) on a Windows 7 operating system (Microsoft, Redmond, Washington, USA) with 2.60 GHz CPU and GB RAM.Without loss of generality, we test the methods on images as given in Table 2, where average time costs are reported in Table 4 for each image and method.It is observed that the proposed methods need more time than others except EPLL.Considering that our methods have superior performance in denoising results, such cost in time is fairly acceptable.Moreover, we investigate the time cost for each stage of our algorithm.Without loss of generality, we report the results on Face and Kids images in Figure 8.It is seen that the second stage costs roughly 60% of the overall time on average.As the major step for time cost, it should be noted that this stage involves K-SVD, which generally is time-consuming.In this paper, we aim at proposing a new image denoising method and the way to speed up our method is not in the scope of our current work, which will be considered in the future.

Discussion
From the above experiments, it is seen that the proposed methods have superior performance to state-of-the-art algorithms both quantatively and visually.Quantatively, the proposed methods improve PSNR and SSIM significantly while visually they keep fine details of the images when other methods fail.Though the proposed method has slower speed than BM3D, TV, etc., it is noted that our method is comparable to some state-of-the-art algorithms such as EPLL, yet with superior performance.Hence, it is convincing to claim the stronger applicability of the proposed method to real world applications, such as hyperspectal image denoising, biomedical image denoising, or preprocessing of noisy image data for recognition, etc. Possible reasons for slower speed of the proposed method may be the need of matrix inverse operations and sparse coding, which generally have high cost.It is possible to speed up the proposed method with approximation techniques for matrix inverse or with more efficient sparse coding technique.This may be considered as a further line of research.

Conclusions
This paper presents an image denoising method that can be applied to both additive and multiplicative noise.The proposed method is designed to capture global structures and preserve local similarities simultaneously.This method produces promising results in terms of PSNR, SSIM and visual quality.The advantages of this novel method include the following: (1) for additive noise, our method outperforms or shows comparable results to TV , HS 1 and BM3D methods either in terms of SSIM or PSNR; (2) for multiplicative noise, our method has performance superior to AA and MIDAL algorithms either in SSIM or PSNR; and (3) our method captures structures and keeps fine details well.
There are several future research directions.We are further exploring other optimization strategies for more effective convergence and further improvement.We are also considering transformation based method.Transformation and learning based model might potentially lead to more promising results.

Figure 1 .
Figure 1.Results on the abdomeng image degraded by Gaussian noise of level = 0.1.

Figure 2 .
Figure 2. Results on the kids image degraded by Gaussian noise of level = 0.07.

Figure 3 .
Figure 3. Results on the face image degraded by Gaussian noise of level = 0.05.

Figure 7 .
Figure 7. Example of denoising performance changes with respect to parameters.From left to right are results on images of Face with Gaussian noise of level 0.05 and Kids with Gaussian noise of level 0.04.

Figure 8 .
Figure 8.(a) Face image degraded by Gaussian noise of level 0.05; (b) Kids image degraded by Gaussian noise of level 0.04.Example of time cost on two stages using different combinations of parameters.

Table 1 .
Empirical value of parameters.

Table 2 .
Comparison of different methods, in PSNR and SSIM, for additive noise with different noise levels.

Table 3 .
Comparison of different methods for multiplicative noise with different noise levels.

Table 4 .
Time comparison on Gaussian noise removal.