Improved Image Denoising Algorithm Based on Superpixel Clustering and Sparse Representation

Abstract: Learning good image priors from noise-corrupted images or clean natural images is very important for preserving local edge and texture regions while denoising images. This paper presents a novel image denoising algorithm based on superpixel clustering and sparse representation, termed the superpixel clustering and sparse representation (SC-SR) algorithm. In contrast to most existing methods, the proposed algorithm further learns the image nonlocal self-similarity (NSS) prior with mid-level visual cues via superpixel clustering with the sparse subspace clustering method. As the superpixel edges adhere to the image edges and reflect the image structural features, structural and edge priors are considered for a better exploration of the NSS prior. Next, each region of similar superpixels is regarded as a searching window in which the first L most similar patches to each local patch are sought. For each region of similar superpixels, a specific dictionary is learned to obtain the initial sparse coefficient of each patch. Moreover, to promote the effectiveness of the sparse coefficient for each patch, a weighted sparse coding model is constructed under a constraint given by the weighted average of the sparse coefficients of the first L most similar patches. Experimental results demonstrate that the proposed algorithm achieves very competitive denoising performance, especially in the preservation of image edges and fine structures, in comparison with state-of-the-art denoising algorithms.


Introduction
As one of the most fundamental low-level vision problems, image denoising has been widely studied in computer vision, serving as the foundation and precondition for image processing tasks such as visual saliency detection, image segmentation, and image classification. In general, image denoising aims at recovering the clean image from the noise-corrupted image while preserving, as much as possible, the vital image features. During the past few decades, image denoising has drawn much research attention, resulting in a variety of efficient methods. Traditional image denoising methods include the median filter, the Gaussian filter, methods based on total variation, wavelet thresholding methods, etc. However, most of these methods frequently ignore image details such as structure, texture, and edge features. To some extent, this neglect of image details makes these methods suffer from defects including over-smoothing, side effects, artifacts, loss of structure and texture features, and blurred edges.
Motivated by the defects of traditional image denoising methods, significant progress has been made in recent years. As image denoising is typically an ill-posed problem, its solution might not be unique. By learning good image priors from noise-corrupted images or clean natural images, numerous methods have been proposed to obtain a better solution to the image denoising problem.
In particular, nonlocal self-similarity (NSS) and sparsity are two popular image priors with great potential that lead to state-of-the-art performance. Based on the fact that local image details may appear multiple times across the entire image, NSS has prompted a series of excellent denoising algorithms. The nonlocal means (NLM) [1] algorithm computed a noise-free pixel as the weighted average of pixels with similar neighborhoods in a fixed-size rectangular searching window, and achieved a significant enhancement in denoising performance. Inspired by the success of the NLM method, Dabov et al. [2] proposed a remarkable collaborative image denoising scheme, called block-matching and 3D filtering (BM3D). In this scheme, nonlocal similar patches were grouped into a 3D cube and collaborative filtering was conducted in the sparse 3D transform domain. The BM3D algorithm ranks among the best performing methods, yet its implementation is complex. Furthermore, BM3D is based on classical fixed orthogonal dictionaries and thus lacks data-adaptability. Building on the principle of sparse and redundant representations [3], another category of methods has been developed that learns data-adaptive dictionaries for denoising. The K-SVD algorithm [4] boosted denoising performance significantly. Mairal et al. [5] proposed the learned simultaneous sparse coding (LSSC) algorithm, which used nonlocal self-similarity (NSS) to improve sparse models with simultaneous sparse coding. In Reference [6], Chatterjee et al. clustered an image into K groups to enhance the sparse representation via locally learned dictionaries, which took advantage of geometric structure features and the NSS prior in the spatial domain. Subsequently, Dong et al. [7] observed that the difference between the representation coefficients of the original and degraded images is sparse, and added a restriction to minimize the l1 norm of this difference; this model achieved good results in image denoising. By assuming that the matrix of nonlocal similar patches has a low-rank structure, low-rank minimization based methods [8,9] also achieved very competitive denoising results. Zhang et al. [10] later proposed a patch group (PG) based NSS prior learning scheme to learn explicit NSS models from natural images for high performance denoising, resulting in high peak signal to noise ratio (PSNR) measurements.
Though NSS in low-level vision cues has been widely utilized to improve image denoising performance, we argue that such utilizations of NSS are not sufficiently effective. Most existing methods learn the NSS prior by clustering size-fixed local patches extracted from an image, which may neglect the image edge and structural features to some extent. Moreover, in most existing methods, the NSS prior is usually exploited by searching for patches similar to a local patch within a size-fixed square searching window, which leads to a heavy computational burden and ignores similarities between pairs of patches with large spatial distances. Since most existing methods do not make full use of the NSS prior in low-level vision cues, it is necessary to learn the prior in mid-level vision cues for a better exploration of the NSS prior.
With the above considerations, this paper proposes to further learn NSS in mid-level vision cues via superpixel clustering using the sparse subspace clustering method. Since the superpixel edges adhere to the image edges and reflect the image structural features, structural and edge information can be considered for a better exploration of the NSS prior by superpixel clustering. Furthermore, this paper proposes an improved algorithm for image denoising that takes advantage of multiple priors to achieve better denoising performance, including the NSS prior in the spatial and sparse transform domains, sparsity, and structure and edge priors. In the proposed algorithm, we first divided the image into multiple superpixels by the simple linear iterative clustering (SLIC) method and grouped superpixels into irregular regions by the sparse subspace clustering method with local features. Regarding these regions as searching windows, we sought the first L most similar patches to each local patch. Next, a data-adaptive dictionary for each region was learned to obtain the initial sparse coefficients of the local patches extracted from the image. Finally, to improve the effectiveness of the sparse coefficient for each patch, a weighted sparse coding model was constructed by adding a weighted average sparse coefficient of the first L most similar patches to the sparse representation model. Once the final sparse coefficients for all patches were acquired, the noise-free image was obtained. The proposed algorithm achieves enhanced image denoising performance thanks to two factors.
First, learning the NSS in mid-level vision cues via superpixel clustering promotes a better exploitation of the NSS prior. Furthermore, regarding each region of similar superpixels as a searching window avoids ignoring similarities between pairs of patches with large spatial distances, which also contributes to the better exploitation of the NSS prior. Second, a weighted sparse coding model was established, which reduced the impact of noise on sparse representation and improved the accuracy of the sparse coefficient for each patch.
The rest of this paper is organized as follows. In Section 2, the basics of the proposed algorithm are described, including the SLIC algorithm, the sparse subspace clustering algorithm, and the sparse representation algorithm. In Section 3, we introduce the proposed denoising model in detail. Next, experimental results and discussion are presented in Section 4. Finally, we draw our conclusions in Section 5.

Basics of Superpixel Clustering and Sparse Representation (SC-SR) Algorithm
This paper implements a novel image denoising algorithm based on the NSS prior in mid-level vision cues and weighted sparse coding. Before analyzing the proposed denoising algorithm, it is essential to introduce three basic algorithms which play vital roles in realizing the proposed algorithm.

Simple Linear Iterative Clustering Algorithm for Segmentation
In this paper, the simple linear iterative clustering (SLIC) algorithm was selected to generate compact and nearly uniform superpixels, since it offers better running speed, superpixel compactness, and contour preservation than other methods [11]. SLIC generates superpixels by clustering pixels based on their similarity in color and proximity in the image plane [12]. The method applies seamlessly to color as well as grayscale images by replacing color similarity with gray-level similarity. In this paper, experiments were conducted on grayscale images.
SLIC is essentially a local k-means clustering method. Given the total number of image pixels N_p and the desired number of superpixels N_s, cluster centers are sampled on a regular grid with interval S = sqrt(N_p / N_s). To speed up the generation of superpixels, SLIC assigns each pixel p_i to the nearest cluster center within a local region around p_i rather than over the whole image plane. The local region is a 2S × 2S area centered on the pixel p_i.
For grayscale images, gray-level and spatial information are both taken into account when describing a pixel. Instead of a simple Euclidean norm, the distance measure D_s is defined as

D_s = sqrt( d_g^2 + (d_xy / S)^2 m^2 ),

where d_g is the gray distance and d_xy is the plane distance normalized by the grid interval S. A variable m is introduced to control the compactness of a superpixel: the greater the value of m, the more spatial proximity is emphasized and the more compact the cluster; we empirically set m to 10 [12]. The values of d_g and d_xy are obtained as

d_g = |g_i − g_j|,    d_xy = sqrt( (x_i − x_j)^2 + (y_i − y_j)^2 ),

where g_i and g_j are the gray values of pixels i and j; x_i and x_j are their x-coordinates; and y_i and y_j are their y-coordinates.
Given the desired number of superpixels N_s, SLIC begins by sampling N_s regularly spaced cluster centers, each denoted by a feature vector. To avoid placing a cluster center at an edge pixel or a noisy pixel, each center is moved to the lowest-gradient position in its 3 × 3 neighborhood. Next, each pixel in the image is associated with the nearest cluster center within its local area by the k-means clustering method.
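As an illustration, the local assignment step described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the helper name `slic_assign`, the `(gray, row, col)` center format, and the single-sweep structure are all choices made here for clarity.

```python
import numpy as np

def slic_assign(img, centers, S, m=10):
    """One SLIC assignment sweep: label every pixel with the nearest
    cluster center, searching only a 2S x 2S window around each center.
    img: 2-D grayscale array; centers: list of (gray, row, col) tuples."""
    h, w = img.shape
    labels = np.full((h, w), -1, dtype=int)
    dists = np.full((h, w), np.inf)
    for k, (g_c, x_c, y_c) in enumerate(centers):
        x0, x1 = max(int(x_c) - S, 0), min(int(x_c) + S, h)
        y0, y1 = max(int(y_c) - S, 0), min(int(y_c) + S, w)
        xs, ys = np.mgrid[x0:x1, y0:y1]
        d_g = np.abs(img[x0:x1, y0:y1] - g_c)               # gray distance
        d_xy = np.sqrt((xs - x_c) ** 2 + (ys - y_c) ** 2)   # plane distance
        D = np.sqrt(d_g ** 2 + (d_xy / S) ** 2 * m ** 2)    # combined measure
        win = dists[x0:x1, y0:y1]
        better = D < win
        win[better] = D[better]
        labels[x0:x1, y0:y1][better] = k
    return labels
```

A full SLIC run would alternate this assignment sweep with re-estimation of the cluster centers until convergence.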
The segmentation results of SLIC on some benchmark test images are displayed in Figure 1. As shown, SLIC generates compact and nearly uniform superpixels and achieves a good description of the image edges.

Sparse Subspace Clustering for Noisy Data
Sparse subspace clustering is an outstanding clustering method based on sparse representation, which is able to handle missing or corrupted data.It is based on the fact that each data point in a union of subspaces can be represented as a linear or affine combination of other points.Points are determined to lie in the same subspace by searching for the sparsest combination, which leads to a sparse similarity matrix [13].Based on the similarity matrix, spectral clustering is adopted to obtain the final clustering result.
Given a data matrix M = [m_1, m_2, . . . , m_n], the sparse representation of each point m_i ∈ R^d can be obtained from the following optimization problem:

min ||c_i||_1   subject to   m_i = M_î c_i,

where the matrix M_î ∈ R^{d×(n−1)} is obtained from M by removing its i-th column; the subscript î means that m_i itself is excluded. Thus, point m_i has a sparse representation with respect to the matrix M_î. For affine subspaces, this problem is equivalent to the following one:

min ||c_i||_1   subject to   m_i = M_î c_i,  1^T c_i = 1,

where 1 is a vector of all ones. Taking noise into consideration, let m̃_i = m_i + η_i be the i-th corrupted data point, where ||η_i||_2 ≤ ε and ε is determined by the noise variance. The optimization problem then becomes:

min ||c_i||_1   subject to   ||m̃_i − M_î c_i||_2 ≤ ε,

and the Lasso optimization method [14] is used to recover the optimal sparse solution. With the sparsest representation of each point, a coefficient matrix C = [c_1, c_2, . . . , c_n] can be obtained, which describes the connections between the points. Taking the matrix C to establish a directed graph G = (V, E), the vertices V of the graph are the n data points and the adjacency matrix is the coefficient matrix C. Moreover, to make G balanced, a new adjacency matrix C̃ (C̃_ij = C_ij + C_ji) is built. Subsequently, the Laplacian matrix L is formed as L = Da − C̃, where Da ∈ R^{n×n} is a diagonal matrix with Da_ii = Σ_j C̃_ij. Finally, the spectral clustering method is applied to the eigenvectors of the Laplacian matrix to obtain the final clustering result for the whole set of data points.
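The pipeline above (per-point sparse coding, symmetrization into an affinity, spectral clustering) can be sketched with off-the-shelf tools. This is an illustrative sketch rather than the paper's implementation: the helper name `sparse_subspace_cluster` and the Lasso penalty weight `alpha` are assumptions, and scikit-learn's Lasso and SpectralClustering stand in for the optimization of [14] and the Laplacian eigen-decomposition.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso

def sparse_subspace_cluster(M, n_clusters, alpha=1e-3):
    """SSC sketch: write each column m_i of M (d x n) as a sparse
    combination of the remaining columns, symmetrize the coefficient
    matrix into an affinity, then run spectral clustering on it."""
    d, n = M.shape
    C = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]      # exclude m_i itself (M_i-hat)
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        model.fit(M[:, idx], M[:, i])
        C[idx, i] = model.coef_
    A = np.abs(C) + np.abs(C).T                    # balanced, nonnegative affinity
    sc = SpectralClustering(n_clusters=n_clusters,
                            affinity='precomputed', random_state=0)
    return sc.fit_predict(A)
```

Points drawn from two different subspaces then receive different labels because each point selects, as its sparse regressors, only points from its own subspace.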
In this paper, sparse subspace clustering is adopted to effectively cluster superpixels into regions of similar geometric structure aiming to further learn the NSS prior in mid-level vision cues even in the presence of noise.It is known that better learning of the NSS prior can improve the performance of denoising algorithms based on structure clustering and sparse representation.Instead of clustering size-fixed patches that are inflexibly extracted from images, clustering superpixels takes image edge and structure features into consideration, which leads to a better learning result of the NSS prior.

Sparse Representation Based Image Denoising
In general, sparse representation works in a patch-based framework, in which an image is represented by the sparse coefficients of its overlapping patches. The image is then recovered by averaging the sparse representations of all overlapping patches.
Given a noisy image Y, the column vector of the b × b patch at location i is denoted as y_i = R_i Y, where R_i is a binary matrix that extracts an overlapping patch and converts it to a column vector. The sparse coefficient α_i of a patch is the optimal solution to the following equation:

α_i = argmin_α ||y_i − Dα||_2^2 + ζ ||α||_p,     (7)

where p = 0, 1, D is known as the dictionary, and ζ is a regularization parameter that balances sparsity against reconstruction error. Once α_i is found, the denoising result ŷ_i of the image patch y_i can be computed as ŷ_i = Dα_i. A challenging issue in finding the sparse coefficient α_i is the choice of dictionary. In brief, a dictionary is a matrix, usually obtained by a specific transformation or learned from a large set of clean or noisy patches. When the dictionary and the sparse coefficients are obtained, the denoised image Ŷ can be reconstructed by aggregating the sparse representations of all patches as follows [9]:

Ŷ = ( Σ_i R_i^T R_i )^{-1} Σ_i R_i^T D α_i.     (8)

Nonetheless, when only this local sparse representation model is employed, the denoising solution may not be good enough. Herein, this paper combines good image priors learned from the noise-corrupted image with sparse representation for a better solution to the image denoising problem. The NSS prior in the spatial domain was learned by superpixel clustering, which also generated irregular regions consisting of similar superpixels. Regarding these regions as searching windows, patches similar to each local patch within a region were found. The NSS prior in the sparse transform domain was exploited through a constraint in which the weighted average of the sparse coefficients of the acquired similar patches drives the sparse coefficient of the local patch toward an optimal solution. The NSS priors in both the spatial domain and the sparse transform domain were added to the sparse representation model to enhance the image denoising performance of our proposed algorithm. Further details of the proposed algorithm are explained in the next section.
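The extraction operator R_i and the averaging-based reconstruction can be illustrated as follows. The helper names `extract_patches` and `aggregate_patches` are hypothetical, and the per-patch denoising step is omitted so that the round trip is exact: aggregating unmodified patches simply averages each pixel's overlapping copies back to the original value.

```python
import numpy as np

def extract_patches(Y, b):
    """Stack all overlapping b x b patches of Y as columns (the y_i = R_i Y)."""
    h, w = Y.shape
    return np.stack([Y[i:i + b, j:j + b].ravel()
                     for i in range(h - b + 1) for j in range(w - b + 1)], axis=1)

def aggregate_patches(P, shape, b):
    """Invert the extraction by averaging overlaps: each pixel's value is
    the mean of the estimates from every patch covering it."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    h, w = shape
    k = 0
    for i in range(h - b + 1):
        for j in range(w - b + 1):
            acc[i:i + b, j:j + b] += P[:, k].reshape(b, b)
            cnt[i:i + b, j:j + b] += 1
            k += 1
    return acc / cnt
```

In a real denoiser, each column of `P` would be replaced by its sparse approximation D·α_i before aggregation.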

Proposed SC-SR Algorithm
In this section, the image denoising model based on superpixel clustering and sparse representation is presented. We first utilized the SLIC algorithm to generate M superpixels consisting of similar pixels. Next, the NSS prior was further exploited by using the sparse subspace clustering algorithm with local features to cluster similar superpixels into groups. Within each irregular region of grouped similar superpixels, we extracted overlapping patches for dictionary training and sought similar patches to each local patch. Finally, weighted sparse coding was adopted for a better sparse representation. A detailed introduction of the proposed algorithm follows.

Superpixel Clustering
At the outset, SLIC is used to generate M superpixels, and each superpixel is described as a column vector u of several features. Each image pixel within a superpixel is represented by a seven-dimensional feature vector f:

f = [g, I_X, I_Y, I_XX, I_YY, βx, βy]^T,     (9)

where g is the gray value of the pixel; I_X, I_Y, I_XX, I_YY are the corresponding first- and second-order derivatives of the image intensities along the X and Y axes; and x, y are the coordinates of the pixel in the image. The parameter β balances the image gray, gradient, and spatial features. If we used an equal weight (β = 1) for the spatial features, similarities between pairs of patches with large spatial distances might be lost. Thus, we empirically chose β = 0.5 to alleviate this problem [15]. In addition, the spatial, gray, and gradient values were normalized. For a given superpixel, we computed the mean vector u of all pixels within it as its feature vector:

u = (1/Γ) Σ_{j=1}^{Γ} f_j,

where Γ is the size of the superpixel and f_j is the feature vector of a pixel within the superpixel.
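The per-pixel features and the superpixel mean vector can be sketched as below. This is a minimal sketch under stated assumptions: the helper names are illustrative, `np.gradient` stands in for whatever derivative filters the authors used, and each channel is min-max normalized before the spatial channels are weighted by β.

```python
import numpy as np

def pixel_features(img, beta=0.5):
    """Per-pixel 7-D feature maps [g, Ix, Iy, Ixx, Iyy, beta*x, beta*y].
    Each channel is normalized to [0, 1]; the two spatial channels are
    then down-weighted by beta, as described above."""
    Ix, Iy = np.gradient(img)            # first-order derivatives
    Ixx = np.gradient(Ix, axis=0)        # second-order derivatives
    Iyy = np.gradient(Iy, axis=1)
    x, y = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    channels = [img, Ix, Iy, Ixx, Iyy, x, y]
    norm = lambda c: (c - c.min()) / (np.ptp(c) + 1e-12)
    F = np.stack([norm(c) for c in channels], axis=-1)
    F[..., 5:] *= beta                   # balance spatial vs. intensity features
    return F                             # shape (h, w, 7)

def superpixel_feature(F, mask):
    """Mean feature vector u over the pixels of one superpixel (boolean mask)."""
    return F[mask].mean(axis=0)
```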
We considered these M superpixels as a collection of data points drawn from a union of K independent affine subspaces, and adopted the sparse subspace clustering method to cluster them into K groups. After transforming every superpixel into a column vector, the collection of data points U = {u_i}_{i=1}^{M} for clustering was obtained. For each superpixel, its covariance matrix Mc was calculated as

Mc_ij = (1/(Γ − 1)) Σ_{k=1}^{Γ} ( f_i^k − u_i )( f_j^k − u_j ),

where f_i^k is the i-th feature of the k-th pixel of the current superpixel and u_i is the i-th element of the feature vector of the superpixel.
For two superpixels, their similarity W(Mc_1, Mc_2) in Equation (13) is computed from the corresponding covariance matrices Mc_1 and Mc_2, where ρ is a small constant (experimentally set to 0.5 in our experiments) [15].
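The covariance computation can be sketched directly from the formula above. The similarity function shown here is explicitly hypothetical: the paper's exact Equation (13) is not reproduced in the text, so the code substitutes a simple stand-in that maps the Frobenius distance between the two covariance matrices into (0, 1] using the constant ρ.

```python
import numpy as np

def superpixel_cov(F_pixels):
    """Covariance matrix Mc of one superpixel's per-pixel feature vectors.
    F_pixels: (n_pixels, 7) array."""
    d = F_pixels - F_pixels.mean(axis=0)
    return d.T @ d / max(len(F_pixels) - 1, 1)

def cov_similarity(Mc1, Mc2, rho=0.5):
    """Hypothetical covariance-based similarity (NOT the paper's
    Equation (13)): Frobenius distance mapped into (0, 1]."""
    return float(np.exp(-np.linalg.norm(Mc1 - Mc2, 'fro') / rho))
```

Identical superpixels score 1; increasingly different covariance structures decay toward 0.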
To better characterize the relationships among the superpixels and alleviate the sensitivity of sparse coding to noise, a Laplacian regularization term [15] based on the similarity matrix W was introduced into Equation (6) to enforce similarity of the sparse coefficients among similar superpixels. Equation (6) then becomes the problem in Equation (15), where L is the Laplacian matrix defined as L = D_a − W, and D_a is the diagonal matrix of the row sums of W. The parameter γ weights the Laplacian regularization term (γ was empirically set to 0.2 in our experiments) [15]. By solving Equation (15), a sparse coefficient matrix C = [ĉ_1, ĉ_2, . . . , ĉ_M] was obtained, which was not symmetric. Therefore, we updated it by C_ij = C_ji = C_ij + C_ji to make it symmetric. A directed graph G = (V, E) was built from the sparse coefficient matrix C; the vertices V of the graph were the data set {u_i}_{i=1}^{M}, and the new adjacency matrix was A = H − C, where H_ii = Σ_j C_ij. The spectral clustering algorithm was then used to segment the graph G to obtain the superpixel clustering result.
Superpixel clustering learned the NSS prior, which contributed to the preservation of the structural information of the denoised image. When the dictionary was learned, the NSS prior obtained by superpixel clustering injected structural and edge features into the atoms of the dictionary. The richer the features of the dictionary, the stronger its ability to reconstruct the original image.

Learning Sub-Dictionaries for Each Cluster of Superpixels
Within each region of similar superpixels, we extracted overlapping patches centered on each pixel, all of size b × b (b is odd). For pixels on the boundary of the image, we extended the image by ⌊b/2⌋ pixels in the horizontal and vertical directions by mirroring, so as to obtain the patches centered on them. As shown in Figure 2, the matrix Ma_1 was extended by two elements in the horizontal and vertical directions by mirror extension, yielding the matrix Ma_2. After every patch was transformed into a column vector, K sub-datasets {M_k}_{k=1}^{K} were formed for training the sub-dictionaries.
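The boundary handling above can be sketched with NumPy's padding routines. This is an illustrative sketch: `padded_patches` is a hypothetical helper, and `mode='symmetric'` (mirroring that repeats the edge pixel) is one plausible reading of the mirror extension described in the text.

```python
import numpy as np

def padded_patches(img, b):
    """Extract one b x b patch centered on every pixel; the image is first
    mirrored b//2 pixels outward so boundary pixels get full patches."""
    r = b // 2
    P = np.pad(img, r, mode='symmetric')
    h, w = img.shape
    return np.stack([P[i:i + b, j:j + b].ravel()
                     for i in range(h) for j in range(w)], axis=1)
```

Each column is one patch; the central entry of column k is exactly the k-th pixel of the original image.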

Since the number of patches in each cluster was limited and the patches in M_k had similar patterns, it was not necessary to learn an over-complete dictionary for each cluster [16]. Therefore, we used the principal component analysis (PCA) method to learn a compact sub-dictionary for each cluster.
For each sub-dataset M_k, the proposed algorithm applied PCA to compute the principal components in order to construct the sub-dictionaries {D_k}_{k=1}^{K}. D_k can be constructed as the optimal solution of the following formulation:

(D_k, A_k) = argmin ||M_k − D_k A_k||_F^2 + λ ||A_k||_1,     (16)

where A_k is the sparse coefficient matrix of M_k over D_k. The rank of M_k is denoted by r, and the covariance matrix of M_k by ψ_k. Matrix factorization gives ψ_k = P_k^T Λ_k P_k, where P_k = [p_1, p_2, . . . , p_r] is the orthogonal transformation matrix and Λ_k is a diagonal matrix with the r eigenvalues of ψ_k as its diagonal elements. If we regard P_k as the dictionary and set A_k = P_k^T M_k, then ||M_k − P_k A_k||_F^2 = 0, so Equation (16) is determined only by the sparsity regularization term ||A_k||_1, which is constant in this case. To obtain better sub-dictionaries for sparse representation, we instead extracted the first τ ∈ [1, r] most important eigenvectors of P_k to form a dictionary D_k^τ = [p_1, p_2, . . . , p_τ]. The corresponding sparse coefficient matrix is denoted by A_k^τ. It is known that ||A_k^τ||_1 increases as τ increases, while ||M_k − D_k^τ A_k^τ||_F^2 decreases. The optimal τ can be obtained by solving the following problem:

τ̂ = argmin_τ ||M_k − D_k^τ A_k^τ||_F^2 + λ ||A_k^τ||_1.     (17)

Consequently, we obtained K sub-dictionaries {D_k}_{k=1}^{K} for the K sub-datasets {M_k}_{k=1}^{K}. In Equation (17), it was verified that some noise can be successfully removed while computing the local PCA transform of each image patch. In this paper, PCA was applied to each sub-dataset to construct the dictionary, which reduced not only the computational cost of dictionary training but also the noise introduced into the dictionary.
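Selecting the leading eigenvectors of the cluster covariance as dictionary atoms can be sketched as follows; the helper name `pca_dictionary` and the mean-centering step are assumptions, and the search over τ from Equation (17) is left out for brevity (here τ is passed in directly).

```python
import numpy as np

def pca_dictionary(Mk, tau):
    """Compact PCA sub-dictionary for one cluster: the tau leading
    eigenvectors of the cluster's patch covariance matrix as atoms."""
    X = Mk - Mk.mean(axis=1, keepdims=True)
    psi = X @ X.T / max(Mk.shape[1] - 1, 1)    # covariance of the sub-dataset
    vals, vecs = np.linalg.eigh(psi)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:tau]       # keep the tau largest
    return vecs[:, order]                      # d x tau orthonormal dictionary
```

Because the atoms are eigenvectors of a symmetric matrix, the resulting dictionary has orthonormal columns, which makes the later sparse coding steps cheap.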

Sparse Representation Model for Image Denoising
It has been observed that similar patches share the same dictionary elements in their sparse decomposition [5]. In other words, there is NSS in the sparse transform domain as well. The proposed algorithm takes advantage of this fact to achieve a better sparse representation and improve the denoising performance.
In the previous subsection, we learned a PCA dictionary for each cluster. Given the dictionaries, the initial sparse coefficients α_i^(0) of all patches within each cluster are found by solving the minimization problem of Equation (7) with p = 0. The l0 pseudo-norm regularization term usually achieves better reconstruction performance than the l1 norm [17]. Since the l0 minimization problem is NP-hard, greedy approaches are employed to find an approximate solution. One of the most widely used greedy approaches is orthogonal matching pursuit (OMP), which successively selects the dictionary atom that minimizes the representation error until a stopping criterion is satisfied. In our algorithm, we employed a modified version of OMP, referred to as generalized OMP (GOMP) [18], to obtain the initial sparse coefficients α_i^(0); GOMP allows the selection of multiple atoms per iteration, which reduces the number of iterations and achieves a better sparse representation.
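The GOMP idea (pick several atoms per iteration, then refit the whole support) can be sketched as below. This is a simplified sketch, not the algorithm of [18] verbatim: the helper name, the stopping rule, and the default parameter values are choices made here for illustration.

```python
import numpy as np

def gomp(y, D, n_select=2, max_atoms=6, tol=1e-6):
    """GOMP sketch: each iteration picks the n_select atoms most correlated
    with the current residual, then refits the support by least squares."""
    support, coef = [], np.zeros(0)
    residual = y.astype(float).copy()
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        corr = np.abs(D.T @ residual)
        corr[support] = -1.0                    # never re-pick chosen atoms
        new = np.argsort(corr)[::-1][:n_select]
        support.extend(int(j) for j in new)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```

With `n_select = 1` this reduces to plain OMP; larger values trade per-iteration cost for fewer iterations, as described above.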
After obtaining the initial sparse coefficients α_i^(0), we exploited the NSS prior in the sparse transform domain to produce a better estimate of the original image. To begin, we sought the first L most similar patches to each patch across its region of similar superpixels. Each pixel in a patch was described by the feature vector of Equation (9), so that a b × b patch was described as a matrix of feature vectors; the similarity between two patches was measured by Equation (13) using their covariance matrices. After obtaining the patches similar to each local patch, we computed the weighted average of the initial sparse coefficients of the similar patches, denoted χ_i. Next, χ_i was used to constrain the sparse coefficient of the local patch, to reduce the impact of noise on the sparse representation and achieve a better solution. A better sparse representation coefficient for each patch y_i can be found by

α̂_i = argmin_α ||y_i − Dα||_2^2 + η ||α − χ_i||_1,     (18)

where χ_i is the weighted average of the sparse coefficients of the first L most similar patches to patch y_i, and η is a regularization parameter balancing sparsity and reconstruction error.
The weight w_ij between two patches was computed from their covariance matrices Cov_i and Cov_j. Based on the iterative soft-thresholding (IST) method [19], the solution to Equation (18) was computed as

α_i^(t+1) = χ_i + S_τ( α_i^(t) + D^T ( y_i − D α_i^(t) ) / c − χ_i ),     (19)

where S_τ is a soft-thresholding function; t denotes the iteration index; and c is a constant that guarantees the strict convexity of the optimization problem, with D^T D < c [20]. After J iterations, preferable sparse coefficients are obtained. All patches can then be estimated by ŷ_i = Dα_i^(J). Since every pixel receives multiple estimates, its value is computed by averaging all of them. When the sparse coefficients of all patches within the image were obtained, the whole original image was recovered by Equation (8). By using the weighted average sparse coefficients of the similar patches to constrain the sparse decomposition of the image patches, the accuracy of the sparse coefficients was improved. As a result, the reconstructed image was closer to the original image.
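The IST iteration of Equation (19) can be sketched as below. This is a minimal sketch under stated assumptions: the helper names are illustrative, and the threshold η/(2c) is the standard proximal-gradient choice for this objective (the text leaves the threshold of S_τ unspecified).

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_ist(y, D, chi, eta=0.1, c=None, n_iter=50):
    """IST sketch for  min_a ||y - D a||_2^2 + eta * ||a - chi||_1 :
    a gradient step on the data term followed by soft-thresholding
    around the weighted-average coefficient chi."""
    if c is None:
        c = np.linalg.norm(D.T @ D, 2) * 1.01   # ensure D^T D < c
    a = chi.astype(float).copy()
    for _ in range(n_iter):
        z = a + D.T @ (y - D @ a) / c           # gradient step on ||y - Da||^2
        a = chi + soft_threshold(z - chi, eta / (2 * c))
    return a
```

Coefficients that agree with χ_i are left untouched, while deviations driven by noise are shrunk toward the nonlocal average, which is exactly the constraint discussed above.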
The proposed algorithm is summarized completely in Algorithm 1. It makes full use of the NSS prior in both the spatial and sparse transform domains. Clustering in the spatial domain with mid-level visual cues exploits the image structure and edge priors better than directly clustering size-fixed local patches, which also helps preserve image edges and textures. In the sparse transform domain, the weighted average of the sparse coefficients of the patches similar to a local patch constrains the sparse coefficient of that patch, yielding a better solution. All of these contribute to the high denoising performance of the proposed algorithm.

Algorithm 1. The proposed SC-SR algorithm.

1. Input: image Y corrupted with white Gaussian noise.
2. Set parameters: noise variance δ, superpixel number N_s, cluster number K, patch size b × b, number L of the first most similar patches, and regularity parameters β, ρ, µ, γ, λ, η, c, J.
3. Adopt SLIC to generate N_s superpixels.
4. Use the sparse subspace clustering method to group the superpixels into K clusters, forming the sub-datasets {M_k}, k = 1, ..., K.
5. Outer loop: for k = 1 to K:
   (1) Given the sub-dataset M_k, train a dictionary by PCA.
   (2) Initialize the sparse coefficient α_i^(0) of each patch over its specific dictionary by GOMP.
   (3) Inner loop: for t = 1 to J: seek the first L most similar patches in the cluster for each patch, compute the weighted average of the sparse coefficients of the similar patches found, and update the patch's sparse coefficient α_i^(t) = α̂_i via Equations (18) and (19).
   After J iterations, obtain the final sparse coefficients α_i^(J) and the sparse representations ŷ_i = Dα_i^(J) of all patches.
6. Reconstruct the image and output the denoised image.
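The per-cluster dictionary training in step 5 can be sketched as follows. This is a hypothetical NumPy illustration: PCA via an SVD of the centered patch matrix; the paper initializes coefficients with GOMP, for which an orthonormal projection is used here as a simple stand-in.

```python
import numpy as np

def pca_dictionary(patches):
    """Train a PCA dictionary for one cluster of vectorized patches.

    patches : (n, b*b) matrix, one vectorized b-by-b patch per row.
    Returns an orthonormal dictionary whose columns (atoms) are the
    principal directions of the cluster, plus the cluster mean.
    """
    mean = patches.mean(axis=0)
    centered = patches - mean              # remove the cluster mean
    # Right singular vectors of the centered data are the PCA basis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt.T, mean                      # dictionary D of shape (b*b, r)

def code_patch(y, D, mean):
    """Initial coefficients of patch y over dictionary D.  The paper uses
    GOMP; since D is orthonormal here, a plain projection suffices as a
    stand-in."""
    return D.T @ (y - mean)
```

Because the PCA atoms are orthonormal, any patch from the cluster is reconstructed exactly by D @ code_patch(y, D, mean) + mean, which makes the per-cluster dictionary a convenient basis for the subsequent sparse coding.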

Experimental Results
In this section, we validate the performance of the proposed algorithm through extensive experiments on 10 standard benchmark images, shown in Figure 3. In our experiments, we first added synthetic white Gaussian noise with different variances to the test images. Then the proposed algorithm and four current state-of-the-art denoising algorithms, namely NLM, K-SVD, BM3D, and the expected patch log likelihood (EPLL) [21] algorithm, were used to denoise the test images. Finally, we compared the proposed algorithm with the four state-of-the-art algorithms in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [22], figure of merit (FOM), and visual quality.
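The noise-synthesis step of this protocol can be sketched as follows; note that treating the listed δ values as the noise standard deviation (rather than the variance proper) is our assumption, following the common convention in denoising benchmarks.

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=None):
    """Return a float copy of `image` corrupted with white Gaussian noise
    of standard deviation `sigma`, as in the test protocol above."""
    rng = np.random.default_rng(seed)
    return image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
```

Each test image is corrupted once per noise level (e.g. sigma in {5, 15, 25, 40, 60, 80}) before being passed to the competing denoisers.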

Parameters Setting
The parameters in our experiments were set as follows: the superpixel number N_s was set to 500, the cluster number K to 60, and the patch size b × b to 7 × 7; the noise variance δ ranged over [5, 15, 25, 40, 60, 80]; and the number of similar patches L was set to 10. The iteration number J was set according to the noise level, with more iterations required for higher noise levels. From experience, we set J to 7, 9, 13 and 16 for δ ≤ 15, 15 < δ ≤ 30, 30 < δ ≤ 60 and δ > 60, respectively. The other regularity parameters were also empirical values: β = 0.5, ρ = 0.5, µ = 0.01, γ = 0.2, λ = 0.03, and η = 0.3 [15,16].
To verify the influence of the image patch size b × b on peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and figure of merit (FOM), 100 test images were selected to compute the average PSNR, SSIM and FOM for different patch sizes, with the noise variance δ set to 15. Figure 4 shows how the average PSNR, SSIM and FOM vary with the patch size. Evidently, all three measures reached their maxima at a patch size of 7 × 7. The influence of the cluster number K on PSNR, SSIM and FOM was tested in the same way, and is shown in Figure 5. The average PSNR and FOM peaked at a cluster number of 60, whereas the average SSIM peaked at a cluster number of 100. As a compromise, we set the cluster number K to 60.


Qualitative Comparisons
Considering that human subjects are the ultimate judges of image quality, the visual quality of the denoised images is critical when evaluating a denoising algorithm.Figure 6

Qualitative Comparisons
Considering that human subjects are the ultimate judges of image quality, the visual quality of the denoised images is critical when evaluating a denoising algorithm. Figure 6 shows the noise-corrupted Monarch, Airplane, Lena and Baboon images, with noise variances of 25, 25, 60 and 60, respectively. Figures 7-10 show the corresponding denoised images produced by the competing algorithms. BM3D and NLM tended to over-smooth the images, while K-SVD, BM3D and EPLL were likely to generate artifacts at high noise levels. Owing to the NSS prior learned by superpixel clustering, the proposed algorithm was more robust against artifacts and preserved edge and texture areas better than the other algorithms. For example, in the Monarch image, SC-SR preserved the edges of the veins on the butterfly's wings much better than the other algorithms; in the Airplane image, it reconstructed the lettering on the aircraft's wing more clearly; in the Lena image, it recovered more textures and edges on the hat; and in the Baboon image, it preserved more of the fine texture of the baboon's fur than the other competing algorithms.


Quantitative Comparisons
To further validate the denoising capability of the proposed algorithm, we selected PSNR, SSIM and FOM as indexes to quantitatively evaluate the SC-SR algorithm. PSNR is one of the most widely used objective image-quality indexes and measures the similarity of grayscale information between the original and denoised images. Since PSNR is based only on the error between corresponding pixels, it cannot comprehensively describe structural similarity or the degree of edge preservation. SSIM is capable of assessing structural similarity, and FOM can be used to measure the degree of edge preservation between the original and denoised images. Table 1 presents the PSNR, SSIM and FOM results for the different algorithms, images, and noise variances; in every cell of the table, the top value is the PSNR result, the middle value the SSIM result, and the bottom value the FOM result. From Table 1, we can observe three points. First, the proposed SC-SR algorithm achieved much better PSNR, SSIM and FOM results than NLM and K-SVD in all cases. Second, SC-SR had higher PSNR and SSIM values than EPLL in most cases; moreover, EPLL acquired the best FOM results among the five algorithms, with SC-SR only slightly inferior to it. Third, SC-SR obtained better SSIM and FOM results than BM3D in most cases. Meanwhile, when the noise variance was low, the PSNR results of SC-SR were close to those of BM3D; when the noise variance was high, SC-SR's PSNR results were clearly better, since BM3D tended to suffer from artifacts in that regime. From these points, we conclude that SC-SR preserves structural, edge and grayscale information more comprehensively and denoises images better than the other algorithms. To further verify these conclusions, we averaged the results in Table 1 and show the outcome in Figure 11, which reports the total average PSNR, SSIM and FOM for NLM, K-SVD, BM3D, EPLL and SC-SR. For each algorithm, the total average PSNR was computed as the mean of the average PSNR results over the different noise variances (Table 1); the total average SSIM and total average FOM were computed in the same way. As seen in Figure 11, SC-SR achieved the highest total average PSNR and total average SSIM, while EPLL attained the highest total average FOM, with SC-SR close to EPLL and above BM3D. BM3D had a total average PSNR similar to SC-SR's, but lower total average SSIM and FOM than SC-SR; EPLL had a higher total average FOM, but lower total average PSNR and SSIM than SC-SR and BM3D. In brief, among the five algorithms, SC-SR possessed the best capacity for removing noise and preserving structural information, and the second-best capacity for preserving edge areas. In general, SC-SR could not only effectively remove the noise, but also preserve the image edge regions and structural information overall.
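For reference, the pixel-wise fidelity index discussed above can be computed as follows (the standard PSNR definition, shown here for 8-bit images):

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB between a clean image and its
    denoised estimate (higher is better)."""
    diff = clean.astype(np.float64) - denoised.astype(np.float64)
    mse = np.mean(diff ** 2)        # mean squared error over all pixels
    if mse == 0:
        return np.inf               # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR depends only on the per-pixel MSE, it is complemented here by SSIM (structural fidelity) and FOM (edge preservation), as the text notes.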

All experiments were run in MATLAB 2014a on a machine with an Intel(R) Xeon(R) E5-2690 CPU at 2.60 GHz and 96.0 GB of RAM. Owing to its compiled C++ mex-function and parallelized implementation, BM3D proved to be the fastest algorithm; NLM also benefited from a compiled C++ mex-function and was the second fastest. The other algorithms suffered from high computational cost, due both to their computational complexity and to their implementations, which simply use C with MATLAB. The test revealed that EPLL was about two times slower than K-SVD, and SC-SR incurred slightly higher computational costs than EPLL because of its several subtasks and iterative shrinkage operations. However, accelerating techniques, such as those described in Reference [23], could be used to speed up the convergence of the proposed algorithm, and compiled C++ mex-functions and parallelization could be adopted to handle the multiple subtasks. Hence, the computational cost of the proposed algorithm can be further reduced.

Conclusions
In this paper, we presented a new image denoising algorithm that makes full use of image priors, including the NSS prior in the spatial and sparse transform domains, edge and structural information, and sparsity. It proceeds in two successive steps based on superpixel clustering and sparse representation. First, we learned the NSS prior with mid-level visual cues by clustering superpixels. Since superpixel clustering takes edge and structural information into account, a better exploitation of the NSS prior could be obtained; moreover, multiple features were selected to describe each superpixel during clustering, which also facilitated a good NSS prior. Second, we took advantage of the NSS prior in the sparse transform domain by using a weighted average of the sparse coefficients of similar patches to improve the effectiveness of each patch's sparse coefficient. Experiments conducted on a collection of standard test images demonstrated that the proposed algorithm not only effectively removed the noise, but also restored both the structural information and the edge regions better, and overall produced fewer visual artifacts than the other competing algorithms.


Figure 4. The impact of patch size on average PSNR (peak signal to noise ratio), average SSIM (structural similarity), and average FOM (figure of merit) of SC-SR.


Figure 5. The impact of cluster number on average PSNR, average SSIM, and average FOM of SC-SR.


Figure 11. Comparison of the total average PSNR, total average SSIM and total average FOM for the different denoising algorithms.

Table 1. The PSNR, SSIM and FOM results for different denoising algorithms. Best results are in bold.