Impulse Noise Denoising Using Total Variation with Overlapping Group Sparsity and Lp-Pseudo-Norm Shrinkage

Featured Application: This paper proposes a new model for restoring images polluted by impulse noise, which contributes to research in image processing and image reconstruction.

Abstract: Models based on total variation (TV) regularization are proven to be effective in removing random noise. However, a serious staircase effect also exists in the denoised images. In this study, two-dimensional total variation with overlapping group sparsity (OGS-TV) is applied to images with impulse noise, to suppress the staircase effect of the TV model and enhance the dissimilarity between smooth and edge regions. In the traditional TV model, the L1-norm is always used to describe the statistical characteristics of impulse noise. In this paper, the Lp-pseudo-norm regularization term is employed to replace the L1-norm. The new model introduces another degree of freedom, which better describes the sparsity of the image and improves the denoising result. Under the accelerated alternating direction method of multipliers (ADMM) framework, Fourier transform technology is introduced to transform the matrix operations from the spatial domain to the frequency domain, which improves the efficiency of the algorithm. Our model concerns the sparsity of the difference domain in the image: the neighborhood difference of each point is fully utilized to augment the difference between the smooth and edge regions. Experimental results show that the peak signal-to-noise ratio, the structural similarity, the visual effect, and the computational efficiency of this new model are improved compared with state-of-the-art denoising methods.


Introduction
Image denoising is one of the most important research areas in the field of image processing, and it has great value in both theoretical studies and engineering applications. Its usage spans the broad fields of image restoration [1], detection [2], photoelectric detection [3], geological exploration [4], remote sensing [5], and medical image analysis [6], among others [7,8]. With the development of compressed sensing theory, image processing algorithms based on sparse representation and constrained regularization have evolved into promising methods of image restoration [9]. Models based on total variation (TV) regularization [10][11][12] are found to be effective in removing random noise. The TV model has been successfully used in image restoration tasks such as denoising [13], deblurring [14], and super-resolution [15]. Although TV regularization can recover sharp edges of a degraded image, it also leads to some undesired effects and transforms smooth signals into piecewise constants, the so-called staircase effect. Several models have been proposed to improve on the TV model [16][17][18][19][20][21]. One usual method is to replace the original TV norm with a high-order TV norm. The high-order TV overcomes the staircase effect while preserving the edges in the restored image. However, high-order TV-based methods may over-smooth the signal and take more time to compute. More details can be found in Reference [22]. In 2010, Bredies et al. proposed the total generalized variation (TGV) model [19]. The TGV model puts constraints on both the first- and second-order gradients of an image, thus effectively attenuating the staircase effect of the TV model. Still, it is difficult to both preserve the image details and suppress the noise simultaneously in TGV. Furthermore, some scholars pay attention to fractional-order gradients replacing integer-order gradients [23]. Their research shows that using a fractional differential operator with 0 < v < 1 can appropriately process the noise and edge information, but it also leaves un-denoised "spots" in the image.
Although these improved methods can alleviate the staircase artifacts, they might lead to "spots" effects in the processed image. How to choose a good regularization functional, in order to balance the staircase artifacts and "spots" effects, is a key point in imaging science. Recently, Selesnick and Chen proposed total variation with overlapping group sparsity (OGS-TV) [24][25][26][27][28], which introduces the concept of the group gradient into the TV model and takes into full consideration the dissimilarity between smooth and edge regions. The OGS-TV model can distinguish individual noise points from image edge points, so it greatly alleviates the staircase effect. Based on this work, Liu et al. applied this method to the removal of speckle noise. Wu and Du applied the OGS model in the field of Magnetic Resonance (MR) image reconstruction [27]. In this paper, we introduce the OGS model into the denoising of impulse noise.
In the typical denoising method, the L1-norm is commonly used as the fidelity term for impulse noise. However, the solution to the L1-norm problem usually involves the soft-thresholding function, which reduces large values by a constant amount. As a result, large signal values are systematically underestimated [26]. To remedy this shortcoming, many non-convex reconstruction methods have been proposed. Non-convex regularizers have also been shown to yield sparser solutions than the L1 regularizer [24,29,30]. Inspired by this research, we propose a total variation model based on overlapping group sparsity and Lp-pseudo-norm shrinkage (called OGS-Lp for short). Compared with the L1-norm, the Lp-pseudo-norm adds another degree of freedom to the model, which better characterizes the sparsity features of the image [31].
To solve the problem, the alternating direction method of multipliers (ADMM) [32] and the majorization-minimization (MM) algorithm [33] are used to split the complex problem into several subproblems. Furthermore, an accelerated ADMM with a restart [34] is used to solve the new model (OGS-Lp-FAST for short). In this way, a large amount of spatial-domain calculation is transferred to the frequency domain, which significantly reduces the complexity of the algorithm and speeds up its convergence.
The anisotropic total variation (ATV), isotropic total variation (ITV), total generalized variation (TGV), overlapping group sparsity with L1-norm (OGS-L1), overlapping group sparsity with pseudo-norm (OGS-Lp), and our method are compared experimentally using criteria such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and runtime. The model and algorithm proposed here could further improve image denoising performance.
The contributions of this study are as follows: (1) By introducing OGS into TV, a new regularization term is proposed, which incorporates the advantages of the TV and OGS models. In the OGS-TV model, the neighborhood difference of each point is fully utilized to augment the difference between the smooth and edge regions. It balances the staircase artifacts and "spots" effects well. (2) We adopt the Lp-pseudo-norm instead of the L1-norm to describe the fidelity term of impulse noise, extending the L1-norm-based OGS-TV to the OGS-Lp model. (3) The ADMM framework is employed to solve the proposed model. In the ADMM framework, the complex multi-constraint optimization problem is changed into several decoupled subproblems, which are easier to solve. Fourier transform technology is introduced to transform the matrix operations from the spatial domain to the frequency domain, which avoids large-scale matrix calculations. (4) In order to achieve a faster convergence speed, the rapid ADMM with a restart process is adopted to improve the speed of the proposed algorithm. This improved model is named OGS-Lp-FAST. The rate of convergence of the model increases from O(1/k) to O(1/k^2).

This paper is organized as follows: Section 2 gives a review of the traditional TV model; Section 3 describes the incorporation of overlapping group sparsity and Lp-pseudo-norm shrinkage into the TV model, and uses accelerated ADMM with a restart to solve the new model; Section 4 seeks to validate the proposed algorithm with standard images, and compares it with the other models; and Section 5 summarizes this paper and proposes future work.

Traditional TV Model
An image can contain many types of noise. According to the probability density function (PDF) of their amplitude, noises are classified into Gaussian noise, Rayleigh noise, uniform noise, exponential noise, impulse noise, gamma noise, etc.
In this paper, the discussion focuses on impulse noise denoising of images. Impulse noise is additive and is mainly caused by black-and-white bright and dark spots produced by image sensors, transmission channels, decoding processes, etc. In 2004, Nikolova [11] proposed the use of an L1 data-fidelity term for impulse noise related problems. Since then, many research papers have adopted this model in their characterization of this type of noise [35][36][37]. The ATV model of impulse noise based on this formulation is

argmin_F ||F − G||_1 + μ R_ATV(F),   (1)

where G ∈ R^{M×M} is the image with noise and F ∈ R^{M×M} is the denoised image. With ||·||_1 being the L1-norm, the first term in Equation (1), i.e., ||F − G||_1, is called the fidelity term, and the second term μ R_ATV(F) is the sparsity regularization term, which includes the prior sparsity information of the image. μ is a regularization parameter weighing between the fidelity and regularization terms. The image restoration problem can then be solved by finding the F that minimizes Equation (1). Since the regularization term of the anisotropic total variation model needs to ensure the minimization of both horizontal and vertical gradients, R_ATV(F) can be defined as

R_ATV(F) = ||K_h ∗ F||_1 + ||K_v ∗ F||_1,   (2)

where ∗ represents convolution, and K_h = [1, −1] and K_v = [1, −1]^T are the first-order differential operators used for the convolution operations in the horizontal and vertical directions, respectively.
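As a concrete numerical sketch of Equations (1) and (2) (assuming periodic boundaries and unit first-order difference kernels, which the text does not pin down), the ATV objective can be evaluated as follows:

```python
import numpy as np

def atv_objective(F, G, mu):
    # ||F - G||_1 fidelity term plus mu * R_ATV(F), where R_ATV sums the
    # absolute horizontal and vertical first-order differences of F
    # (periodic boundary handling via np.roll).
    fidelity = np.abs(F - G).sum()
    dh = F - np.roll(F, 1, axis=1)  # K_h * F: horizontal differences
    dv = F - np.roll(F, 1, axis=0)  # K_v * F: vertical differences
    return fidelity + mu * (np.abs(dh).sum() + np.abs(dv).sum())
```

For a constant image the TV term vanishes and only the fidelity term remains; a denoiser then searches for the F that minimizes this objective.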

Overlapping Group Sparsity with L1 Norm (OGS-L1) Model
To reduce the staircase effect of the ATV model, Selesnick and Chen proposed the overlapping group sparsity regularization term [5][6][7], which expands the vertical and horizontal gradients of pixels to the group gradient of N adjacent points (N is the size of the group). By setting a reasonable threshold, individual noise points and image edge points can be distinguished. This model preserves the edge information of the image and mitigates the staircase effect. With reference to the work of Selesnick and Chen, Liu et al. extended the overlapping group sparsity regularizer from the one-dimensional to the two-dimensional case, then substituted it into the anisotropic total variation model for deconvolution and the removal of salt-and-pepper noise [8]. This model is centered at the pixel x_{i,j} and extends in all directions, forming multiple staggered and overlapping squares. The variable X̃_{i,j,N,N} ∈ R^{N×N} is the N × N pixel matrix centered at the coordinates (i, j), as shown in Equation (3), where the offsets of the group around its center are obtained with the rounding-down operator ⌊·⌋. We define ϕ(X) to represent the overlapping group sparsity functional of the two-dimensional array:

ϕ(X) = Σ_{i=1}^{M} Σ_{j=1}^{M} || X̃_{i,j,N,N} ||_2.   (4)

The ATV model can then be extended to embody overlapping group sparsity regularization (called the OGS-L1 model for short), as shown in Equation (5):

argmin_F ||F − G||_1 + μ [ ϕ(K_h ∗ F) + ϕ(K_v ∗ F) ],   (5)

where the regularization term is the group gradient. Equation (4) shows that the OGS-TV model takes into full consideration the gradient information close to a pixel, so it strengthens the dissimilarity between the smooth and edge regions of the image.
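A minimal sketch of the overlapping group sparsity functional ϕ(X) of Equation (4). The group centering and periodic boundary handling here are illustrative assumptions, not the paper's exact indexing:

```python
import numpy as np

def ogs_functional(X, K=3):
    # phi(X): sum over all pixels of the l2 norm of the K x K group
    # centered at that pixel (periodic boundary for simplicity).
    m1 = (K - 1) // 2  # rounding-down offset from the group center
    total = 0.0
    M, N = X.shape
    for i in range(M):
        for j in range(N):
            rows = [(i - m1 + k) % M for k in range(K)]
            cols = [(j - m1 + l) % N for l in range(K)]
            group = X[np.ix_(rows, cols)]
            total += np.linalg.norm(group)  # l2 norm of the overlapping group
    return total
```

Because the groups overlap, an isolated spike is penalized by every group containing it, while a coherent edge contributes large, structured group norms — this is what lets the model separate noise points from edges.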

Overlapping Group Sparsity with Lp-Pseudo-Norm (OGS-Lp) Model
The L1-norm is commonly used as the fidelity term of impulse noise. However, the L1-norm is only the convex relaxation of the L0-norm. The p-th power of the Lp-norm (0 ≤ p ≤ 1; for simplicity, we call it the Lp-pseudo-norm) is another relaxation of the L0-norm. In fact, the L1-norm constraint is a particular case of the Lp-pseudo-norm.
The Lp-pseudo-norm improves the sparsity-based shrinkage operator by introducing another degree of freedom, thus giving the model a better ability to depict the sparsity of an image in the gradient domain, as shown in Figure 1.
Lp-pseudo-norm contour lines are given in Figure 1, where p = 2 and p = 1 represent the L2- and L1-norms, respectively. Assuming that the image is contaminated by impulse noise with an absolute difference of τ, Figure 2 shows schematic plots of anisotropic total variation contour lines intersecting with the fidelity term. As shown in Figure 2, the intersections of the contour lines with the fidelity term are sparser for 0 < p < 1 (Figure 2b) than for p = 1 (Figure 2a); therefore, the robustness of the model against noise is better.
Based on the above analysis, the advantages of Lp-pseudo-norm regularization are as follows: (1) The Lp shrinkage operator may converge to an accurate solution. (2) The Lp-pseudo-norm is more flexible than the L1-norm, which can be useful for adapting the degree of sparsity to the signal being processed. (3) The Lp-pseudo-norm feasible domain makes the solution robust to noise. Thus, the L1-norm-based OGS-TV can be extended to the Lp-pseudo-norm case (abbreviated as OGS-Lp) [38,39,44] and is expressed as follows:

argmin_F ||F − G||_p^p + μ [ ϕ(K_h ∗ F) + ϕ(K_v ∗ F) ].   (6)
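The extra degree of freedom can be seen numerically: the p-th power of the Lp-pseudo-norm interpolates between the L0 count (as p → 0) and the L1 norm (p = 1). A small illustration:

```python
import numpy as np

def lp_pseudo_norm(x, p):
    # p-th power of the Lp-(pseudo-)norm: sum_i |x_i|^p, for 0 < p <= 1.
    # p = 1 recovers the L1 norm; smaller p penalizes small entries
    # relatively more, promoting exactly sparse solutions.
    return float(np.sum(np.abs(x) ** p))
```

For x = [0, 0.5, 2], the L1 value is 2.5, while p = 0.5 gives sqrt(0.5) + sqrt(2) ≈ 2.12 — the small entry now contributes a larger fraction of the penalty, which is what drives it toward exactly zero in a minimization.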

Solving the OGS-Lp Model
The OGS-Lp model is treated as a minimization problem whose computation is given below. In real-life images, the values of all pixels usually fall in a limited interval [a, b]. For convenience of calculation and verification, the image data are normalized so that they all lie within [0, 1]; the operator P_Ω, the projection onto this feasible set Ω, is first defined accordingly. To solve the proposed model, we employ the ADMM framework and change the complex problem into several subproblems. First, some intermediate variables are introduced to decouple the subproblems. That is, Equation (6) can be reformulated into the following constrained optimization problem. According to the principle of the ADMM framework, the Lagrange multipliers and the quadratic penalty terms are needed to establish the augmented Lagrangian function. Then we have the following, where Λ_1, Λ_2, Λ_3, Λ_4 are the Lagrange multipliers, β_1, β_2, β_3, β_4 > 0 are the penalty coefficients, and ⟨A, B⟩ is the inner product of the matrices A and B.
In Equation (10), the expressions satisfy the form a^2 − 2ab + b^2 = (a − b)^2, so Equation (10) can be written as Equation (12). Because Z_1, Z_2, Z_3, Z_4 are mutually independent, they can be solved as independent subproblems according to the principle of the ADMM algorithm. These subproblems can be solved by iterative algorithms minimizing over each Z_i.
The iterative scheme is generated as in Equations (13) and (14). Each of these subproblems is solved below: (1) Z_1^{(k+1)} and Z_2^{(k+1)} are solved by the majorization-minimization (MM) algorithm [33], which approximates the solution of the target problem by finding a well-behaved multi-variable auxiliary function and constructing an iterative sequence.
First, suppose a minimization problem of the following general form: P(v) = (α/2)||v − z||_2^2 + ϕ(v), where α > 0 and ϕ(·) is the overlapping group sparsity functional. To avoid solving the complex minimization problem P(v) directly, a function Q(v, u) for which Q(v, u) ≥ P(v) for all v, u can be constructed, with equality if and only if u = v. Thus, the optimal solution of P(v) can be approached through the minima of Q(v, u). Generally, an MM iterative algorithm for minimizing P(v) has the form v^{(n+1)} = argmin_v Q(v, v^{(n)}). It can be solved step-by-step in the following way.
The properties of the majorizing function guarantee that the equality sign holds when u = v, as shown in Equation (19), which means ϕ(v) can be majorized by constructing the function S(v, u) of Equation (20). After a simple calculation [31], S(v, u) is rewritten as Equation (21) to facilitate later steps. C(u) in that equation is independent of v, and D(u) ∈ R^{M^2×M^2} is a diagonal matrix with its elements defined in Equation (22). The entries of D can be easily computed using the MATLAB built-in function "conv2". Putting Equations (17), (21), and (22) together, the optimization problem P(v) can be majorized by Q(v, u). When v = u, Q(u, u) = P(u); to minimize P(v), MM iteratively solves the surrogate, with the closed-form solution of Equation (26), where I ∈ R^{M^2×M^2} is an identity matrix of the same size as D^2(v^{(n)}), and D^2(v^{(n)}) is also a diagonal matrix of the same form as Equation (22). Observing Equations (13) and (14), the subproblems for Z_1 and Z_2 conform to the form of Equation (17) and can be solved iteratively using Equation (26). Z_i^{(n+1)} (i = 1, 2) represents the n-th iteration of the MM algorithm in the (k + 1)-th outer loop.
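The MM principle can be illustrated on a scalar L1 toy problem rather than the full OGS subproblem: majorize |v| by the quadratic v^2/(2|u|) + |u|/2, which touches |v| at v = u, so each surrogate minimization is closed-form. This sketch is illustrative only; the function and variable names are not from the paper:

```python
import numpy as np

def mm_l1_denoise(y, alpha, iters=50, eps=1e-8):
    # Minimize P(v) = 0.5*(v - y)^2 + alpha*|v| by MM.
    # Surrogate: |v| <= v^2/(2|u|) + |u|/2 (equality at v = u), so each
    # MM step minimizes a quadratic, giving v = y / (1 + alpha/|u|).
    v = np.asarray(y, dtype=float).copy()
    for _ in range(iters):
        v = y / (1.0 + alpha / (np.abs(v) + eps))  # eps guards division by zero
    return v
```

For |y| > alpha the iterates approach the soft-threshold answer y − alpha·sign(y); for |y| ≤ alpha they decay toward zero, mirroring the thresholding behavior the surrogate approximates.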
where mat(·) plays the role of reshaping a vector into a matrix. Appl. Sci. 2018, 8, 2317

(2) Z_3^{(k+1)} can be solved by the well-known soft-thresholding shrinkage method [45].

(3) The Z_4^{(k+1)} subproblem is computed by its corresponding closed-form update.

(4) The F^{(k+1)} subproblem is solved by substituting the updated Z variables into Equation (12) and minimizing over F^{(k+1)}. Under the assumption of periodic boundary conditions, the fast Fourier transform is applied to both sides of the equation to perform the computation in the frequency domain instead of the spatial domain, in order to reduce the computational complexity caused by matrix multiplication. Matrix multiplication is converted to a dot-product operation; in other words, we solve the resulting normal equation, where x̂ is the frequency-domain representation of x, ".*" stands for the dot product, "*" is the conjugate, 1 is the matrix whose entries are all 1, and F represents the two-dimensional fast Fourier transform (FFT). Rearranging the equation gives F^{(k+1)}.

(5) The Lagrange multipliers Λ_1, Λ_2, Λ_3, Λ_4 can be updated via the gradient ascent method.
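Minimal sketches of the two shrinkage updates: the classic soft-thresholding used for the Z_3 step, and a generalized p-shrinkage of the kind commonly paired with Lp subproblems. The paper's exact Z_4 formula is not reproduced here; the p-shrinkage form below follows the widely used Chartrand-style generalized shrinkage and is an assumption:

```python
import numpy as np

def soft_shrink(x, tau):
    # Soft-thresholding: the proximal operator of tau*||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lp_shrink(x, tau, p):
    # Generalized p-shrinkage (assumed form, Chartrand-style):
    # sign(x) * max(|x| - tau*|x|^(p-1), 0); reduces to soft_shrink at p = 1.
    mag = np.abs(x)
    return np.sign(x) * np.maximum(mag - tau * np.maximum(mag, 1e-12) ** (p - 1.0), 0.0)
```

At p = 1 the p-shrinkage reduces to soft-thresholding; for p < 1, large values are shrunk by less than the constant tau, mitigating the systematic underestimation of large signal values noted earlier.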

OGS-Lp-FAST Model
Denoising models based on OGS are more time-consuming than TV-based models. This is mainly because the OGS model considers the gradient information of the neighborhood in a reconstructed image, which makes the computation more complex. This is a shortcoming of the OGS-Lp model.
Goldstein et al. [12] proposed an accelerated ADMM algorithm with a restart that improves the convergence rate of the ADMM algorithm from O(1/k) to O(1/k^2). Inspired by their work, we adopt this algorithm to improve the OGS-Lp model. The modified model is named OGS-Lp-FAST ("Ours" for short). The auxiliary variables U_i (i = 1, 2) are first introduced; under this framework, Z_i (i = 1, 2, 3, 4) is updated accordingly, with the initial values of the inner iterations taken from the previous outer iteration, and the dual variables are updated in turn. As image denoising is not a strongly convex problem, the iteration needs to be restarted to ensure the convergence of the algorithm: when Equation (33) is not satisfied, the algorithm is restarted.
where c^{(k)} combines the primal and dual residuals, and η is a number close to 1. To prevent frequent restarts, η = 0.97 is set. When the condition holds, the acceleration step size ε_i is applied and the auxiliary variables U_i (i = 1, 2) are updated accordingly; upon a restart, they are reset according to the equations below. Up to this point, all subproblems of the proposed model have been solved. The OGS-Lp-FAST algorithm just described is summarized as Algorithm 1.
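A schematic sketch of the restart logic in fast ADMM with restart (Goldstein et al.): the concrete ADMM updates and the combined residual are abstracted into a user-supplied `step` function, so all names here are illustrative rather than the paper's:

```python
import numpy as np

def accelerated_admm_restart_sketch(step, z0, iters=300, eta=0.97):
    # step(u) must return (z_new, residual). We extrapolate with a
    # Nesterov-style momentum term while the combined residual keeps
    # shrinking by the factor eta, and restart (drop momentum) otherwise.
    z, u, t, c_prev = z0, z0, 1.0, np.inf
    for _ in range(iters):
        z_new, c = step(u)
        if c < eta * c_prev:  # residual decreased enough: accelerate
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            u = z_new + ((t - 1.0) / t_new) * (z_new - z)
            t = t_new
        else:                 # restart: reset momentum, relax the target
            t = 1.0
            u = z             # fall back to the last accepted iterate
            c = c_prev / eta  # inflate the residual target after a restart
        z, c_prev = z_new, c
    return z
```

With a contractive `step`, the momentum term speeds up progress toward the fixed point, and the restart test guards against the oscillation that unmonitored acceleration can cause on non-strongly-convex problems.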
In Algorithm 1, the F subproblem is updated with Equation (39).

Algorithm 1: OGS-Lp-FAST pseudo-code
Input: image G with noise
Output: denoised image F
Initialize the variables; then, until convergence, iterate the Z_i updates, the dual updates, the acceleration/restart step, and the F^{(k+1)} update of Equation (39).
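A sketch of the FFT-based F-subproblem solve, assuming periodic boundary conditions and first-order difference kernels; `beta` and `rhs` stand in for the aggregated penalty weight and the right-hand side of the normal equation:

```python
import numpy as np

def fft_solve_example(rhs, beta):
    # Under periodic boundaries the difference operators Kh, Kv are
    # diagonalized by the 2-D FFT, so a normal equation of the form
    # (I + beta*(Kh^T Kh + Kv^T Kv)) F = RHS becomes a pointwise
    # division in the frequency domain.
    M, N = rhs.shape
    kh = np.zeros((M, N)); kh[0, 0] = 1.0; kh[0, -1] = -1.0  # horizontal difference kernel
    kv = np.zeros((M, N)); kv[0, 0] = 1.0; kv[-1, 0] = -1.0  # vertical difference kernel
    denom = 1.0 + beta * (np.abs(np.fft.fft2(kh)) ** 2 + np.abs(np.fft.fft2(kv)) ** 2)
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
```

Since K^T K is diagonal in the Fourier basis, the solve costs two FFTs instead of inverting an M^2 × M^2 matrix — this is the source of the efficiency gain claimed for the frequency-domain formulation.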

Experimental Results and Analyses
In this section, eight typical grayscale images with a size of 256 × 256 pixels are chosen to validate the denoising performance of the OGS-Lp-FAST method. The test images are shown in Figure 3. The image "House" is downloaded from http://sipi.usc.edu/database/database.php?volume%92=%92misc&image=5top. The images "Lena" and "Pepper" are from http://decsai.ugr.es/cvg/dbimagenes/. The images "Woman", "Girl", and "Reagan" are from http://www.hlevkin.com/default.html#testimages. The images "Milk drop" and "Shoulder" are from http://www.cs.cmu.edu/~cil/v-images.html. The versions of the images used in this paper were converted from the sources above by Photoshop.
The method proposed here is compared with the ATV, ITV, TGV, OGS-L1, and OGS-Lp methods, and is evaluated objectively in terms of the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), runtime, and other experimental indicators [45]. Simulations are performed on the MATLAB R2014a platform running in a hardware environment of an Intel(R) Core(TM) i7-6700 @ 3.4 GHz CPU and 16 GB memory.

Evaluation Method
In the denoising field, the common evaluation criteria include PSNR, SSIM, and runtime. The PSNR and SSIM [46] are defined in Equations (40) and (41):

PSNR = 10 log10( MAX(X)^2 / MSE(X, Y) ),   (40)

where X denotes the original image, Y is the reconstructed image, MSE(X, Y) is the mean squared error between them, and MAX(X) represents the largest gray value in the original image.

SSIM = ( (2 u_X u_Y + c_1)(2 σ_XY + c_2) ) / ( (u_X^2 + u_Y^2 + c_1)(σ_X^2 + σ_Y^2 + c_2) ),   (41)

where u_X is the mean of X; u_Y is the mean of Y; σ_X^2 is the variance of X; σ_Y^2 is the variance of Y; σ_XY is the covariance between X and Y; c_1 = (k_1 L)^2 and c_2 = (k_2 L)^2 with k_1 = k_2 = 0.05; and the parameter L = 255.
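Minimal implementations of the two criteria (a single global SSIM window is used here for simplicity; practical SSIM is computed over local windows and averaged):

```python
import numpy as np

def psnr(X, Y):
    # Peak signal-to-noise ratio: 10*log10(MAX(X)^2 / MSE),
    # with MAX(X) the largest gray value of the original image.
    X = np.asarray(X, float); Y = np.asarray(Y, float)
    mse = np.mean((X - Y) ** 2)
    return 10.0 * np.log10(X.max() ** 2 / mse)

def ssim_global(X, Y, L=255.0, k1=0.05, k2=0.05):
    # Single-window SSIM with c1 = (k1*L)^2, c2 = (k2*L)^2.
    X = np.asarray(X, float); Y = np.asarray(Y, float)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    ux, uy = X.mean(), Y.mean()
    vx, vy = X.var(), Y.var()
    cov = ((X - ux) * (Y - uy)).mean()
    return ((2 * ux * uy + c1) * (2 * cov + c2)) / ((ux ** 2 + uy ** 2 + c1) * (vx + vy + c2))
```

A uniform error of 25.5 gray levels on a 255-peak image, for instance, yields a PSNR of exactly 20 dB, and SSIM of an image with itself is 1.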

Sensitivity of the Parameters
In this section, an important parameter of the proposed algorithm, the group size K, is tested to evaluate its overall effect on the algorithm. PSNR and SSIM are used as criteria to evaluate the algorithm objectively. Three images ("Girl", "House", and "Lena") with a 30% noise level are selected, on which K is varied from 1 to 10. The other parameters are adjusted to the optimum. PSNR and SSIM values are recorded and plotted, as shown in Figures 4 and 5. In Figure 4, PSNR and SSIM increase with K and reach their maximum at K = 5. Further increases in K lead to decreased PSNR values. Thus, the neighborhood information of an image has a positive impact on the performance of the algorithm. With K set to appropriate values, the edge information of the image is better preserved, and the noise resistance is improved. However, K should not be too large either, or nearby regions with drastic pixel changes could be included, which results in decreased PSNR and SSIM.
Then, we tested how to select a good regularization parameter μ for different images. We started with a low value of μ and then increased it empirically as the noise level increased, to obtain the best visual effect. For example, for the "Girl" image corrupted by impulse noise from 20% to 50%, μ = 0.14, 0.15, 0.15, and 0.18, respectively.


In the selection of the parameter p, the value is set between 0 and 1. With the other parameters fixed, we increase p in steps of 0.1. After several rounds of experiments, we select the optimal p value as the one at which the image attains the best visual effect.
The optimal parameters for different images with noise levels from 20% to 50% are given in Table 1. Six images are selected from the original images of Figure 3 for testing, to which impulse noise at levels from 20% to 50% is added, to compare the denoising effects of the ATV, ITV, TGV, OGS-L1, OGS-Lp, and Ours algorithms (six in total). To ensure the objectiveness and fairness of the evaluation, the above algorithms all adopt the same iterative stopping condition. Regularization parameters of these algorithms are adjusted to ensure the best denoising effect of each, which ensures the fairness of the test. For methods based on the OGS model, the group size is set to K = 5. The test results on different images are given in Tables 2-5. The best indicator values are labeled in bold. By observing the data in each table, the following conclusions can be made:

1. With the introduction of different levels of noise to the images, our model generates higher PSNR and SSIM values for the reconstructed images than the other methods, indicating its superior denoising effect. The recovered images also resemble the original ones more closely.

2. The proposed model works better at lower noise levels. For example, at a 20% noise level, as shown in Table 2, the PSNR value of the "House" image (37.72 dB) given by our model is 5.91 dB higher than that given by the ITV model (31.81 dB) and 5.4 dB higher than that of the TGV model (32.32 dB). Even at high noise levels, our model still performs better than the others, which shows the clear advantages that total variation with overlapping group sparsity has over the classic anisotropic TV model.

3. Compared to OGS-L1, our proposed method incorporates the Lp-pseudo-norm shrinkage, which adds another degree of freedom to the algorithm and improves the depiction of the gradient-domain sparsity of the images, achieving a better denoising effect. For example, at a 20% noise level, as shown in Table 2, the PSNR value of the "Girl" image (32.34 dB) given by our model is 1.67 dB higher than that given by the OGS-L1 model (30.67 dB). Even at a noise level of 50%, as shown in Table 5, the PSNR value of the "Girl" image (27.35 dB) given by our model is still 0.90 dB higher than that given by the OGS-L1 model (26.45 dB). This proves that the Lp-pseudo-norm is more suitable as a regularizer for describing the sparsity of images than the L1-norm.

4. In terms of the runtime of the six models, the OGS-based methods are more time-consuming than ATV, ITV, and TGV. This is mainly because the OGS model considers the gradient information of the neighborhood in an image undergoing reconstruction, thus making the computation more complex.

5. Comparing the values of PSNR and SSIM in Tables 2-5, OGS-Lp-FAST and OGS-Lp have the same denoising effect. However, by observing the runtime values of all testing images, we find that convergence is sped up in the OGS-Lp-FAST method by the use of accelerated ADMM with a restart. For example, at a 20% noise level, as shown in Table 2, the time value of the "Woman" image (8.69 s) given by the OGS-Lp-FAST model is 7.53 s less than that given by the OGS-L1 model (16.22 s). For the denoising results of TGV, the blocking artifacts in the images are sufficiently suppressed, but local heavy noise spots are still observable. The test images ("Woman", "Pepper", "Girl", and "House") are corrupted by 20~50% impulse noise, respectively.
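The shared iterative stopping condition mentioned above is typically a relative-change test; a minimal sketch follows, where the tolerance `tol` is an assumed, illustrative value rather than the paper's exact threshold:

```python
import numpy as np

def converged(F_new, F_old, tol=1e-4):
    # Relative-change stopping criterion commonly paired with ADMM:
    # stop when ||F_new - F_old||_2 / ||F_old||_2 < tol.
    # tol is an assumed parameter, not taken from the paper.
    rel = np.linalg.norm(F_new - F_old) / max(np.linalg.norm(F_old), 1e-12)
    return rel < tol
```

Using the same rule and tolerance across all six compared algorithms keeps the runtime comparison fair, since no method is allowed to stop earlier under a looser criterion.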

Discussion and Conclusions
In this work, we studied a new regularization model applying TV with overlapping group sparsity and Lp-pseudo-norm shrinkage to images polluted by impulse noise. We provided the efficient algorithm OGS-Lp-FAST under the ADMM framework. This algorithm is rooted in overlapping group sparsity-based regularization, and comparisons with the ATV, ITV, TGV, OGS-L1, and OGS-Lp models were made to validate the proposed method. The following conclusions are drawn from the experimental results:

1. An overlapping group sparsity (OGS)-based regularizer is used to replace the anisotropic total variation (ATV) regularizer to describe the prior conditions of the image. OGS makes full use of the similarity among image neighborhoods and the dissimilarity in the surroundings of each point. It promotes the distinction between the smooth and edge regions of an image, thus enhancing the robustness of the proposed model.

2. Lp-pseudo-norm shrinkage is used in place of the L1-norm regularization to describe the fidelity term of images with salt-and-pepper noise. With the inclusion of another degree of freedom, Lp-pseudo-norm shrinkage reflects the sparsity of the image better and greatly improves the denoising performance of the algorithm.

3. The difference operator is used for convolution. Under the ADMM framework, the complex model is transformed into a series of simpler mathematical problems to solve.

4. Appropriate K values can effectively improve the overall denoising performance of the model. In practice, this parameter needs to be adjusted: if it is too small, the neighborhood information is not utilized completely; if it is too big, too many dissimilar pixel blocks will be included, impairing the denoising result.

5. The adoption of accelerated ADMM with a restart accelerates the convergence of the algorithm and reduces the running time.

6. In this paper, we focus on impulse noise removal, but the model is also applicable to other types of noise removal, which we will study in future work.

Table 1 .
The optimal parameters for different images with the noise level from 20% to 50%.

Table 2 .
Numerical comparison of our proposed method and other models (images are corrupted by impulse noise of 20%). ATV: anisotropic total variation; ITV: isotropic total variation; TGV: total generalized variation; OGS-L1: overlapping group sparsity with L1-norm; OGS-Lp: overlapping group sparsity with pseudo-norm.

Table 3 .
Numerical comparison of our proposed method and other models (images are corrupted by impulse noise of 30%).

Table 4 .
Numerical comparison of our proposed method and other models (images are corrupted by impulse noise of 40%).