Sparse Recovery Algorithm for Compressed Sensing Using Smoothed l0 Norm and Randomized Coordinate Descent

Abstract: Compressed sensing theory is widely used in the fields of fault signal diagnosis and image processing. Sparse recovery is one of its core concepts. In this paper, we propose a sparse recovery algorithm that combines a smoothed l0 norm with randomized coordinate descent (RCD), and apply it to sparse signal recovery and image denoising. We adopt a new strategy to express the (P0) problem approximately and solve the resulting optimization with RCD. In computer simulation experiments, we compare the performance of this algorithm with other typical methods. The results show that our algorithm achieves higher precision in sparse signal recovery, as well as a higher signal-to-noise ratio (SNR) and faster convergence in image denoising.


Introduction
In recent years, compressed sensing (CS) has become an essential mathematical tool in the field of information theory [1,2]. It has been widely used in the fields of fault diagnosis [3], image processing [4], data compression [5,6], and so on. CS theory states that a signal can be recovered from far fewer samples than the Shannon-Nyquist sampling rate requires, provided that the original signal is sparse or approximately sparse in some representation domain.
As shown in Figure 1, assume that a signal f (n × 1) is to be recovered from incomplete sampling values. The sampled value of the signal f is

y = Φf (1)

where Φ (m × n, m << n) is the measurement matrix and y (m × 1) is the measurement vector. When the dimension of y is much lower than that of f, Equation (1) becomes an underdetermined equation and f cannot be calculated directly. However, if the number of non-zero elements in f is K (K ≤ m), f can be calculated; in this case K is called the sparsity. Most natural signals are not sparse in practical applications. A sparse representation of f is obtained by a specific mathematical transformation Ψ, as shown in Equation (2) and Figure 2:

f = Ψs (2)

where Ψ is the sparse dictionary and s is the sparse representation coefficient vector, so that the signal f can be recovered indirectly by reconstructing the sparse coefficients, as shown in Equation (3) and Figure 3:

y = ΦΨs (3)

Here we set A = ΦΨ, so Equation (3) becomes y = As. If A satisfies the restricted isometry property (RIP), s can be accurately recovered [1]. CS theory consists of two parts. The first is sparse representation: because a natural signal is usually not sparse, we first make it sparse by a mathematical transformation. The second is sparse recovery, which means recovering the sparse signal. Of the two parts, the sparse recovery method directly determines the quality of signal recovery [7]. In this paper, we focus on proposing a sparse recovery algorithm for CS. In the following sections, the signal s is taken to be a sparse vector.
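To make the model in Equations (1)-(3) concrete, the following short Python sketch builds a toy instance; the dimensions, the identity dictionary Ψ, and the Gaussian measurement matrix are illustrative assumptions rather than values taken from this paper.

```python
# Toy construction of the compressed sensing model y = Phi*f, f = Psi*s, y = A*s.
# All sizes and the choices of Phi and Psi are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, m, K = 100, 60, 10                 # signal length, measurements, sparsity

Psi = np.eye(n)                       # sparse dictionary (identity: f is already sparse here)
s = np.zeros(n)                       # sparse representation coefficients
support = rng.choice(n, K, replace=False)
s[support] = rng.standard_normal(K)

f = Psi @ s                           # Equation (2): f = Psi * s
Phi = rng.standard_normal((m, n))     # measurement matrix, m << n
y = Phi @ f                           # Equation (1): y = Phi * f

A = Phi @ Psi                         # Equation (3): y = A * s with A = Phi * Psi
assert np.allclose(y, A @ s)
print("measurements:", y.shape[0], "unknowns:", s.shape[0], "sparsity:", K)
```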
Generally speaking, the purpose of sparse recovery is to solve the l0 norm minimization problem shown in Equation (4):

min ||s||_0 s.t. y = As (4)

Equation (4) seeks the solution of y = As with minimal ||s||_0, which is also called the (P0) problem. However, if m << n, the (P0) problem becomes NP-hard, which means that we cannot solve Equation (4) directly under that condition [8-10].
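The combinatorial nature of the (P0) problem can be seen from a brute-force solver that tests candidate supports of growing size; the sketch below is only an illustration (it is not an algorithm from this paper) and quickly becomes infeasible as n and K grow, which is why approximate strategies are needed.

```python
# Brute-force illustration of why (P0) is combinatorial: test supports of growing size
# and return the first (i.e., smallest) support whose least squares fit reproduces y.
import itertools
import numpy as np

def solve_p0_bruteforce(A, y, tol=1e-10):
    n = A.shape[1]
    for k in range(1, n + 1):                       # support size
        for support in itertools.combinations(range(n), k):
            cols = A[:, list(support)]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            if np.linalg.norm(cols @ coef - y) < tol:
                s = np.zeros(n)
                s[list(support)] = coef
                return s
    return None

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))                    # tiny problem; anything larger explodes
s_true = np.zeros(12)
s_true[[2, 7]] = [1.5, -0.8]
s_hat = solve_p0_bruteforce(A, A @ s_true)
print("recovered support:", np.flatnonzero(s_hat))
```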
In addition, we consider the following optimization problem:

min ||s||_p s.t. y = As (5)

According to the literature [10], lp norms have notably different properties for different values of p. As shown in Figure 4, when p > 1, Equation (5) is a convex optimization problem and its solution is not sparse (the elements of the intersection point are all non-zero). When 0 ≤ p < 1, it becomes a non-convex optimization problem and its solution is sparse (only one element of the intersection point is 0). In particular, when p = 1, Equation (5) is called basis pursuit (BP); under certain conditions its solution is also sparse, but it may not be the sparsest solution compared to the case 0 ≤ p < 1 [9].
In short, if 0 ≤ p < 1, the solution of Equation (5) is sparser. Unfortunately, this results in a non-convex optimization problem. To resolve this problem, researchers adopt the sum of a set of smooth concave functions to express the (P0) problem approximately, which is called the smoothed l0 norm approximation approach.
For instance, the literature [17] adopted an arctangent smoothing function f_σ(s_i) (Equation (6)), where s_i denotes the elements of the vector s; the (P0) problem can then be approximately expressed as Equation (7). Similarly, some research [18] adopted a hyperbolic tangent function (Equation (8)). The recovery algorithm based on a smoothed l0 norm and a modified Newton method (L0MN) in the literature [19] adopted the function in Equation (9), and the recovery algorithm based on a smoothed l0 norm and gradient projection (L0GP) in the literature [20] adopted the function in Equation (10). The smoothed l0 norm approximation problems can easily be solved by the Lagrange multiplier technique. However, because f(s_i) is non-convex, these algorithms may fall into a local optimum.
In this paper, we improve Equation (9) to establish a new strategy for expressing the (P0) problem approximately. However, the resulting optimization problem is still non-convex, so finding a globally optimal solution is not trivial. For this reason, we select a randomized coordinate descent (RCD) algorithm to solve the optimization; it does not guarantee finding the global optimum, but it can escape many local minima and can often land close to the global optimum.

Cost Function
Observing Equation (9), we find that it has a very simple structure. According to the literature [19], this function can reduce the computational complexity while ensuring precision. However, as shown in Figure 5, in a few cases s_i + σ = 0 will occur. This leads to a discontinuity of the function, which affects the calculation results to a certain extent.
In order to avoid this defect, we give a continuous smooth fractional function as Equation (11), where σ > 0. In this way, we guarantee the continuity of the function, as shown in Figure 6. As shown in Figure 7, when the value of σ decreases gradually, f_σ(s_i) gradually approaches the form of the l0 norm, that is:

lim_{σ→0} f_σ(s_i) = −1 if s_i = 0, and lim_{σ→0} f_σ(s_i) = 0 if s_i ≠ 0 (12)

It should be noted that when σ → 0, although f_σ(s_i) and the l0 norm look similar, they are essentially different. In order to use Equation (12) to express the (P0) problem, we make the following attempt:

min Σ_{i=1}^{n} f_σ(s_i) s.t. y = As (13)

As shown in Equation (11), when σ → 0, the value of f_σ(s_i) can only be 0 or −1. If we want to minimize Σ_{i=1}^{n} f_σ(s_i), we need to make as many terms of this sum as possible equal to −1. According to Equation (11), this is equivalent to making as many elements of the vector s as possible equal to 0. Thus, it is consistent with the meaning of l0 norm minimization.
In a word, when σ → 0, the optimal solutions of Equations (4) and (13) are identical. Therefore, we can adopt Equation (13) to express the (P0) problem approximately.
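Equation (11) itself is not reproduced above; purely to illustrate the limiting behaviour in Equation (12), the sketch below uses an assumed fractional surrogate f_σ(s) = −σ²/(s² + σ²), which shares the stated properties (continuous for σ > 0, tending to −1 at s = 0 and to 0 elsewhere as σ → 0) but is not necessarily the exact function of this paper.

```python
# Limit behaviour of Equation (12), illustrated with an ASSUMED surrogate
# f_sigma(s) = -sigma^2 / (s^2 + sigma^2); not necessarily the paper's Equation (11).
import numpy as np

def f_sigma(s, sigma):
    return -sigma**2 / (s**2 + sigma**2)

s_values = np.array([0.0, 0.01, 0.1, 1.0])
for sigma in [1.0, 0.1, 0.01, 0.001]:
    print(f"sigma={sigma:<6}", np.round(f_sigma(s_values, sigma), 4))
# As sigma shrinks, the entry at s = 0 stays at -1 while non-zero entries approach 0,
# so minimizing sum_i f_sigma(s_i) drives as many entries of s as possible to zero.
```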
Equation (13) is a typical optimization problem with an equality constraint. The Lagrange multiplier method can transform such problems into unconstrained optimization problems. The cost function can be constructed as

L(s) = Σ_{i=1}^{n} f_σ(s_i) + λ ||y − As||_2^2 (14)

where λ is the regularization parameter.
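A direct transcription of this cost function might look like the following sketch; the surrogate f_σ is the same assumed stand-in as above, and λ is set to an arbitrary illustrative value rather than the GCV estimate used later in the paper.

```python
# Sketch of Equation (14): L(s) = sum_i f_sigma(s_i) + lam * ||y - A s||_2^2.
# f_sigma is an assumed stand-in surrogate; lam = 10.0 is an arbitrary illustrative value.
import numpy as np

def f_sigma(s, sigma):
    return -sigma**2 / (s**2 + sigma**2)

def cost(s, A, y, sigma, lam):
    return np.sum(f_sigma(s, sigma)) + lam * np.sum((y - A @ s) ** 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
s = np.zeros(100)
s[:5] = 1.0
y = A @ s
print(cost(s, A, y, sigma=0.1, lam=10.0))   # for an exact solution the penalty term vanishes
```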

Solution
Generally, the gradient descent method (or the idea of gradient descent) can be used to solve Equation (14). The procedure of the gradient descent method is as follows. The gradient ∇L(s) of L(s) at the point s is

∇L(s) = [∂f_σ(s_1)/∂s_1, …, ∂f_σ(s_n)/∂s_n]^T + 2λ A^T(As − y) (15)

If we want to minimize L(s), we need to update s in the direction of the negative gradient:

s^(k+1) = s^(k) − h ∇L(s^(k)) (16)

where h is the step size of the iterative process. When the initial value and termination condition are given, the optimal solution of Equation (14) can be obtained by iteration; this is the standard gradient descent method. However, the gradient descent method is not well suited to non-convex problems, because it is easily trapped in local minima. Therefore, we need to adopt strategies to escape local minima. Common methods include the simulated annealing (SA) algorithm [21], the genetic algorithm (GA) [22], the randomized coordinate descent (RCD) method [23], and so on. RCD is a special case of stochastic gradient descent (SGD) [24], which is widely used to optimize non-convex problems because of its simple principle and fast convergence speed.
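For reference, a plain (full-gradient) descent loop for Equations (15) and (16) could be sketched as follows; the surrogate and its derivative, the fixed step size, the fixed σ, and the iteration count are all illustrative assumptions.

```python
# Plain gradient descent for Equation (14), following Equations (15)-(16).
# The surrogate derivative, fixed sigma, lam, step size h and iteration count are assumptions.
import numpy as np

def grad_f_sigma(s, sigma):
    # derivative of the assumed surrogate -sigma^2/(s^2 + sigma^2)
    return 2 * sigma**2 * s / (s**2 + sigma**2) ** 2

def gradient_descent(A, y, sigma=0.1, lam=10.0, h=1e-4, iters=3000):
    s = A.T @ np.linalg.solve(A @ A.T, y)       # least squares starting point
    for _ in range(iters):
        grad = grad_f_sigma(s, sigma) + 2 * lam * A.T @ (A @ s - y)   # Equation (15)
        s = s - h * grad                                              # Equation (16)
    return s

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
s_true = np.zeros(100)
s_true[rng.choice(100, 8, replace=False)] = rng.standard_normal(8)
y = A @ s_true
s_hat = gradient_descent(A, y)
print("relative error:", np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true))
```

As the text notes, such a loop can stall in a local minimum of the non-convex cost, which motivates the randomized update described next.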
Unlike the traditional gradient descent method, RCD adds random factors to the calculation of the gradient. Therefore, even if it falls into a local minimum, the randomized update may be non-zero, so the iteration may jump out of the local minimum and continue [23].
In each iteration, the RCD method does not take partial derivatives with respect to all variables, but only with respect to one (or several) of them. The detailed procedure is as follows. Let s_rand be a randomly selected element of the vector s; the corresponding component of the gradient is

∂L(s)/∂s_rand = ∂f_σ(s_rand)/∂s_rand + 2λ a_rand^T(As − y) (17)

where a_rand denotes the column of the matrix A corresponding to s_rand. Then the update in Equation (16) becomes

s_rand^(k+1) = s_rand^(k) − h ∂L(s^(k))/∂s_rand (18)

Now we discuss how to choose the step size h. In the iteration process, the value of the cost function is required to decrease, as shown in Equation (19):

L(s^(k+1)) < L(s^(k)) (19)
That is to say, once the direction of the iteration is determined, we need to minimize L(s^(k+1)) with respect to the step size. This problem can be reduced to Equation (20):

h^(k) = argmin_{h>0} L(s^(k+1)(h)) (20)

where s^(k+1)(h) denotes the iterate obtained from Equation (18) with step size h. Here we adopt the Goldstein method to solve this problem, which can be referred to in the literature [25]. Because our method solves the smoothed l0 norm based sparse recovery problem using RCD, we name it L0RCD.
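The per-coordinate update of Equations (17)-(20) can be sketched as follows. The surrogate derivative is the same assumed stand-in as above, and a simple backtracking search that only enforces the decrease condition (19) is used as a simplified substitute for the Goldstein rule of [25].

```python
# One RCD iteration following Equations (17)-(19): pick a random coordinate, form its partial
# derivative, and shrink h until the cost decreases. The backtracking rule is a simplified
# substitute for the Goldstein line search; f_sigma is an assumed stand-in surrogate.
import numpy as np

def f_sigma(s, sigma):
    return -sigma**2 / (s**2 + sigma**2)

def grad_f_sigma(s, sigma):
    return 2 * sigma**2 * s / (s**2 + sigma**2) ** 2

def cost(s, A, y, sigma, lam):
    return np.sum(f_sigma(s, sigma)) + lam * np.sum((y - A @ s) ** 2)

def rcd_step(s, A, y, sigma, lam, rng, h0=1.0, shrink=0.5, max_tries=30):
    j = rng.integers(s.size)                                    # random coordinate
    g_j = grad_f_sigma(s[j], sigma) + 2 * lam * A[:, j] @ (A @ s - y)   # Equation (17)
    h, old = h0, cost(s, A, y, sigma, lam)
    for _ in range(max_tries):
        s_new = s.copy()
        s_new[j] = s[j] - h * g_j                               # Equation (18)
        if cost(s_new, A, y, sigma, lam) < old:                 # Equation (19)
            return s_new
        h *= shrink                                             # crude stand-in for Equation (20)
    return s                                                    # no acceptable step found

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
y = rng.standard_normal(60)
s = A.T @ np.linalg.solve(A @ A.T, y)                           # least squares initialization
s = rcd_step(s, A, y, sigma=0.1, lam=10.0, rng=rng)
```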

Algorithm Description
Now, we summarize the sparse recovery algorithm of this paper in Table 1. Table 1. Description of the smoothed l0 norm based sparse recovery algorithm using RCD (L0RCD).
Step 4: Take the updated s and σ as initial values, and repeat Steps 1 to 3; stop iterating when σ < σ_min and output s*.
In Table 1, σ_0 and s_0 are the initial values of σ and s, respectively. A^T(AA^T)^{-1}y is the least squares solution of y = As, and λ can be estimated by the generalized cross-validation (GCV) method [26]. According to the literature [17-20], in order to ensure that the search scope becomes increasingly refined during the iteration process, we take a descending sequence of σ, where σ^(k+1) = 0.7σ^(k).
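Putting the pieces together, a minimal end-to-end sketch of the L0RCD loop is given below. Only the least squares initialization and the schedule σ^(k+1) = 0.7σ^(k) with the stopping rule σ < σ_min are taken from the text; the surrogate function, the fixed λ (instead of the GCV estimate), the fixed step size (instead of the Goldstein search), and the inner iteration count are assumptions of this sketch.

```python
# Minimal L0RCD sketch: least squares initialization, RCD inner iterations for each sigma,
# then sigma <- 0.7*sigma until sigma < sigma_min. Surrogate, lam, h, inner_iters are assumed.
import numpy as np

def grad_f_sigma(s, sigma):
    return 2 * sigma**2 * s / (s**2 + sigma**2) ** 2   # derivative of the assumed surrogate

def l0rcd(A, y, lam=10.0, sigma0=1.0, sigma_min=1e-3, inner_iters=500, h=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    s = A.T @ np.linalg.solve(A @ A.T, y)      # least squares solution of y = As (initial s)
    sigma = sigma0
    while sigma >= sigma_min:                  # Step 4: stop once sigma < sigma_min
        for _ in range(inner_iters):           # RCD iterations for the current sigma
            j = rng.integers(s.size)
            g_j = grad_f_sigma(s[j], sigma) + 2 * lam * A[:, j] @ (A @ s - y)   # Eq. (17)
            s[j] -= h * g_j                                                     # Eq. (18)
        sigma *= 0.7                           # descending sequence sigma^(k+1) = 0.7 sigma^(k)
    return s

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 100))
s_true = np.zeros(100)
s_true[rng.choice(100, 10, replace=False)] = rng.standard_normal(10)
y = A @ s_true
s_hat = l0rcd(A, y)
print("relative error:", np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true))
```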

Computational Cost Analysis
The main computational complexity of L0RCD lies in the products of vectors, as well as the calculation of partial derivatives in the iterative process. The computational burden of a_rand^T(As − y) is O(m^2), while the burden of A^T(As − y) is O(n^2 m). In addition, in the iteration process, our method only takes the partial derivative of one variable, while general gradient descent methods take partial derivatives of all variables. This means that, in a single iteration, the computational complexity of our method is much less than that of the general gradient descent method. Although adopting RCD results in more iterations, its overall computational burden may still be lower [23].
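The qualitative point, that one coordinate-wise partial derivative is much cheaper than one full gradient, can be checked with a rough, self-contained timing sketch such as the one below; the problem sizes and repetition count are arbitrary and the measured times are illustrative only.

```python
# Rough timing comparison (illustrative only): one full gradient term A^T(As - y) versus
# a single coordinate term a_rand^T r with a residual r that is reused between draws.
import time
import numpy as np

rng = np.random.default_rng(0)
m, n = 600, 1000
A = rng.standard_normal((m, n))
s = rng.standard_normal(n)
y = rng.standard_normal(m)

t0 = time.perf_counter()
for _ in range(100):
    r = A @ s - y                  # residual
    g_full = A.T @ r               # full gradient term over all n coordinates
t1 = time.perf_counter()

r = A @ s - y                      # residual computed once and reused
t2 = time.perf_counter()
for _ in range(100):
    j = rng.integers(n)            # random coordinate
    g_j = A[:, j] @ r              # single coordinate term
t3 = time.perf_counter()

print(f"full gradient: {t1 - t0:.4f} s, single coordinate: {t3 - t2:.4f} s")
```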

Comparative Experiment
In this section, we test the performance of the sparse recovery algorithms in the fields of signal recovery and image denoising. All of the experiments were implemented in MATLAB R2012a on a personal computer with a 2.59 GHz Intel Pentium dual-core processor and 2 GB of memory running the Microsoft Windows XP operating system.

Signal Recovery Experiment
In this section, for the matrix A, we adopt a Gaussian random matrix with a variance of 1, where the size of A is m × n and m < n. The size of the sparse vector s is n × 1, the positions of its non-zero elements are random, and the values of the non-zero elements are also random. The vector y can be calculated by y = As. Given y and A, we reconstruct s using the sparse recovery algorithm; the recovery result is denoted as s*.
Here we use precision to evaluate the performance of the algorithms; the precision is defined in Equation (21).
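One random trial of this experiment can be generated along the following lines. Because the precision formula is not reproduced above, the sketch reports a generic relative recovery error as a stand-in metric; that choice, and the least squares baseline used for the demonstration, are assumptions of the sketch rather than details from the paper.

```python
# One random trial as described in the text: Gaussian A (m x n), a K-sparse s with random
# support and values, and y = A s. The relative error below is a stand-in metric only.
import numpy as np

def make_trial(m, n, K, rng):
    A = rng.standard_normal((m, n))            # Gaussian random matrix, variance 1
    s = np.zeros(n)
    s[rng.choice(n, K, replace=False)] = rng.standard_normal(K)
    return A, s, A @ s

def relative_error(s_true, s_rec):
    return np.linalg.norm(s_rec - s_true) / np.linalg.norm(s_true)

rng = np.random.default_rng(0)
A, s, y = make_trial(m=60, n=100, K=20, rng=rng)
s_ls = A.T @ np.linalg.solve(A @ A.T, y)       # plain least squares, not a sparse recovery method
print("least squares baseline relative error:", relative_error(s, s_ls))
```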

Comparison of RCD and the Modified Newton Method
In order to highlight the advantages of RCD, we compared it with the modified Newton (MN) method from the literature [19]. In addition, to give a fair comparison, we constructed the same l0 norm approximation by Equation (9). That is, we adopted the same cost function but different optimization methods in the signal recovery experiments of this section.

In the first experiment, m = 60 and n = 100 were fixed, and K varied between 5 and 40. The precisions are given in Figure 8. From Figure 8, we see that when K ≤ 20, both RCD and MN recover s precisely. However, as K increases, RCD performs relatively better than MN. This shows that RCD is more appropriate under these conditions.
In the second experiment, K = 20 and n = 100 were fixed, and m varied between 30 and 90. The precisions are given in Figure 9. From Figure 9, we can see that when m ≥ 55, both methods perform well. However, as m decreases, RCD performs relatively better than MN.

Comparison of L0RCD and the Other Sparse Recovery Algorithms
In this section, we compare L0RCD with OMP, MP, basis pursuit (BP), and L0GP. As in the first experiment, m = 60 and n = 100 were fixed, and K varied between 5 and 40. The precisions of the five methods are given in Figure 10. From Figure 10, we see that as K increases, L0RCD performs better than the other four algorithms in precision.

In the second experiment, K = 20 and n = 100 are fixed, and m varies between 30 and 90. The precisions of the five methods are given in Figure 11. From Figure 11, we see that as m decreases, L0RCD performs better than the other four algorithms in precision.
In the rest of this subsection, we give a rough analysis of these sparse recovery algorithms. From Figures 10 and 11, we find that when K ≤ 10 or m ≥ 65, the precisions of the five algorithms are all 1 (or very close to 1). The pursuit algorithms (OMP, MP and BP) have stricter requirements on sparsity and the number of measurements.
Here we do not intend to claim that the approximate l0 norm algorithms are "better" than the pursuit algorithms. In fact, approximate l0 norm algorithms require appropriate parameters, and this often demands experience or a large number of repeated tests. In comparison, when the sparsity or the number of measurements meets the requirements, the application of pursuit algorithms is more mature. For instance, the MOD (method of optimal directions) and K-SVD (K-singular value decomposition) algorithms used for the sparse representation of signals both employ pursuit algorithms to implement sparse coding [10].

Image Denoising Experiment
When the sparse recovery algorithm is applied to image denoising [27], Equation (4) should be changed to

min ||s||_0 s.t. ||y − As||_2^2 ≤ ε (22)

where ε is a parameter related to the noise level. Its solution method is the same as that of the (P0) problem, and ||y − As||_2^2 ≤ ε can be regarded as a relaxation of y = As. According to the literature [10], the pursuit algorithms are all applicable to Equation (22).
For the approximate l0 norm algorithms, the cost function for Equation (22) can be established analogously to Equation (14) (Equation (23)). In this section, we compared the image denoising performance of these methods under the same experimental conditions. We adopted the signal-to-noise ratio (SNR) and the time consumption to evaluate their performance, where the SNR is defined as in Equation (24):

SNR = 10 log_10 ( Σ_i s_i^2 / Σ_i (s_i − s*_i)^2 ) (24)
where s*_i are the elements of s*. The higher the SNR, the better the image quality. We adopted three images, named Lena, Cameraman, and Boat, to test the denoising performance of the algorithms. The size of each image is 256 × 256 pixels. Gaussian noise with a mean of 0 and a standard deviation of 25 was added to the images.
Because an image is not a sparse signal, we first used a two-dimensional discrete cosine transform (2D-DCT) to represent the images sparsely. For the matrix Φ, we adopted a Gaussian random matrix with a variance of 1, where the size of Φ was 128 × 256.
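A compact sketch of this measurement pipeline is given below. The orthonormal DCT matrix, the column-wise application of Φ to the coefficient array, the [0, 1] image scale, and the SNR computation are standard-practice assumptions; the text above does not specify exactly how the 256 × 256 image is partitioned into vectors for the 128 × 256 measurement matrix.

```python
# Sketch of the image measurement setup: 2D-DCT representation, a 128 x 256 Gaussian
# measurement matrix applied column by column, and an SNR computation. The column-wise
# processing, image scale and SNR formula are assumptions, not details from the text.
import numpy as np

def dct_matrix(n):
    # orthonormal DCT-II matrix: D @ x is the 1D DCT of x, and D.T inverts it
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D

def snr_db(x, x_rec):
    return 10 * np.log10(np.sum(x**2) / np.sum((x - x_rec) ** 2))

rng = np.random.default_rng(0)
n = 256
image = rng.random((n, n))                                # stand-in for Lena/Cameraman/Boat
noisy = image + rng.normal(0.0, 25.0 / 255.0, (n, n))     # Gaussian noise, std 25 on 8-bit scale

D = dct_matrix(n)
coeffs = D @ noisy @ D.T                   # 2D-DCT coefficients (approximately sparse)
Phi = rng.standard_normal((128, n))        # 128 x 256 Gaussian measurement matrix
Y = Phi @ coeffs                           # one 128-dim measurement vector per coefficient column

print("measurements per column:", Y.shape[0], "| noisy-image SNR:", round(snr_db(image, noisy), 2), "dB")
```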
The intuitive denoising results are shown in Figures 12-14. The SNR and time consumption are shown in Table 2.
From Figures 12-14, we see that the image denoising performance of L0RCD is better than that of the other five methods. From Table 2, we see that L0RCD not only achieves a higher SNR, but also consumes less time.

Conclusions
In this paper, a novel method based on RCD and an approximate l0 norm is proposed to solve the optimization problem of sparse recovery. To address the local minima defect of the approximate l0 norm, we adopted a solution strategy based on RCD. Simulations were carried out on sparse signal recovery and image denoising. Compared with typical sparse recovery algorithms, our method performs better in precision. However, the success of the algorithm still depends on the careful selection of parameters, and overcoming this limitation is the goal of our future research.