Conjugate Gradient Hard Thresholding Pursuit Algorithm for Sparse Signal Recovery

We propose a new iterative greedy algorithm to reconstruct sparse signals in Compressed Sensing. The algorithm, called Conjugate Gradient Hard Thresholding Pursuit (CGHTP), is a simple combination of Hard Thresholding Pursuit (HTP) and Conjugate Gradient Iterative Hard Thresholding (CGIHT). The conjugate gradient method with a fast asymptotic convergence rate is integrated into the HTP scheme that only uses simple line search, which accelerates the convergence of the iterative process. Moreover, an adaptive step size selection strategy, which constantly shrinks the step size until a convergence criterion is met, ensures that the algorithm has a stable and fast convergence rate without choosing step size. Finally, experiments on both Gaussian-signal and real-world images demonstrate the advantages of the proposed algorithm in convergence rate and reconstruction performance.


Introduction
As a new sampling method, Compressed Sensing (CS) has received broad research interest in signal processing, image processing, biomedical engineering, electronic engineering and other fields [1][2][3][4][5][6].In particular in magnetic resonance image (MRI) processing, CS technology greatly improves the efficiency of MRI processing (Figure 1).By exploiting the sparse characteristics of signals, CS can accurately reconstruct sparse signals from significantly fewer samples than required in the Shannon-Nyquist sampling theorem [7,8].Assume that a signal f ∈ R N is s-sparse in some domain ψ.This means f = ψx, and x has at most s(s N) nonzero entries.This system can be measured by a sampling matrix Φ ∈ R M×N (M < N): y = Φ f = Φψx = Ax, where y is called measurement vector, A is the so-called measurement matrix.The CS model is shown in Figure 2. If Φ is incoherent with ψ, the coefficient vector x can be reconstructed exactly from a few measurements by solving the undetermined linear system y = Ax with constraint x 0 ≤ s, i.e., solving the following 0 norm minimization problem: min x 0 s.t.y = Ax.

Sparse Transformation
( As a combinatorial optimization problem, the above 0 norm optimization is NP-hard [8].One way to solve such problem is to transform it into a 1 norm optimization problem: min x 1 s.t.y = Ax. ( Since 1 norm-based problem is convex, some methods such as basis pursuit (BP) [7] and LASSO [9] are usually employed to solve the 1 norm minimization in polynomial time.The solutions obtained by these algorithms can well approximate the exact solution of (1).However, their computational complexity is too highly impractical for many applications.As a relaxation of the 0 norm optimization, the p norm for CS reconstruction has attracted extensive research interest, and the problem can be formulated as the following optimization problem: where 1 / p denotes the p norm of x.There are many methods have been developed to solve such problem [10][11][12][13].Many studies have shown that using p norm for 0 < p < 1 requires fewer measurements and has much better recovery performance than using 1 norm [14,15].However, p norm leads to a non-convex optimization problem which is difficult to solve efficiently.Sparse signals can be quickly recovered by these algorithms while the measurement matrix satisfying the so-called restricted isometry property (RIP) with a constant parameter.A measurement matrix is said to satisfy the s-order RIP if for any s-sparse where 0 ≤ δ s ≤ 1. Commonly used sampling matrices include Gaussian matrices, random Bernoulli matrices, partial orthogonal matrices, etc.The measurement matrices consisting of these sampling matrices and orthogonal basis usually satisfy the RIP condition with a high probability [16].
The core ideas of this paper include: (1) by combining the steps of HTP and CGIHT in each iteration, a new algorithm called Conjugate Gradient Hard Thresholding Pursuit (CGHTP) is presented; (2) by alternatively selecting a search direction from the gradient direction and the conjugate direction to improve the convergence rate of the iterative procedure; (3) furthermore, an adaptive step size selection strategy similar to that in normalized iterative hard thresholding (NIHT) is adopted in CGHTP algorithm, which eliminates the effect of step size on the convergence of HTP algorithm.
The remainder of the paper is organized as follows.Section 2 reviews related work on iterative greedy algorithms for CS.Section 3 gives the key ideas and description of the proposed algorithm.The convergence analysis of the CGHTP algorithm is given in Section 4. Some simulation experiments to verify the empirical performance of the CGHTP algorithm is carried out in Section 5. Finally, the conclusion of this paper is presented in Section 6.
Notations: in this paper, ∅ represents an empty set.a, b denotes the inner product of the vector a and b.H s (x) denotes an operator that sets all but the largest s absolute value of x. supp(x) is denoted to take the index set of nonzero entries of x.Let T, T ⊆ {1, 2, • • • , N}.Let x T denote a subvector consisting of elements extracted from x that indexed by the elements in T. A T denotes a submatrix that consists of the columns of A with indices i ∈ T. A T denotes the transpose of A. I stands for a unit matrix whose size depends on the context.In addition, let τ = M/N denote the sampling ratio.

Literature Review
In the past decade, a series of iterative greedy algorithms has been proposed to directly solve Equation (1).These algorithms have been attracting increased attention due to their low algorithm complexity and good reconstruction performance.According to the support set selection strategy used in the iteration process, we can roughly classify the family of iterative greedy algorithms into three categories.
(2) iterative hard thresholding (IHT)-based IHT-like algorithms, such as NIHT, accelerated iterative hard thresholding (AIHT) [25], conjugate gradient iterative hard thresholding (CGIHT) [26,27].Unlike the OMP-like algorithm, the approximate support set for the nth iteration of IHT-like algorithms is determined by the value of A T y + (I − A T A)x n−1 , which is closer to the correct support set than using the values of A T y [28].
(3) algorithms composed of OMP-like and IHT-like algorithms, such as Hard Thresholding Pursuit (HTP) [28,29], generalized hard thresholding pursuit (GHTP) [30], 0 regularized HTP [31], Partial hard thresholding (PHT) [32] and subspace thresholding pursuit (STP) [33].OMP-like algorithms use least squares to update sparse coefficients in each iteration, which plays a role in debiasing and makes it approach the exact solution quickly.However, it is well known that the least squares are time-consuming, which leads to a high complexity of single iteration.The advantage of IHT-like algorithms are the simple iteration operation and good support set selection strategy, but it needs more iterations.By combining the OMP-like algorithm and the IHT-like algorithm, these algorithms absorb the advantages of the above two kinds of algorithms, and have strong theoretical guarantees and good empirical performance [29].
Among the above algorithms, CoSaMP and SP do not provide a better theoretical guarantee than the IHT algorithm, but their empirical performance is better than IHT.The simple combination of CoSaMP and IHT algorithm lead to the HTP algorithm whose convergence requires fewer number of iterations than IHT algorithm.However, the reconstruction performance of HTP is greatly affected by the step size and an optimal step size is difficult to determine in practical applications, which may cause the HTP to converge slowly when selecting an inappropriate step size.Furthermore, although the HTP algorithm only requires a small number of iterations to converge to the exact vector x, it only uses a single gradient descent direction as the search direction in iterative progress, which may not be the best choice for general optimization problems.In [26], conjugate gradient method was used to accelerate the convergence of IHT algorithm due to its fast convergence rate.In this paper, inspired by the HTP algorithm and CGIHT algorithm, the operations of these two algorithms in each iteration is combined to take advantage of both.
In this paper, we first introduce the idea of the CGHTP algorithm, then analyze the convergence of the proposed algorithm, and finally validate the proposed algorithm with synthetic experiments and real-world image reconstruction experiments.

Conjugate Gradient Hard Thresholding Pursuit Algorithm
Firstly, we make a simple summary of the HTP algorithm and the CGIHT algorithm to facilitate an intuitive understanding of the proposed algorithm.The pseudo-codes of HTP and CGIHT are shown in Algorithms 1 and 2, respectively.In [20], the debiasing step in the OMP-like algorithms and the estimation step in the IHT-like algorithms are coupled in an algorithm to form the HTP algorithm.The HTP algorithm has excellent reconstruction performance as well as strong theoretical guarantee in terms of the RIP.Optimizing search direction in each iteration is the main innovation of HTP algorithm.It fully considers that in the iterative process, when the subspace determined by the support set is consistent, the conjugate gradient (CG) method [23] can be employed to accelerate the convergence rate.Two versions of CGIHT algorithms for CS are provided in [19], one of them, the algorithm called CGIHT restarted for CS is summarized in Algorithm 2. The iteration procedures of CGIHT and CGIHT restarted differ in their selection of search directions.CGIHT algorithm uses the conjugate gradient respect to A T T A as the search direction in all iteration, while CGIHT restarted determines the search direction according to whether the support set of the current iteration is equal to that of the previous iteration.
Inspired by the respective advantages of HTP and CGIHT algorithms, we present the CGHTP algorithm in this section.A simple block diagram of the CGHTP algorithm can be seen in Figure 3.The algorithm enters an iterative loop after inputting the initialized data.The loop process updates the x n and support set T n in two alternative ways, one of which is the combination of the gradient descent method and the adaptive step size selection method, and the other is the CG method.The selection criteria for these two ways is to determine whether the support sets of two adjacent iterations are equal.If the support set of the previous iteration is equal to that of the current iteration, then the gradient descent method is used.If not, the CG method is used [34].All candidate solutions and candidate support sets obtained through these ways are finally updated by the least square method.
The main characteristic of the proposed CGHTP algorithm is that the support set selection strategy of HTP is combined with the acceleration strategy of CGIHT restarted algorithm in one algorithm.The main steps of CGHTP are listed in Algorithm 3. Similar to CGIHT, CGHTP is initialized with x = 0, T −1 = supp(H s (A T y)).The main body of CGHTP mainly includes the following steps.In Step 1), the gradient direction of the cost function y − Ax n 2 2 [16] in the current iteration is solved to calculate the subsequent step size α n .In Step 2), if xn+1 has a different support from the previous estimate, i.e., T n = T n−1 , similar to the convergence criterion used in NIHT algorithm, 2 is used to check whether the step size is smaller than it, where c is a small fixed constant and 0 < c < 1.If this holds, continue to run the next steps.Otherwise the step size is continuously reduced by looping α n ← α n η until the convergence criterion is met, here 0 < η < 1.In Step 3), if T n = T n−1 , the CG direction is used as the search direction.Finally, Step 4) updates x n+1 by solving a least square problem.
Compared to the HTP algorithm, the CGHTP algorithm provides two search directions in the iterative process, including the gradient direction g n of the cost function y − Ax n 2 2 and the CG direction d n .According to whether the support sets of the previous and current iterations are equal, one of the two directions is selected as the search direction.In addition, then the support set is updated by solving a least square problem.In addition, CGHTP provides a way to choose the best step size while ensuring convergence.
The HTP algorithm uses the gradient direction as the search direction, which has a linear asymptotic convergence rate related to (κ−1) (κ + 1), where κ is the condition number of the matrix A T supp(x n ) A supp(x n ) .Owing to the superlinear asymptotic convergence speed with a rate given by ( √ κ−1) ( [24], conjugate gradient method has been applied to accelerate the convergence rate in CGIHT algorithm.CGHTP draws on the advantages of the CGIHT algorithm.If T n = T n−1 , it means that the correct support set is identified and the submatrix A T n is no longer changed, CG method is applied to solve the overdetermined system A T T n A T n x n = A T T n y, and the convergence rate may be superlinear with a rate given by ( Algorithm 1 [20] Hard Thresholding Pursuit Input: A, y, s Initialization: until the stopping criteria is met end for Algorithm 2 [19] Conjugate Gradient Iterative Hard Thresholding restarted for Compressed Sensing Input: A, y, s Initialization: until the stopping criteria is met end for

Convergence Analysis of CGHTP
In this section, we analyze the convergence of the proposed algorithms.We give a simple proof that the CGHTP algorithm converges to the exact solution of (1) after a finite number of iterations when the measurement matrix A satisfies some conditions.
In Algorithm 3, when , the iterative process of the CGHTP algorithm is equivalent to the HTP algorithm.Similar to Lemma 3.1 in [20], we obtain the following lemma: Lemma 1.The sequences defined by CGHTP eventually periodic.
Theorem 1.The sequence (x n ) generated by CGHTP algorithm converges to x in a finite number of iterations, where x denotes the exact solution of (1) The proof of Theorem 1 is partitioned here into two steps.The first step is that when the current support set is not equal to the previous support set, the orthogonal factor β is equal to 0. At this time, the CGHTP algorithm reduces to the HTP algorithm, so their convergence is the same.The second step is that when the two support sets are equal, the step size is related to the orthogonal factor and the CG direction.Detailed proof is given below.

Proof of Theorem 1. When
It is clear that xn+1 is a better approximation to x n + α n g n than x n is, so . By expanding the squares, we can obtain x n − xn+1 2 2 + 2α n x n − xn+1 , g n ≤ 0, substituting this into (5), one has A(x n − xn+1 ) 2 2 into inequality (6), one can show that Since 0 < c < 1, the right side of ( 7) is less than 0. This implies that the cost function is nonincreasing when T n = T n−1 , hence it is convergent.
In the case of T n = T n−1 , for the overdetermined linear system A T T n A T n x n = A T T n y, since A T T n A T n is a positive definite matrix, CG method for solving such problem has the convergence rate as: where It is easy to get that κ ≤ 1+δ s 1−δ s .Then (8) can be rewritten as where ρ = From Lemma 1, since the sequence (x n ) is eventually periodic, it must be eventually constant, which implies that T n+1 = T n and x n+1 = x n for n large enough.In summary, the CGHTP algorithm is convergent.

Numerical Experiments and Discussions
In this section, we verify the performance of the proposed algorithm by Gaussian-signal reconstruction experiments and real-world images reconstruction experiments.This section compares the reconstruction performance of HTP (with several different step sizes), CGIHT, and CGHTP with parameter η = 0.9.All experiments are tested in MATLAB R2014b, running on a Windows 10 machine with Intel I5-7500 CPU 3.4 GHz and 8GB RAM.

Gaussian-Signal Reconstruction
In this subsection, we construct an M × N random Gaussian matrix as the measurement matrix Φ with entries drawn independently from Gaussian distribution.In addition, we generate an s-sparse signal of length N = 256, and each nonzero element of this signal is chosen at random and drawn from standard Gaussian distribution.Here, we define the subsampling rate as τ = M/N.In each experiment, we let HTP execute with four different α: α = 1, α = α * − 0.5, α = α * and α = α * + 0.5, where α * denotes the step size with which HTP has the optimal empirical performance.In all the tests in this section, each experiment is repeated 500 times independently.To evaluate the performance of the tested algorithms, we use the relative reconstruction error as a criteria to measure the reconstruction accuracy for sparse signals, where the relative reconstruction error is defined as y − Ax n 2 y 2 .For all algorithms tested, we set a common stopping criterion: n > 200 or the relative reconstruction error is less than 10 −6 .
The first issue to be verified concerns the number of iterations required by the CGHTP algorithm in comparison with HTP algorithm and CGIHT algorithm.Here we choose the sampling rate τ = 0.5.For each 1 ≤ s ≤ 80, we compute the average number of iterations of different algorithms and plot the curve in Figure 4, where we discriminated between successful reconstructions (mostly occurring for 1 < s ≤ 40) and unsuccessful reconstructions (mostly occurring for 40 < s ≤ 80).As can be seen from Figure 4, in the successful case, the CGHTP algorithm requires fewer iterations than the CGIHT algorithm and HTP algorithm with α * (in the case α * = 3.5).In the unsuccessful case, the number of iterations of CGHTP algorithm is comparable to that of HTP algorithm with α * .This is because for the same measurement matrix Φ, when the sparsity level is small, the space formed by the columns of the A T n is less correlated, so the condition number of A T T n A T n is small, the advantage of using CG method is obvious; as the sparsity level increase, the condition number of A T T n A T n becomes larger, which has a bad influence on the convergence rate of CGHTP.The average time consumed by different algorithms is shown in Figure 5.It can be seen that CGHTP algorithm requires the least average reconstruction time than other algorithms, because it requires fewer iterations than other algorithms.At the same time, to verify the advantages of the proposed algorithm in convergence speed, we select a fixed sparse signal and record the relative reconstruction errors of different algorithms under different number of iterations.As can be seen from Figures 6 and 7, as the number of iterations increases, the relative reconstruction errors of the signals reconstructed by HTP and CGHTP decrease, and becomes stable at a certain error value where the convergence is reached.However, the relative reconstruction error of CGIHT algorithm decreases very slowly with the increase of iteration times.Figures 6 and 7 show that the number of iterations for convergence of CGHTP algorithm is smaller than that of HTP algorithm with an optimal step size and CGIHT algorithm, which reflects that CGHTP algorithm converges faster than HTP algorithm and CGIHT algorithm.The acceleration strategy chosen by the gradient descent method and the conjugate gradient method results in a better performance of the CGHTP algorithm in convergence rate.The recovery ability of the proposed algorithm is evaluated for different sparsity levels and different sampling rates.In each experiment, it is recorded as a successful reconstruction if x n − x 2 < 10 −4 x 2 .Figure 8 describes the exact reconstruction rate of different algorithms with different sparsity levels at a fixed sampling rate of τ = 0.5 while Figure 9 depicts the recovery performance of these algorithms at different sampling rates at a fixed sparsity level s = 10.Figures 8 and 9 show that the CGHTP is outperforms other algorithms.In addition, for the same rate of exact reconstruction, CGHTP requires fewer samples than other algorithms.

Real-World Image Reconstruction
In this subsection, we investigated the performance of CGHTP to reconstruct natural images.Test images used in experiments consist of four natural images of size 512 × 512, as shown in Figures 10a, 11a, 12a and 13a respectively.We choose the discrete wavelet transform(DWT) matrix as the sparse transform basis and the Gaussian random matrix as the measurement matrix.The sparsity level is set to one sixth of the number of samples, i.e., s = M/6.In all tests, each experiment is repeated 50 times independently and we set an algorithm-stopping criterion: n > 200 or x n − x n−1 2 < 10 −4 x n 2 .The reconstruction performance of images is usually evaluated by the Peak Signal to Noise Ratio (PSNR), which is defined as: where MSE denotes the mean square error, and it can be calculated by: where the matrices X and X of size a × b represent the original image and the reconstructed image.Meanwhile, we also consider using relative error (abbreviated as Rerr) to evaluate the quality of image reconstruction: Table 1 shows the average reconstruction PSNR, Rerr, and reconstruction time of four reconstructed images by different algorithms at different sampling rates.It can be seen from Table 1 that the reconstruction performance of different algorithms is different for four tested images.In terms of reconstruction PSNR value, the lower the sampling rate, the more obvious the advantage of CGHTP algorithm over HTP algorithm and CGIHT algorithm.Especially in the case of low sampling rate (T = 0.2), the reconstructed PSNR value of the four tested images obtained by CGHTP algorithm is about 2 dB higher than that obtained by HTP algorithm with the best step size.Similarly, the reconstruction relative error of CGHTP is slightly lower than that of other algorithms.In addition, the average reconstruction time of CGHTP is comparable to that of HTP algorithm with an optimal step size.Due to the calculation of the step size α and the orthogonal factor β, the CGHTP algorithm requires slightly more computation than the HTP algorithm by approximately O(Ns).However, in each iteration, the conjugate direction may be used as the search direction in CGHTP to accelerate the convergence rate, so they need fewer iterations than the HTP algorithm.Overall, CGHTP has an empirical performance comparable to the HTP algorithm with an optimal step size in natural image reconstruction.

Conclusions
In this paper, we introduce a new greedy iterative algorithm, termed CGHTP.Using conjugate gradient method to accelerate the convergence of the original HTP algorithm is one of the strong features of the algorithm, which makes the CGHTP algorithm faster than the HTP algorithm.Reconstruction experiments with Gaussian sparse signals and natural images illustrate the advantages of the proposed algorithm in convergence rate and reconstruction performance, especially in the case of low sparsity, CGHTP requires fewer iterations than HTP with the best step size and CGIHT algorithm.Although the proposed algorithm has better reconstruction performance, its theoretical guarantee is not very strong, our future works may focus on the research about stronger theoretical guarantee of this algorithm.In addition, we will study how to apply this method to more practical applications.

Figure 3 .
Figure 3. Block diagram of CGHTP algorithm.T n is the support set of x in the nth iteration, "LSM" denotes least square method.

Figure 4 .
Figure 4. Number of iterations for HTP with different step size, CGIHT, and CGHTP algorithms.

Figure 5 .
Figure 5.Time consumed by HTP with different step size, CGIHT, and CGHTP algorithms.
visually reflects the contrast between the original image and the reconstructed image, where subfigure (b-d) represents the reconstructed images obtained by HTP (α = α * ), CGIHT, and CGHTP, respectively.

Table 1 .
Reconstruction results of Lenna, Pens, Pepper, and Baboon images at different sampling rates by HTP, CGIHT, and CGHTP.The optimal step size α * for reconstructing Lenna, Pens, Pepper and Baboon images by HTP algorithm are α * = 5.5, α * = 5, α * = 6 and α * = 4, respectively.The abbreviations PSNR and Rerr in the table represent the Peak Signal to Noise Ratio and relative reconstruction error of the reconstructed image, respectively.