Least-Squares-Based Three-Term Conjugate Gradient Projection Method for ℓ1-Norm Problems with Application to Compressed Sensing

Abstract: In this paper, we propose, analyze, and test an alternative method for solving the ℓ1-norm regularization problem for recovering sparse signals and blurred images in compressive sensing. The method is motivated by the recently proposed nonlinear conjugate gradient method of Tang, Li, and Cui [Journal of Inequalities and Applications, 2020(1), 27], which is designed based on the least-squares technique. The proposed method aims to minimize a non-smooth minimization problem consisting of a least-squares data fitting term and an ℓ1-norm regularization term. The search directions generated by the proposed method are descent directions. In addition, under the monotonicity and Lipschitz continuity assumptions, we establish the global convergence of the method. Preliminary numerical results are reported to show the efficiency of the proposed method in practical computation.


Introduction
Discrete ill-posed problems are systems of linear equations arising from the discretization of ill-posed problems. Consider the linear system

    At = b,  (1)

where t ∈ R^n is an original signal, A ∈ R^{m×n} (m < n) is a linear map, and b ∈ R^m is the observed data.
In particular, the original signal t is assumed to be sparse, that is, it has very few non-zero coefficients.
Since m < n, the linear system (1) is usually referred to as an ill-conditioned or under-determined problem. In compressive sensing (CS), it is possible to recover the sparse signal t from the linear system (1) by finding the solution of the ℓ0-regularized problem

    min ‖t‖_0  subject to  At = b,  (2)

where ‖t‖_0 denotes the number of nonzero components of t. Unfortunately, solving the ℓ0-norm problem is NP-hard in general. Based on this, researchers have developed an alternative model by replacing the ℓ0-norm with the ℓ1-norm. Thus, they solved the following problem:

    min ‖t‖_1  subject to  At = b.  (3)

Under some mild assumptions, Donoho [1] proved that solutions of problem (2) also solve (3). In most applications, the observed value b usually contains some noise; thus, problem (3) can be relaxed to the penalized least-squares problem

    min_t (1/2)‖b − At‖_2^2 + τ‖t‖_1,  (4)

where τ > 0 balances the trade-off between sparsity and residual error. Problems of the form (4) have become familiar over the past three decades, particularly in compressive sensing contexts. Interested readers may refer to the recent papers (Refs. [2,3]) for more details.
In order to address problem (4), several numerical algorithms have been proposed. For instance, Daubechies, Defrise, and De Mol [4] proposed the iterative shrinkage thresholding (IST) algorithm; thereafter, Beck and Teboulle [5] proposed the fast iterative shrinkage thresholding algorithm (FISTA). These algorithms are well known for their simplicity and efficiency. Likewise, Hale, Yin, and Zhang [6] proposed the fixed-point continuation method, and an acceleration technique was introduced via a nonmonotone line search with the Barzilai-Borwein stepsize [7]. He, Chang, and Osher [8] introduced the unconstrained formulation of the ℓ1-regularization problem, where the Bregman iterative approach was used to obtain the solutions to problem (4). The proximal forward-backward splitting technique is another efficient technique for solving (4); it is based on the proximal operator introduced by Moreau [9]. Another type of method for solving problem (4) uses the gradient descent framework. Figueiredo, Nowak, and Wright [10] first developed a gradient projection method to solve the sparse reconstruction problem. Thereafter, Xiao and Zhu [11,12] proposed a spectral gradient method and a conjugate gradient projection method to solve problem (4), respectively. Unlike IST and FISTA, these methods first transform problem (4) into a monotone operator equation (see Section 2) and then develop an algorithm to solve the resulting system of equations.
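To make the shrinkage idea concrete, the following Python sketch applies IST to problem (4); it is a minimal illustration in our own notation (the step size 1/L and the iteration count are our assumptions, not values from [4,5]):

    import numpy as np

    def soft_threshold(x, gamma):
        # Componentwise soft-thresholding: the proximal operator of gamma * ||.||_1.
        return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

    def ist(A, b, tau, num_iters=500):
        # Iterative shrinkage-thresholding for min_t 0.5*||A t - b||^2 + tau*||t||_1.
        L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of t -> A^T (A t - b)
        t = np.zeros(A.shape[1])
        for _ in range(num_iters):
            grad = A.T @ (A @ t - b)         # gradient of the least-squares term
            t = soft_threshold(t - grad / L, tau / L)
        return t

Each iteration takes a gradient step on the smooth term and then applies the soft-thresholding operator, which is what makes the iterates sparse.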
Given the approximate equivalence between problem (4) and a system of monotone operator equations, one of the methods for solving such systems is the conjugate gradient method. Considering the importance of the method, several extensions have been proposed; the three-term conjugate gradient method is one such extension. The first three-term conjugate gradient methods were introduced by Beale [13] and Nazareth [14] to weaken the conditions for global convergence of the two-term conjugate gradient method. Due to the presence of an additional parameter in three-term conjugate gradient schemes, establishing the sufficient descent property is easier than for two-term schemes. To this end, three-term conjugate gradient methods have been presented, analyzed, and extensively studied in several references because of their advantages in the descent property and computational performance. The references [15-17] proposed different three-term conjugate gradient methods and showed their specific properties, global convergence, and numerical performance. Tang, Li, and Cui [18] presented a new three-term conjugate gradient approach based on the least-squares technique. Their approach combines the advantages of two existing efficient conjugate gradient approaches and generates sufficient descent directions without the aid of a line search procedure. Preliminary numerical tests indicate that their method is efficient. Due to the simplicity and low storage requirements of the conjugate gradient method, numerous researchers have recently extended conjugate gradient algorithms designed for unconstrained optimization to large-scale nonlinear equations. Using the popular CG_DESCENT method [19], Xiao and Zhu [12] constructed a conjugate gradient method (CGD) based on the projection scheme of Solodov and Svaiter [20] to solve monotone nonlinear operator equations with convex constraints. The method was also successfully used to recover sparse signals in compressive sensing. Interested readers may refer to the articles [21-27] for an overview of algorithms for solving monotone operator equations.
Inspired by the work of Xiao and Zhu [12], the least-squares-based three-term conjugate gradient method (LSTT) for unconstrained optimization by Tang, Li, and Cui [18], and the projection technique of Solodov and Svaiter [20], we study, analyze, and construct a derivative-free least-squares-based three-term conjugate gradient method to solve the ℓ1-norm problem arising from the reconstruction of sparse signals and images in compressive sensing. The method can be viewed as an extension of the LSTT method for unconstrained optimization, combined with a projection technique. Under the monotonicity and Lipschitz continuity assumptions, the global convergence of the proposed method is established using a backtracking line search. Computational experiments are carried out to reconstruct sparse signals and images in compressive sensing. The numerical results indicate that the proposed method is efficient and robust.
The rest of the paper is organized as follows. In Section 2, we review the reformulation of problem (4) into a convex quadratic program by Figueiredo et al. [10]. In Section 3, we present the motivation and general algorithm of the proposed method. The global convergence of the proposed algorithm is presented in Section 4. In Section 5, numerical experiments are presented to illustrate the efficiency of our algorithm. Unless otherwise stated, throughout this paper, the symbol ‖·‖ denotes the Euclidean norm. Furthermore, P_S denotes the projection map from R^n onto the non-empty, closed, and convex subset S ⊆ R^n, that is, P_S(t) := arg min{‖t − y‖ | y ∈ S}, which has the well-known nonexpansive property

    ‖P_S(t) − P_S(y)‖ ≤ ‖t − y‖  for all t, y ∈ R^n.  (5)
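For the feasible sets that appear later (boxes such as the nonnegative orthant in (7)), P_S has a closed form; the following small Python sketch (ours, for illustration only) computes it componentwise and checks the nonexpansive property (5) numerically:

    import numpy as np

    def project_box(t, lower, upper):
        # Componentwise projection onto the box S = {t : lower <= t <= upper}.
        return np.minimum(np.maximum(t, lower), upper)

    # Numerical check of the nonexpansive property (5) on the nonnegative orthant.
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    Px, Py = project_box(x, 0.0, np.inf), project_box(y, 0.0, np.inf)
    assert np.linalg.norm(Px - Py) <= np.linalg.norm(x - y)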

Reformulation of the Model
Figueiredo, Nowak, and Wright [10] reformulated the minimization problem (4) into a quadratic programming problem as follows. Any vector t ∈ R^n can be rewritten as

    t = u − v,  u ≥ 0,  v ≥ 0,

where u, v ∈ R^n and u_i = (t_i)_+, v_i = (−t_i)_+ for all i ∈ [1, n], with (·)_+ = max{0, ·}. Therefore, the ℓ1-norm can be represented as ‖t‖_1 = e_n^T u + e_n^T v, where e_n is an n-dimensional vector with all elements one. Thus, (4) was rewritten as

    min_{u,v} (1/2)‖b − A(u − v)‖_2^2 + τ e_n^T u + τ e_n^T v  subject to  u ≥ 0, v ≥ 0.  (6)

Moreover, from [10], with no difficulty, (6) can be rewritten as a quadratic program with box constraints, that is,

    min_z (1/2) z^T H z + c^T z  subject to  z ≥ 0,  (7)

where

    z = (u; v),  c = τ e_{2n} + (−A^T b; A^T b),  H = [ A^T A  −A^T A ; −A^T A  A^T A ].

A simple calculation shows that H is a positive semi-definite matrix. Hence, (7) is a convex quadratic program, and it is equivalent to

    F(z) := min{z, Hz + c} = 0.  (8)

The function F is vector-valued, and the "min" is interpreted as the componentwise minimum. From ([28], Lemma 3) and ([11], Lemma 2.2), we know that the mapping F is Lipschitz continuous and monotone. Hence, an algorithm for solving problem (8) can be equally and effectively used to solve problem (4).
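As an illustration of this reformulation, the following Python sketch (our own, with H applied implicitly through A so that the 2n × 2n matrix is never formed) evaluates the residual map F of (8):

    import numpy as np

    def make_F(A, b, tau):
        # Residual map F(z) = min(z, Hz + c) of problem (8), with z = (u; v),
        # H = [[A^T A, -A^T A], [-A^T A, A^T A]] and c = tau*e + (-A^T b; A^T b).
        n = A.shape[1]
        Atb = A.T @ b
        c = tau * np.ones(2 * n) + np.concatenate([-Atb, Atb])

        def F(z):
            u, v = z[:n], z[n:]
            w = A.T @ (A @ (u - v))        # A^T A (u - v), applied without forming H
            Hz = np.concatenate([w, -w])
            return np.minimum(z, Hz + c)   # componentwise minimum

        return F

Working with matrix-vector products of A and A^T keeps the cost per evaluation at two applications of the sensing operator, which is what makes this reformulation practical for large n.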

Algorithm
Recently, Tang, Li, and Cui [18] proposed a new three-term conjugate gradient method based on the least-squares technique for solving the following unconstrained optimization problem:

    min_{t ∈ R^n} f(t),  (9)

where the function f : R^n → R is continuously differentiable and its gradient ∇f(t) is available. Similar to most conjugate gradient methods, the iterative scheme of the conjugate gradient method developed in [18] generates a sequence of iterates by letting

    t_{k+1} = t_k + α_k d_k,

where α_k is the steplength and the search direction d_k is updated by the three-term rule

    d_k = −∇f(t_k) + β_k d_{k−1} + θ_k y_{k−1},  y_{k−1} = ∇f(t_k) − ∇f(t_{k−1}),

where β_k and θ_k are scalars computed via the least-squares technique (see [18] for the explicit formulas). A valid approach for solving (8) is to use a derivative-free line search to determine the step-size α_k [29]. To this end, we present the following derivative-free least-squares three-term conjugate gradient projection algorithm (DF-LSTT).
Step 1. Determine the step-size α_k = βρ^{i_k}, where i_k is the smallest non-negative integer i such that the following line search is satisfied:

    −F(t_k + βρ^i d_k)^T d_k ≥ ς βρ^i ‖F(t_k + βρ^i d_k)‖ ‖d_k‖^2.  (10)

Step 2. Compute u_k := t_k + α_k d_k.
Step 3. If u_k ∈ S and F(u_k) = 0, stop. Otherwise, compute the next iterate by

    t_{k+1} = P_S[t_k − λ_k F(u_k)],  (11)

where

    λ_k = F(u_k)^T (t_k − u_k) / ‖F(u_k)‖^2.  (12)

Step 4. If the stopping criterion is satisfied, that is, if ‖F(t_k)‖ ≤ Tol, stop. Otherwise, compute the next search direction d_k by

    d_k = −F(t_k) + β_k d_{k−1} + θ_k y_{k−1},  (13)

where y_{k−1} := F(t_k) − F(t_{k−1}), and the scalars β_k and θ_k (14) are the least-squares scalars of [18] with their denominators replaced by y_{k−1}^T d_{k−1}; see Remark 1. Step 5. Finally, we set k := k + 1 and return to Step 1.
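To make Steps 1-3 concrete, the following Python sketch implements one iteration of a DF-LSTT-style method under our reading of the line search (10) and the projection step (11)-(12); the scalars β_k and θ_k are deliberately left to a user-supplied rule, since the explicit least-squares formulas of [18] are not reproduced above:

    import numpy as np

    def df_lstt_step(F, t, d, project, beta=10.0, rho=0.55, sigma=1e-4):
        # One iteration of a DF-LSTT-style projection method (a sketch, not the
        # authors' exact implementation).
        # Step 1: backtracking line search alpha_k = beta * rho**i satisfying (10).
        alpha = beta
        u = t + alpha * d                    # trial point (Step 2)
        Fu = F(u)
        while -Fu @ d < sigma * alpha * np.linalg.norm(Fu) * (d @ d):
            alpha *= rho
            u = t + alpha * d
            Fu = F(u)
        # Step 3: stop at a solution, else take the hyperplane projection step (11)-(12).
        if np.linalg.norm(Fu) == 0.0:
            return u
        lam = (Fu @ (t - u)) / (Fu @ Fu)
        return project(t - lam * Fu)

The next search direction would then be assembled as in (13), using the modified least-squares scalars described in Remark 1.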

Remark 1.
In order to ensure that the parameters β_k and θ_k are well defined when extending them to solve (4), we modify their denominators using the scalar y_{k−1}^T d_{k−1}. Furthermore, to guarantee the boundedness of the search direction, we assume the operator under consideration to be monotone, rather than uniformly monotone as assumed in [18].

Lemma 1. The search direction d_k generated by the DF-LSTT algorithm is a descent direction; that is, there exists a constant c > 0 such that

    F(t_k)^T d_k ≤ −c ‖F(t_k)‖^2.  (17)

Proof. The inequality follows by direct computation from (13) and (14). This completes the proof.

Global Convergence
In this section, we investigate the global convergence of the DF-LSTT algorithm for solving (8). For this purpose, we make the following assumptions.

Assumption 1.
A1. The mapping F is Lipschitz continuous; that is, there exists a constant L > 0 such that

    ‖F(t) − F(y)‖ ≤ L ‖t − y‖  for all t, y ∈ R^n.  (18)

A2. The mapping F is monotone (Xiao et al. [11] proved that this holds for problem (8)); that is,

    (F(t) − F(y))^T (t − y) ≥ 0  for all t, y ∈ R^n.  (19)

Lemma 2. Suppose that Assumption 1 holds. Let {u_k} and {t_k} be the sequences generated by (11) and (12) in the DF-LSTT algorithm. Then, the following statements hold: (i) the sequence {t_k} is bounded; (ii) the series ∑_{k≥0} ‖u_k − t_k‖^2 is finite; (iii) lim_{k→∞} α_k ‖d_k‖ = 0.

Proof. Since F is monotone, for any solution t* of problem (8), we have the estimate (20), which is obtained from the line search. From (5), we get (21), and (22) and (23) follow from (21). From (24), it is easy to see that the sequence {t_k} is bounded; that is, (25) holds. Furthermore, by (18), we have (26). By the monotonicity property given in (19) and the Cauchy-Schwarz inequality, we obtain a further estimate whose last inequality follows from (21). Thus, using the continuity of F, there exist constants such that (27) and (28) hold, bounding {F(t_k)} and {F(u_k)}. It follows from (23) that (29) holds; summing (29) over k ≥ 0, we obtain (30), and hence the second assertion holds. Finally, from (5), (13), and the Cauchy-Schwarz inequality, we obtain (31), which shows that the third assertion holds.
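For completeness, the key estimate behind the boundedness of {t_k} is the standard hyperplane-projection inequality of Solodov and Svaiter [20]; in LaTeX form (a sketch of the standard argument, not necessarily the authors' exact displays (20)-(24)):

    \[
      \|t_{k+1} - t^*\|^2 \;\le\; \|t_k - t^*\|^2
        \;-\; \frac{\big(F(u_k)^\top (t_k - u_k)\big)^2}{\|F(u_k)\|^2},
    \]

since $t_{k+1} = P_S\big[t_k - \lambda_k F(u_k)\big]$ with
$\lambda_k = F(u_k)^\top (t_k - u_k)/\|F(u_k)\|^2$, and the monotonicity of $F$
gives $F(u_k)^\top (u_k - t^*) \ge 0$, so the hyperplane
$\{t : F(u_k)^\top (t - u_k) = 0\}$ separates $t_k$ from the solution set.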
Theorem 1. Suppose that Assumption 1 holds. Let the sequences {u_k} and {t_k} be generated by (11) and (12) in the DF-LSTT algorithm. Then,

    lim inf_{k→∞} ‖F(t_k)‖ = 0.  (32)

Proof. Suppose that conclusion (32) does not hold; that is, there exists κ_0 > 0 such that

    ‖F(t_k)‖ ≥ κ_0  for all k ≥ 0.  (33)

Equation (33) together with (17) imply that ‖d_k‖ ≥ c κ_0 for all k ≥ 0. We note that the sequences {t_k}, {u_k}, {F(t_k)}, and {F(u_k)} are bounded, by (25), (31), (27), and (28), respectively. In addition, from the Lipschitz continuity and (25), we have (34). On the other hand, from the definition of y_{k−1} and the Cauchy-Schwarz inequality, we obtain (35); the last inequality in (35) follows easily from (34). Therefore, from (14), it follows that the search directions remain bounded. From (22), the trial step α_k ρ^{−1} does not satisfy (10); thus, using Lemma 1, the Cauchy-Schwarz inequality, and the Lipschitz continuity (18), we obtain (36). Therefore, α_k is bounded away from zero, which, together with (33) and ‖d_k‖ ≥ c κ_0, contradicts (31). Thus, (32) holds.

Numerical Experiment
We present numerical experiments to show the efficiency of the DF-LSTT method. The experiments are of two types. The first applies the DF-LSTT method to solve the ℓ1-norm regularization problem arising in compressive sensing. The second tests DF-LSTT on some given convex constrained nonlinear equations with different initial points and various dimensions. The implementations were performed in Matlab R2019b Update 1 (9.7.0.1216025, MathWorks Inc., Massachusetts, USA) on an HP PC (Hewlett-Packard, California, USA) with a 2.4 GHz CPU and 8.0 GB of RAM, running the Windows 10 operating system.

Experiments on the ℓ1-Norm Regularization Problem in Compressive Sensing
We begin by considering a typical compressive sensing scenario, where the goal is to reconstruct a length-n sparse signal from m observations (m ≪ n) with Gaussian noise; the number of samples is dramatically smaller than the size of the original signal. We compare DF-LSTT with the CGD conjugate gradient method [12] and the PCG projection method [30], both designed to solve nonlinear equations with convex constraints and signal recovery problems.
As a consequence of the limited memory of our PC, in this test we considered a small signal size with n = 2^11, m = 2^9, where the original signal t contains 2^6 randomly placed non-zero elements. Similar to [11,12,31], the quality of the restored signal is assessed by the mean squared error (MSE) with respect to the original signal t, that is,

    MSE = (1/n) ‖t* − t‖^2,

where t* is the restored signal. In this test, we generate the random matrix A using the Matlab command rand(n,k). In addition, noise is added to the observed data computed by

    b = At + η,

where η is Gaussian noise distributed as N(0, 10^−4). The DF-LSTT algorithm is implemented with the following parameters: β = 10, ρ = 0.55, ς = 10^−4.
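For reference, the following Python sketch reproduces this experimental setup (the Gaussian sensing matrix, its scaling, and the random seed are our assumptions; the paper itself uses the Matlab command rand(n,k)):

    import numpy as np

    rng = np.random.default_rng(2020)
    n, m, k = 2**11, 2**9, 2**6                   # signal length, measurements, sparsity

    t_true = np.zeros(n)                          # k-sparse original signal
    support = rng.choice(n, size=k, replace=False)
    t_true[support] = rng.standard_normal(k)

    A = rng.standard_normal((m, n)) / np.sqrt(m)  # sensing matrix (our assumption)
    b = A @ t_true + np.sqrt(1e-4) * rng.standard_normal(m)  # b = A t + eta, eta ~ N(0, 1e-4)

    def mse(t_star, t):
        # Mean squared error between the restored and original signals.
        return np.mean((t_star - t) ** 2)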
For the compared methods, the parameters are set as reported in their respective papers. In line with [32], we chose the parameter τ in the merit function f(t) = τ‖t‖_1 + (1/2)‖b − At‖_2^2 as τ = 0.008‖A^T b‖_∞, and the initial point for all algorithms is t_0 = A^T b. The process terminates when

    |f_k − f_{k−1}| / |f_{k−1}| < 10^{−5},

where f_k denotes the merit function value at t_k. Note that, for this test, we only observe the convergence behavior of each method in obtaining a solution of similar accuracy.
In view of the plots depicted in Figure 1, DF-LSTT wins in decoding sparse signals in compressive sensing in terms of the number of iterations, requiring around 116 iterations and around 0.69 s. Table 1 contains the number of iterations, the mean squared error (MSE), and the decoding time over 10 runs for the algorithms tested. The results reported in Figure 2 show that, in decoding sparse signals in compressive sensing, DF-LSTT is faster than CGD and PCG, with a clearly lower number of iterations.

Next, the effectiveness and robustness of the DF-LSTT algorithm are illustrated on an image de-blurring problem. We carried out our experiment using some widely used test images obtained from http://hlevkin.com/06testimages.htm. DF-LSTT is compared with the state-of-the-art methods CGD proposed by Xiao and Zhu [12], SGCS [11], and MFRM [33].
The quality of restoration by the methods is measured by the signal-to-noise ratio (SNR), defined as

    SNR := 20 × log_10 ( ‖t‖ / ‖t* − t‖ ),

the peak signal-to-noise ratio (PSNR) [34], and the structural similarity index (SSIM) [35]. For fairness in comparing the algorithms, the iteration process of all algorithms starts at t_0 = A^T b and terminates when Tol < 10^−5. For the image de-blurring experiment, the following parameters were used in our implementation: β = 0.001, ρ = 0.6, ς = 10^−4. We tested several images, including Tiffany (512 × 512), Lena (512 × 512), and Barbara (720 × 576), degraded by Gaussian blur and 10% Gaussian noise.
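The three quality metrics can be computed as in the following Python sketch (the PSNR peak value of 255 assumes 8-bit images, and the SSIM call relies on scikit-image; both are our implementation choices, not the authors'):

    import numpy as np
    from skimage.metrics import structural_similarity

    def snr(t, t_star):
        # SNR := 20 * log10(||t|| / ||t_star - t||).
        return 20.0 * np.log10(np.linalg.norm(t) / np.linalg.norm(t_star - t))

    def psnr(t, t_star, peak=255.0):
        # Peak signal-to-noise ratio for images with maximum intensity `peak`.
        return 10.0 * np.log10(peak**2 / np.mean((t_star - t) ** 2))

    def ssim(t, t_star):
        # Structural similarity index on 2-D grayscale images.
        return structural_similarity(t, t_star, data_range=t.max() - t.min())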
In Table 2, we report the SNR, PSNR, and SSIM values for DF-LSTT, CGD, SGCS, and MFRM in recovering blurred and noisy images. We can see that the SNR, PSNR, and SSIM of the test images obtained by the DF-LSTT algorithm are slightly higher than those of CGD, SGCS, and MFRM; higher values of SNR, PSNR, and SSIM reflect better restoration quality. Based on the performance reported in Table 2, the DF-LSTT algorithm restores blurred and noisy images quite well and obtains reconstructed images of better quality in an efficient manner. Figures 3 and 4 show the original and blurred images, while Figure 5 shows the images restored by each method.

Problem 1.
This problem is the Exponential function [36] with constraint set S = R^n_+. Problem 2. Modified Logarithmic function [36] with constraint set S = {t ∈ R^n : ∑_{i=1}^n t_i ≤ n, t_i > −1, i = 1, 2, . . . , n}. Problem 3. The function f_i(t) [37] with S = R^n_+. Problem 4. The Strictly convex function [38], with constraint set S = R^n_+.

Problem 5.
Strictly convex function II [38], with constraint set S = R^n_+, with f_i(t) = (1/n) e^{t_i} − 1, i = 2, 3, . . . , n. Problem 6. Tridiagonal Exponential function [39] with constraint set S = R^n_+, with f_n(t) = t_n − e^{cos(h(t_{n−1} + t_n))}, where h = 1/(n + 1). Problem 7. Nonsmooth function [40] with constraint set S = {t ∈ R^n : ∑_{i=1}^n t_i ≤ n, t_i ≥ −1, 1 ≤ i ≤ n}.

In Tables 3-10, we report the computational results obtained from the implementation of the DF-LSTT, CGD, and PCG algorithms. We report the number of iterations (ITER), the number of function evaluations (FVAL), and the CPU time in seconds (TIME).
We employ the widely used performance profile of Dolan and Moré [41] to compare the methods. The profile measures the ratio of each method's computational outcome to the outcome of the best method presented. It operates in the following manner. Let S_m and P_m denote the set of methods and the set of test problems, respectively; in this section, a problem with different dimensions or different initial points is regarded as a new problem. For n_s methods and n_p problems, define

    t_{p,s} := computing time (or other cost) required to solve problem p by method s,

and the performance ratio

    r_{p,s} := t_{p,s} / min_{s ∈ S_m} t_{p,s}.

Then, the performance profile ρ : R → [0, 1] is defined by

    ρ_s(τ) := (1/n_p) size{p ∈ P_m : log_2(r_{p,s}) ≤ τ}  for all τ ∈ R_+, 1 ≤ s ≤ n_s.
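As an illustration, the profile ρ_s(τ) can be computed from a cost table as in the following Python sketch (the table layout is our assumption; failures can be marked with infinite cost):

    import numpy as np

    def performance_profile(T, taus):
        # T[p, s]: cost (e.g., CPU time) of method s on problem p; use np.inf for failures.
        # Returns rho[s, j] = fraction of problems with log2(r_{p,s}) <= taus[j].
        ratios = T / T.min(axis=1, keepdims=True)   # r_{p,s} = t_{p,s} / min_s t_{p,s}
        log_ratios = np.log2(ratios)
        return np.array([[np.mean(log_ratios[:, s] <= tau) for tau in taus]
                         for s in range(T.shape[1])])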
We note that ρ_s(τ) is the probability for method s ∈ S_m that log_2(r_{p,s}) is within a factor τ ∈ R_+ of the best possible ratio. Obviously, for a given value of τ, the method with the highest value of ρ_s(τ) is preferable. As usual, we build the performance profiles from the number of iterations, the number of function evaluations, and the computing time (CPU time). Figures 6 and 7 show the resulting Dolan and Moré performance profiles. Both figures indicate that the performance of the proposed method is competitive with the other methods. Moreover, DF-LSTT is more efficient in that it solves a higher percentage of the test problems with fewer iterations and function evaluations than the CGD and PCG methods. However, with respect to CPU time, our algorithm did not perform as well; this could be a result of the additional computations involved in the method.

Conclusions
In this paper, we have presented a derivative-free conjugate gradient method for solving the ℓ1-norm regularization problem by combining the projection technique with the direction proposed in [18]. Unlike the uniform convexity assumption used to establish convergence in [18], our convergence is established under the monotonicity and Lipschitz continuity assumptions. Our numerical experiments on recovering sparse signals and blurred images indicate efficient and robust behavior of the proposed algorithm. For instance, in recovering sparse signals, the proposed algorithm is faster than the compared algorithms, requires fewer iterations, and attains the lowest mean squared error. Moreover, our algorithm restores blurred and noisy images with better quality, as reflected in the SNR, PSNR, and SSIM values. Furthermore, numerical experiments on a set of monotone problems with different initial points and dimensions were reported. When applied to these monotone operator equations, however, the proposed method does not perform well in terms of CPU time; this could be a result of the additional computations involved in the method.