A Projection Hestenes–Stiefel Method with Spectral Parameter for Nonlinear Monotone Equations and Signal Processing

A number of practical problems in science and engineering can be converted into systems of nonlinear equations, and it is therefore important to develop efficient methods for solving such equations. Owing to their nice convergence properties and low storage requirements, conjugate gradient methods are considered among the most efficient for solving large-scale nonlinear equations. In this paper, a modified conjugate gradient method is proposed based on a projection technique and a suitable line search strategy. The proposed method is matrix-free, and its sequence of search directions satisfies the sufficient descent condition. Under the assumptions that the underlying function is monotone and Lipschitz continuous, the global convergence of the proposed method is established. The method is applied to solve some benchmark monotone nonlinear equations and is also extended to solve $\ell_1$-norm regularized problems for reconstructing a sparse signal in compressive sensing. Numerical comparisons with some existing methods show that the proposed method is competitive, efficient and promising.


Introduction
Let $\mathbb{R}^n$ be the $n$-dimensional Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$. Suppose that $D$ is a nonempty closed convex subset of $\mathbb{R}^n$. We denote by $\mathbb{R}^n_+$ the set $\{(x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n \mid x_i \geq 0,\ i = 1, 2, \ldots, n\}$. In this paper, we consider the problem of finding a point $\bar{x}$ in the set $D$ for which

$$ F(\bar{x}) = 0, \qquad (1) $$

where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a continuous mapping. It is interesting to note that nonlinear equations of the form (1) have various backgrounds and applications in science and engineering, such as the first-order necessary optimality conditions of convex optimization problems and the $\ell_1$-norm regularized problems arising in compressive sensing that are discussed later in this paper.

The rest of the paper is organized as follows. In Section 2, we describe the proposed algorithm and establish its global convergence. In Section 3, we present the numerical experiments, and we give the conclusions in Section 4.

Proposed Algorithm for Monotone Equations and Its Convergence Analysis
Definition 1. Let $x, y \in \mathbb{R}^n$. A function $F : \mathbb{R}^n \to \mathbb{R}^n$ is said to be
(i) monotone if $\langle F(x) - F(y), x - y \rangle \geq 0$ for all $x, y \in \mathbb{R}^n$;
(ii) Lipschitz continuous if there exists $L > 0$ such that $\|F(x) - F(y)\| \leq L \|x - y\|$ for all $x, y \in \mathbb{R}^n$.

We begin by recalling the well-known Hestenes-Stiefel (HS) conjugate gradient method for solving the unconstrained optimization problem

$$ \min_{x \in \mathbb{R}^n} f(x), \qquad (4) $$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and bounded from below. The HS CG method updates its sequence of iterates using the recursive formula

$$ x_{k+1} = x_k + \alpha_k d_k, \qquad (5) $$

where $\alpha_k > 0$ is a suitable stepsize. The search direction $d_k$ is defined by

$$ d_0 = -F_0, \qquad d_k = -F_k + \beta_k^{HS} d_{k-1}, \quad k \geq 1, \qquad (6) $$

where the CG parameter is given by $\beta_k^{HS} = \dfrac{\langle F_k, y_{k-1} \rangle}{\langle y_{k-1}, d_{k-1} \rangle}$, $y_{k-1} = F_k - F_{k-1}$, and $F_k$ denotes the gradient of $f$ at $x_k$.
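As a small illustration, the classical HS direction (6) can be computed as in the following minimal Python sketch, where g_k plays the role of the gradient $F_k$ (the names are ours, not the paper's):

```python
import numpy as np

def hs_direction(g_k, g_prev, d_prev):
    """Classical Hestenes-Stiefel direction d_k = -g_k + beta_k^{HS} d_{k-1}."""
    y_prev = g_k - g_prev                                    # y_{k-1} = F_k - F_{k-1}
    beta_hs = np.dot(g_k, y_prev) / np.dot(y_prev, d_prev)   # beta_k^{HS}
    return -g_k + beta_hs * d_prev
```

Note that this update breaks down whenever $\langle y_{k-1}, d_{k-1} \rangle = 0$, which is precisely one of the restrictions discussed below.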
Definition 2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x \in \mathbb{R}^n$ and let $d \in \mathbb{R}^n$ satisfy $\langle F, d \rangle < 0$, where $F$ denotes the gradient of $f$ at $x$. Then $d$ is called a descent direction of $f$ at $x$.
It is well known that the sufficient descent condition

$$ \langle F_k, d_k \rangle \leq -c \|F_k\|^2, \quad c > 0, \qquad (7) $$

is a crucial property for iterative methods of the form (5) to be globally convergent. However, the HS CG method does not guarantee (7) and is therefore not globally convergent for general nonlinear functions.
The HS CG method has enjoyed several modifications aimed at improving its convergence properties and/or its numerical performance. Recently, Amini et al. [33] studied some modified HS CG methods for solving (4). Based on the work of Narushima et al. [34] and Dai and Kou [35], Amini et al. proposed a modified HS CG method with the search direction defined by (8), where the corresponding parameter is given by (9) and $\theta_k = 1 - \langle F_k, d_{k-1} \rangle^2 / \left( \|F_k\|^2 \|d_{k-1}\|^2 \right)$. They showed that if $\langle y_{k-1}, d_{k-1} \rangle \neq 0$ and $\eta > 1/4$, then the search direction $d_k$ satisfies the sufficient descent condition (7). A natural question then arises: can we modify (9) in such a way that these two restrictions are removed while retaining the nice properties associated with (8)?
In this paper, we provide an answer to the above question. Let $w_{k+1} = x_k + \alpha_k d_k$. We define the proposed search direction by $d_0 = -F(x_0)$ and, for $k \geq 1$, by (10), where the parameter $\beta_k^{PMHS}$ and the spectral parameter $v_k$ are defined in (11) and (12), respectively. The spectral parameter $v_k$ is incorporated into the definition of the search direction in order to improve the numerical performance of the proposed algorithm.

Next, we describe the projection operator, which is frequently used in iterative algorithms for problems such as the fixed point problem and the variational inequality problem. Let $x \in \mathbb{R}^n$ and define the operator $P_D : \mathbb{R}^n \to D$ by

$$ P_D(x) = \arg\min \{ \|x - y\| : y \in D \}. $$

The operator $P_D$ is called the projection onto the feasible set $D$, and it enjoys the nonexpansive property

$$ \|P_D(x) - P_D(y)\| \leq \|x - y\|, \quad \forall\, x, y \in \mathbb{R}^n. $$

If $y \in D$, then $P_D(y) = y$, and therefore

$$ \|P_D(x) - y\| \leq \|x - y\|, \quad \forall\, x \in \mathbb{R}^n,\ y \in D. \qquad (13) $$
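For concreteness, here is a minimal Python sketch of $P_D$ for two common choices of the feasible set, $D = \mathbb{R}^n_+$ and a box; both are componentwise operations and clearly satisfy the nonexpansive property above:

```python
import numpy as np

def project_nonneg(x):
    """P_D for D = R^n_+ : componentwise maximum with zero."""
    return np.maximum(x, 0.0)

def project_box(x, lower, upper):
    """P_D for a box D = [lower, upper] : componentwise clipping."""
    return np.clip(x, lower, upper)
```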
We now state the steps of the proposed method in Algorithm 1.

Algorithm 1 (HSS).
Step 0: Choose an initial point $x_0 \in D$, line search parameters $\kappa > 0$, $\rho \in (0, 1)$, $\sigma > 0$, $r \geq 1$, and a tolerance Tol $> 0$; set $k := 0$.
Step 1: Compute $F(x_k)$. If $\|F(x_k)\| \leq$ Tol, stop; otherwise, go to Step 2.
Step 2: Compute the search direction $d_k$: if $k = 0$, set $d_k = -F(x_k)$; otherwise, compute $d_k$ by (10)-(12).
Step 3: Determine the stepsize $\alpha_k = \kappa \rho^i$, where $i$ is the smallest nonnegative integer such that

$$ -\langle F(x_k + \kappa \rho^i d_k), d_k \rangle \geq \sigma \kappa \rho^i \, \|F(x_k + \kappa \rho^i d_k)\|^{1/r} \, \|d_k\|^2. \qquad (14) $$

Step 4: Set $w_{k+1} = x_k + \alpha_k d_k$. If $w_{k+1} \in D$ and $\|F(w_{k+1})\| \leq$ Tol, stop. Otherwise, compute the next iterate $x_{k+1}$ by the projection step (15).
Step 5: Set $k := k + 1$ and go to Step 1.

Remark 1. The line search defined by (14) is more general than those of [36,37]. When $r = 1$, the line search (14) reduces to the line search in [36], and when $r$ is sufficiently large, it reduces to the line search in [37].
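The following Python sketch illustrates the overall projection framework of Algorithm 1 under a few stated assumptions: the formulas for $\beta_k^{PMHS}$ and the spectral parameter $v_k$ in (11)-(12) are not reproduced here and are taken as user-supplied callables, the direction update is written in the generic form $d_k = -v_k F(x_k) + \max\{\beta_k^{PMHS}, 0\}\, d_{k-1}$ as a placeholder for (10), and the update (15) is written as the usual hyperplane-projection step used in this class of methods. It is a sketch, not a faithful reproduction of the paper's algorithm.

```python
import numpy as np

def hss_sketch(F, project, x0, beta_fn, spectral_fn,
               kappa=1.0, rho=0.5, sigma=0.01, r=5, tol=1e-6, max_iter=1000):
    """Projection-type framework in the spirit of Algorithm 1 (HSS).

    F           : the mapping in problem (1), F(x) -> ndarray
    project     : projection operator P_D onto the feasible set D
    beta_fn     : placeholder for beta_k^{PMHS} of Eq. (11) (user supplied)
    spectral_fn : placeholder for the spectral parameter v_k of Eq. (12)
    """
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx                                    # d_0 = -F(x_0)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            return x
        # Step 3: backtracking line search of the form (14)
        alpha = kappa
        for _ in range(60):                    # Lemma 2 guarantees termination
            w = x + alpha * d
            Fw = F(w)
            lhs = -np.dot(Fw, d)
            rhs = sigma * alpha * np.linalg.norm(Fw) ** (1.0 / r) * np.dot(d, d)
            if lhs >= rhs:
                break
            alpha *= rho
        # Step 4: stop at w if accurate enough (membership of w in D omitted here)
        if np.linalg.norm(Fw) <= tol:
            return w
        # Assumed hyperplane-projection update standing in for Eq. (15)
        zeta = np.dot(Fw, x - w) / np.dot(Fw, Fw)
        x = project(x - zeta * Fw)
        F_new = F(x)
        # Step 2: schematic stand-in for the direction (10)-(12)
        v = spectral_fn(F_new, Fx, x, w, d)
        beta = max(beta_fn(F_new, Fx, x, w, d), 0.0)
        d = -v * F_new + beta * d
        Fx = F_new
    return x
```

For instance, with project = lambda z: np.maximum(z, 0.0) (for $D = \mathbb{R}^n_+$), spectral_fn = lambda *a: 1.0 and beta_fn = lambda *a: 0.0, the sketch reduces to a basic derivative-free projection scheme, which is a convenient sanity check.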
Assumption 1. Throughout this paper, we assume the following:
(i) The solution set of problem (1) is nonempty.
(ii) The mapping $F$ is monotone and Lipschitz continuous on $\mathbb{R}^n$.
The following lemma shows that the proposed search direction is well-defined and satisfies (7) independently of the line search strategy.

Lemma 1. The parameters $\beta_k^{PMHS}$ and $v_k$ defined by (11) and (12), respectively, are well-defined. In addition, for all $k \geq 0$, the search direction $d_k$ defined by (10) satisfies the sufficient descent property (16), that is, there exists $t > 0$ such that $\langle F_k, d_k \rangle \leq -t \|F_k\|^2$ for all $k \geq 0$.

Proof of Lemma 1. From the monotonicity of $F$, we obtain (17). By the definition of $s_{k-1}$ and $w_k$ in Steps 2 and 4 of Algorithm 1, we have (18). By (3), we have (19), so that (17) and (19) give (20). Hence $v_k$ is well-defined, and so is $\beta_k^{PMHS}$ by (18).

Now, take the inner product of $F_k$ with the search direction defined by (10). For $k = 0$, and for $k > 0$ with $\beta_k^{PMHS} \leq 0$, the estimate follows directly from the form of $d_k$ in (10) and the properties of $v_k$. For $k > 0$ with $\beta_k^{PMHS} > 0$, the required estimate follows from two inequalities, which are consequences of the Cauchy-Schwarz inequality and (20), respectively. Hence, the desired result holds.

Lemma 2.
Let the sequences of iterates $\{x_k\}$ and search directions $\{d_k\}$ be generated by Algorithm 1. Then, for any $k \geq 0$, there always exists a stepsize $\alpha_k$ satisfying the line search defined by (14).
Proof of Lemma 2. Suppose, on the contrary, that there exists some $k_0$ such that the line search (14) does not hold for any $i = 0, 1, 2, \ldots$, that is, (21) holds. Since $F$ is continuous, $\rho \in (0, 1)$ (so that $\kappa \rho^i \to 0$ as $i \to \infty$), and $\{\|d_k\|\}$ is bounded for all $k$ (see Lemma 3 below), letting $i \to \infty$ in (21) yields (22). The inequality (22) clearly contradicts (16). Hence the line search (14) is well-defined.
Lemma 3. Let Assumption 1 hold and let $\bar{x}$ be a solution of problem (1). Suppose the sequences $\{w_k\}$, $\{x_k\}$ and $\{d_k\}$ are generated by Algorithm 1. Then the following hold:
(i) the sequence $\{x_k\}$ is bounded and $\lim_{k \to \infty} \|x_k - \bar{x}\|$ exists;
(ii) the sequence $\{d_k\}$ is bounded;
(iii) the sequence $\{w_k\}$ is bounded and there exists a constant $c_2 > 0$ such that $\|F(w_{k+1})\| \leq c_2$ for all $k$;
(iv) $\lim_{k \to \infty} \alpha_k \|d_k\| = 0$;
(v) $\lim_{k \to \infty} \|x_k - w_{k+1}\| = 0$ and $\lim_{k \to \infty} \|x_{k+1} - x_k\| = 0$.

Proof of Lemma 3. From the property (13), we have (23). From the definition of $x_{k+1}$, we have (25), in which the first and second inequalities follow from (24) and (23), respectively, and the last inequality holds by dropping the second, negative term on the right-hand side. This implies that $\|x_k - \bar{x}\| \leq \|x_0 - \bar{x}\|$ for all $k$; therefore the sequence $\{x_k\}$ is bounded and $\lim_{k \to \infty} \|x_k - \bar{x}\|$ exists, which proves (i).

To show (ii), let $k = 0$; then, by the definition of the search direction (10), we have (26). Suppose $k > 0$ and $\beta_k^{PMHS} \leq 0$; then $\max\{\beta_k^{PMHS}, 0\} = 0$ and (10) reduces to (27). Combining this with (20) and (26) yields (28). By the Lipschitz continuity of $F$, we have (29). Again, suppose $k > 0$ and $\beta_k^{PMHS} > 0$; then we obtain (30), where the first and second inequalities follow from the triangle inequality and the Cauchy-Schwarz inequality, respectively, the third inequality follows from (18) and (29), and the fourth inequality follows from (26). Letting $c$ denote the larger of the bounds obtained in the two cases, we conclude that $\{d_k\}$ is bounded.

Next, we show (iii). By the boundedness of $\{x_k\}$ and (30), it follows from the definition of $w_k$ that $\{w_k\}$ is also bounded. By the Lipschitz continuity of $F$, there exists some constant $c_2$ for which (31) holds.

To show (iv), from (25) we deduce (32). Since the stepsize $\alpha_k$ in Step 3 of Algorithm 1 satisfies $\alpha_k \leq 1$ for all $k$, it follows from (14) that (33) holds. Combining (32) and (33) gives (34). Multiplying both sides of (34) by $\|F(w_{k+1})\|^{-2/r}$ and using (31) gives (35). Taking limits on both sides gives (36), and thus (37) holds, that is, $\lim_{k \to \infty} \alpha_k \|d_k\| = 0$.

Finally, we show (v). Equation (37), together with the definition of $w_{k+1}$ in Step 4 of Algorithm 1, yields (38). By the property of projection (13), we then obtain (39), which completes the proof.

Theorem 1. Let $\{x_k\}$ be the sequence generated by Algorithm 1 and suppose that Assumption 1 holds. Then
(i) $\liminf_{k \to \infty} \|F(x_k)\| = 0$;
(ii) the sequence $\{x_k\}$ converges to a point $\bar{x}$ which satisfies $F(\bar{x}) = 0$.
Proof of Theorem 1. We prove (i) by contradiction. Suppose that $\liminf_{k \to \infty} \|F(x_k)\| > 0$; then we can find some positive constant, say $\vartheta$, such that (40) holds, that is, $\|F(x_k)\| \geq \vartheta$ for all $k \geq 0$. Applying the Cauchy-Schwarz inequality to (16) yields $t \|F(x_k)\| \leq \|d_k\|$, which, together with (40), gives $\|d_k\| \geq t \vartheta$. Combining this with (37), we obtain (41), namely $\lim_{k \to \infty} \alpha_k = 0$.

Since $F$ is Lipschitz continuous, it holds that (42), where $m$ is a positive constant; the last inequality follows from the boundedness of $\{x_k\}$ and $\{d_k\}$ together with the definition of $\alpha_k$. If $\alpha_k \neq \kappa$, then, since Algorithm 1 computes $\alpha_k$ by a backtracking process starting from $\kappa$, the trial stepsize $\rho^{-1} \alpha_k$ does not satisfy (14); that is, (43) holds, where $M := \sigma^{-1} c_2 m^{1/r}$ and the second inequality follows from (30) and (42).

Since $\{x_k\}$ and $\{d_k\}$ are bounded, there exist accumulation points $\hat{x}$ and $\hat{d}$ of $\{x_k\}$ and $\{d_k\}$, respectively, and infinite index sets $K$ and $K^*$ with $K^* \subset K$ such that $\lim_{k \in K} x_k = \hat{x}$ and $\lim_{k \in K^*} d_k = \hat{d}$. Therefore, by the continuity of $F$, taking limits on both sides of inequality (43) for $k \in K^*$ yields (44). On the other hand, (16) and (40) give $-\langle F_k, d_k \rangle \geq t \vartheta^2$; taking the limit for $k \in K^*$, we obtain (45). The inequalities (44) and (45) yield a contradiction, and therefore (i) must hold.

Finally, we show that (ii) holds. Since $F$ is continuous and the sequence $\{x_k\}$ is bounded, part (i) implies that $\{x_k\}$ has some accumulation point $\bar{x}$ for which $F(\bar{x}) = 0$. By the boundedness of $\{x_k\}$, we can find a subsequence $\{x_{k_j}\}$ of $\{x_k\}$ for which $\lim_{j \to \infty} \|x_{k_j} - \bar{x}\| = 0$. Since $\lim_{k \to \infty} \|x_k - \bar{x}\|$ exists by (i) of Lemma 3, we conclude that $\lim_{k \to \infty} \|x_k - \bar{x}\| = 0$, and the proof is complete.

Experiment on Monotone Equations and Application in Signal Processing
In this section, we demonstrate the numerical performance of Algorithm 1 (HSS) and its computational advantage by comparing it with some existing methods. We divide the experiments into two subsections: the first is devoted to solving some benchmark test problems, while the second discusses the application of the HSS algorithm to signal recovery. For the monotone nonlinear equations experiment, all solvers were coded in MATLAB R2019b and run on a PC with an Intel Core(TM) i5-8250U processor, 4 GB of RAM and a 1.60 GHz CPU.
We implemented the HSS method using the parameters $\kappa = 1$, $\sigma = 0.01$, $\rho = 0.5$, $r = 5$ and $a = 0.01$, while the parameters used for the CGD, PDY and MFRM methods come from [15,25,32], respectively. We set the termination criterion for the iteration process as $\|F(x_k)\| \leq 10^{-6}$ or $\|F(w_k)\| \leq 10^{-6}$, and we declare failure (denoted by "-") whenever the number of iterations exceeds 1000 without the termination criterion being satisfied.
We carried out the comparison based on ITER (number of iterations), FVAL (number of function evaluations) and TIME (CPU time in seconds), and the numerical results obtained by each solver are reported in Tables 1-11. We see from the NORM values (the norm of $F$ at the obtained solution) reported in Tables 1-11 that the proposed HSS algorithm obtained the solutions of all the test problems in every instance, while the other three methods (CGD, PDY and MFRM) failed to obtain the solutions of some problems in some instances. This means our proposed algorithm can serve as an alternative to some existing methods. The reported results also show that the proposed HSS algorithm recorded the least ITER, FVAL and TIME in most instances. We summarized all the information from Tables 1-11 in Figures 1-3 using the performance profile of Dolan and Moré [38], which reports, for each solver, the proportion of problems solved within a given factor of the best performance. In terms of ITER and FVAL, Figures 1 and 2 show that the HSS algorithm performs well, being the best solver on about 70% of the experiments compared with the existing methods. In addition, Figure 3 shows that the HSS algorithm is faster than the three methods compared with. Therefore, with respect to the numerical experiments performed, the proposed HSS algorithm can be regarded as more efficient than the CGD, PDY and MFRM methods.
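For reference, Dolan-Moré performance profiles such as those in Figures 1-3 can be computed along the following lines; this is a generic Python sketch in which T is an (n_problems x n_solvers) array of ITER, FVAL or TIME values, with np.inf marking failures (the variable names are ours):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.

    T    : (n_problems, n_solvers) array of a metric; np.inf marks a failure.
    taus : 1-D array of performance ratios at which to evaluate the profile.
    Returns rho with shape (len(taus), n_solvers), where rho[i, s] is the
    fraction of problems solver s solves within a factor taus[i] of the best
    solver on each problem.
    """
    best = np.min(T, axis=1, keepdims=True)   # best metric per problem
    ratios = T / best                          # performance ratios r_{p,s}
    rho = np.array([(ratios <= tau).mean(axis=0) for tau in taus])
    return rho
```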

Second Experiment on Signal Processing
The major advantage of the proposed HSS algorithm is that it does not require knowledge of the derivative of the objective function, and it is therefore suitable for solving nonsmooth problems. As mentioned in the introduction, many applications can be converted into monotone nonlinear equations. In particular, we consider the reconstruction of an original signal, say $\bar{x} \in \mathbb{R}^n$, by minimizing an objective function that contains a linear least squares error term and a sparseness-inducing $\ell_1$ regularization term:

$$ \min_{x \in \mathbb{R}^n} \ \frac{1}{2} \|Ex - y\|_2^2 + \mu \|x\|_1, \qquad (46) $$

where $y \in \mathbb{R}^k$ is an observation, $E \in \mathbb{R}^{k \times n}$ ($k \ll n$) is a linear operator and $\mu \geq 0$ is a regularization parameter. Problem (46) has received much attention, and different iterative methods for solving it have been proposed by many researchers (see [39-45]). One of the popular methods for solving (46) is the gradient projection method for sparse reconstruction (GPSR) proposed by Figueiredo et al. [46]. Other gradient-based iterative algorithms include the coordinate gradient descent method [47] and the accelerated gradient projection method [48], among others. These methods enjoy good performance, even though they require knowledge of the gradient. The derivative-free nature of our proposed algorithm makes it suitable for dealing with the nonsmooth problem (46). However, Algorithm 1 is designed to handle problems in the form of (1), and therefore we need to rewrite problem (46) in the form of problem (1). Fortunately, the work of Figueiredo et al. [46] shows that if we split $x = u - v$ with $u \geq 0$ and $v \geq 0$, and let $q = [u\ v]^T$, then (46) can be translated into the following bound-constrained quadratic programming problem

$$ \min_{q} \ \frac{1}{2} q^T G q + c^T q \quad \text{subject to } q \geq 0, \qquad (47) $$

where

$$ c = \mu e_{2n} + \begin{bmatrix} -E^T y \\ E^T y \end{bmatrix}, \qquad G = \begin{bmatrix} E^T E & -E^T E \\ -E^T E & E^T E \end{bmatrix}, $$

and $e_{2n}$ denotes the vector of ones in $\mathbb{R}^{2n}$. It is not difficult to see that the matrix $G$ is positive semi-definite. On the other hand, Xiao et al. [2] solved problem (47) in a different way by writing it as the following linear variational inequality problem: find $q \in \mathbb{R}^{2n}$ with $q \geq 0$ such that

$$ \langle Gq + c, \, q' - q \rangle \geq 0, \quad \forall\, q' \geq 0. \qquad (48) $$

Taking the special structure of the feasible region of $q$ into consideration, they further showed that problem (48) is equivalent to the following linear complementarity problem: find $q \in \mathbb{R}^{2n}$ such that

$$ q \geq 0, \qquad Gq + c \geq 0, \qquad \langle q, Gq + c \rangle = 0. \qquad (49) $$

We can see that $q$ is a solution of problem (49) if and only if it satisfies

$$ F(q) := \min\{q, Gq + c\} = 0. \qquad (50) $$
In (50), the function $F$ is vector-valued and the "min" operator denotes the componentwise minimum of two vectors. Problem (50) is in the form of problem (1), and, interestingly, Lemma 3 of [49] and Lemma 2.2 of [2] show that the function $F$ satisfies Assumption 1(ii), that is, it is monotone and Lipschitz continuous. Hence, the proposed derivative-free HSS algorithm can be applied to solve it. At each iteration, our proposed algorithm is applied to the resulting problem (50) without requiring any Jacobian matrix information.
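A minimal Python sketch of how the residual map (50) can be evaluated is given below. It assumes the GPSR-type splitting described above, with E supplied as a dense array (in practice E is usually available as a fast operator), and it forms the product $Gq + c$ through $E$ and $E^T$ so that the $2n \times 2n$ matrix $G$ is never assembled, keeping the approach matrix-free:

```python
import numpy as np

def make_residual_map(E, y, mu):
    """Build F(q) = min(q, Gq + c) for the reformulated l1-regularized problem,
    with q = [u; v], x = u - v and u, v >= 0 (the splitting used in (47))."""
    Ety = E.T @ y
    n = E.shape[1]

    def F(q):
        u, v = q[:n], q[n:]
        x = u - v
        EtEx = E.T @ (E @ x)              # E^T E (u - v)
        top = EtEx - Ety + mu             # first block of Gq + c
        bottom = -EtEx + Ety + mu         # second block of Gq + c
        return np.minimum(q, np.concatenate([top, bottom]))

    return F
```

A zero of this map corresponds to a solution of (47), and its monotonicity and Lipschitz continuity follow from the positive semi-definiteness of $G$, as noted above.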
We compared the HSS algorithm with the modified self-adaptive CG method (MSCG) [50] based on their performance in restoring a length-$n$ sparse signal from $m$ observations. We used three metrics, namely ITER (number of iterations), CPU (CPU time in seconds) and the mean squared error (MSE), to evaluate the performance of each method. The MSE is commonly used to assess the quality of reconstruction and is calculated as

$$ \text{MSE} = \frac{1}{n} \|\bar{x} - x^*\|^2, $$

where $\bar{x}$ and $x^*$ denote the original and restored signals, respectively. The measurement $y = A\bar{x} + \omega$ contains noise, where $\omega$ is Gaussian noise distributed as $N(0, 10^{-4})$ and $A$ is the Gaussian matrix generated by the MATLAB command randn(m, n). The size of the signal is selected as $n = 2^{15}$ and $m = 2^{13}$, and the original signal contains $2^7$ randomly placed nonzero elements. For the signal recovery experiment, the algorithms were coded in MATLAB R2019b and run on a PC with an Intel(R) Core(TM) i7-10510U processor, 8.00 GB (7.80 GB usable) of RAM and a 1.80 GHz-2.30 GHz CPU. We started the iteration process from the measurement signal, i.e., $x_0 = A^T y$, and used $f(x) = \frac{1}{2} \|y - Ax\|_2^2 + \mu \|x\|_1$ as the merit function. We used the same parameters as in the first experiment for the HSS method, except for $a = 0.2$, while the parameters for MSCG come from [50]. We terminated the iteration process when the relative change of the merit function satisfied $\frac{|f(x_k) - f(x_{k-1})|}{|f(x_{k-1})|} < 10^{-5}$. In order to have a relatively fair assessment of both methods, we ran each code from the same initial point, used the same continuation technique on the parameter $\mu$, and observed the convergence behaviour of each method until a solution of similar accuracy was obtained. The numerical results of the experiment for fifteen different noise samples are presented in Table 12, together with the average of each column. In addition, the original, disturbed and recovered signals are displayed in Figures 4 and 5. From Table 12, we see that the HSS algorithm restored the disturbed signal with the smallest ITER values and also converged faster than MSCG in terms of CPU time. Moreover, the quality of reconstruction by the HSS algorithm is slightly better than that of the MSCG method, since the MSE of the former is a little smaller than that of the latter.
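The experimental setup described above can be reproduced along the following lines. This is a Python sketch with reduced problem sizes so that it runs quickly (the paper uses $n = 2^{15}$, $m = 2^{13}$ and $2^7$ nonzero entries), and the choice of the regularization parameter $\mu$ below is a common heuristic that is not specified in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Reduced sizes for illustration; the paper uses n = 2**15, m = 2**13, 2**7 spikes.
n, m, k_spikes = 2**12, 2**10, 2**5

# Original sparse signal with k_spikes random nonzero entries.
x_true = np.zeros(n)
support = rng.choice(n, size=k_spikes, replace=False)
x_true[support] = rng.standard_normal(k_spikes)

# Gaussian measurement matrix (randn(m, n)) and noisy observations y = A x + omega.
A = rng.standard_normal((m, n))
omega = np.sqrt(1e-4) * rng.standard_normal(m)    # noise distributed as N(0, 1e-4)
y = A @ x_true + omega

# Starting point, merit function and reconstruction-quality metric.
x0 = A.T @ y
mu = 0.01 * np.max(np.abs(x0))                    # heuristic choice (assumption)
merit = lambda x: 0.5 * np.dot(y - A @ x, y - A @ x) + mu * np.sum(np.abs(x))
mse = lambda x: np.mean((x - x_true) ** 2)        # MSE = ||x - x_true||^2 / n
```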

Conclusions
A Hestenes-Stiefel-like derivative-free method with a spectral parameter for nonlinear monotone equations has been proposed, based on a suitable line search strategy and a projection technique. The method is a modification of the conjugate gradient algorithm proposed by Amini et al. [33]. Two sets of experiments were presented to show the efficiency of the proposed method. The numerical results reported show that the proposed method outperformed three existing methods [15,25,32] and was able to recover disturbed signals in compressive sensing with better quality than the method in [50]. The convergence analysis of the proposed method was established under standard assumptions.