A Modified Hestenes-Stiefel-Type Derivative-Free Method for Large-Scale Nonlinear Monotone Equations

The goal of this paper is to extend the modified Hestenes-Stiefel method to solve large-scale nonlinear monotone equations. The method is constructed by combining the hyperplane projection method (Solodov, M.V.; Svaiter, B.F. A globally convergent inexact Newton method for systems of monotone equations. In: Fukushima, M., Qi, L. (Eds.) Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods; Kluwer Academic Publishers, 1998; pp. 355-369) with the modified Hestenes-Stiefel method of Dai and Wen (Dai, Z.; Wen, F. Global convergence of a modified Hestenes-Stiefel nonlinear conjugate gradient method with Armijo line search. Numer. Algorithms 2012, 59, 79-93). In addition, we propose a new line search for the derivative-free method. Global convergence of the proposed method is established under the assumptions that the underlying mapping is Lipschitz continuous and monotone. Preliminary numerical results illustrate the effectiveness of the proposed method.


Introduction
In this paper, we consider the problem of finding numerical solutions of the large-scale nonlinear equations
$$F(x) = 0, \quad x \in \mathbb{R}^n, \qquad (1)$$
where the function $F : \mathbb{R}^n \to \mathbb{R}^n$ is monotone and continuous. Monotonicity of $F$ means that the following inequality holds:
$$\langle F(x) - F(y),\, x - y \rangle \ge 0, \quad \forall x, y \in \mathbb{R}^n.$$
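For intuition, the monotonicity inequality can be spot-checked numerically on sampled points. The following Python sketch is our own illustration (the linear test map is not from the paper); a passing check is evidence, not a proof:

```python
import numpy as np

def is_monotone_sample(F, n, trials=1000, seed=0):
    """Spot-check <F(x) - F(y), x - y> >= 0 on random pairs of points."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        if (F(x) - F(y)) @ (x - y) < -1e-12:
            return False
    return True

# Example: a linear map F(x) = A x is monotone when A + A^T is positive
# semidefinite; here A + A^T = 4*I.
A = np.array([[2.0, 1.0], [-1.0, 2.0]])
print(is_monotone_sample(lambda x: A @ x, n=2))  # True
```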
Nonlinear monotone equations arise in many fields; for example, they appear as subproblems in generalized proximal algorithms with Bregman distances [1]. Some monotone variational inequality problems can be converted into nonlinear monotone equations [2]. Monotone systems of equations also arise in $L_1$-norm regularized sparse optimization problems (see [3,4]) and in discrete mathematics such as graph theory (see [5,6]).
Aware of these important applications of nonlinear monotone equations, in recent years many scholars have devoted attention to designing efficient algorithms for solving problem (1). These algorithms fall mainly into the following categories.
Newton-type methods, Levenberg-Marquardt methods, and quasi-Newton methods enjoy fast local convergence and are therefore attractive (see [7-13]). However, for large-scale problems, a drawback of these methods is that each iteration requires solving a large-scale linear system involving the Jacobian matrix or an approximation of it. The resulting storage demand for this matrix makes these methods ill-suited to large-scale nonlinear monotone systems.
In recent years, gradient-type algorithms have attracted considerable attention. The main reasons are their low storage requirements, easy implementation, and global convergence under mild conditions. For example, the spectral gradient method [14] needs only gradient information, which makes it simple yet effective for optimization problems. Along this line, the spectral gradient method [14] has been extended to solve nonlinear monotone equations by combining it with the projection method (see [15,16]).
In this paper, we focus on extending the Hestenes-Stiefel (HS) conjugate gradient (CG) method to solve large-scale nonlinear monotone equations. To the best of our knowledge, the HS CG method [46] is generally considered one of the most efficient CG methods in terms of computational performance. However, the HS CG method does not enjoy the sufficient descent property. Based on the modified secant equation [10], Dai and Wen [47] proposed a modified HS conjugate gradient method that generates sufficient descent directions (i.e., there exists $c > 0$ such that $g_k^T d_k \le -c\|g_k\|^2$), and they established global convergence under the Armijo line search. Hence, we aim to present a derivative-free method for solving the nonlinear monotone equations (1). The proposed method can be seen as a further development of the modified HS CG method of Dai and Wen [47] for unconstrained optimization problems. Our paper makes two contributions to large-scale nonlinear monotone equations. First, a new line search is proposed for the derivative-free method; a significant advantage of this line search is that the steplength is easier to obtain. Second, we propose a derivative-free method for solving large-scale nonlinear monotone equations that combines the modified Hestenes-Stiefel method of Dai and Wen [47] with the hyperplane projection method [13]. A good property of the proposed method is that, owing to its low storage requirement, it is well suited to large-scale problems.
The rest of the article is organized as follows. In Section 2, we present the algorithm and prove the sufficient descent property. The global convergence is proved in Section 3. We report the numerical results in Section 4. The last section gives the conclusion.

Algorithm and the Sufficient Descent Property
In this section, we present the derivative-free method for solving problem (1), which combines the modified Hestenes-Stiefel method [47] with the hyperplane projection method [13]. Unlike the traditional conjugate gradient method, each new iterate $x_{k+1}$ is obtained in two steps.
In the first step, the algorithm produces a sequence $\{z_k = x_k + \alpha_k d_k\}$, where $d_k$ is the search direction and $\alpha_k > 0$ is the steplength obtained by a suitable line search. For most iterative optimization algorithms, the line search plays an important role both in the convergence analysis and in numerical computation. Zhang and Zhou [15] obtain the steplength $\alpha_k > 0$ by the following Armijo-type line search: compute $\alpha_k = \max\{\beta\rho^i : i = 0, 1, \ldots\}$ such that
$$-F(x_k + \alpha_k d_k)^T d_k \ge \sigma \alpha_k \|d_k\|^2, \qquad (3)$$
where $\beta > 0$ is the initial trial value for $\alpha_k$, $\rho \in (0, 1)$, and $\sigma > 0$.
In addition, Li and Li [39] introduced an alternative line search, computing the steplength $\alpha_k = \max\{\beta\rho^i : i = 0, 1, \ldots\}$ such that
$$-F(x_k + \alpha_k d_k)^T d_k \ge \sigma \alpha_k \|F(x_k + \alpha_k d_k)\|\, \|d_k\|^2. \qquad (4)$$
In both cases, the steplength $\alpha_k$ is obtained by computing $\alpha_k = \max\{\beta\rho^i : i = 0, 1, \ldots\}$ such that (3) or (4) is satisfied. If the point $x_k$ is far from the solution, the resulting steplength $\alpha_k$ may be very small. Taking this into account, we present a new line search rule (5), in which the steplength $\alpha_k$ is likewise obtained by computing $\alpha_k = \max\{\beta\rho^i : i = 0, 1, \ldots\}$ such that condition (5) is satisfied.

In the second step, $x_{k+1}$ is determined from $x_k$, $z_k$, and $F(z_k)$ via the hyperplane projection method [13], which we now describe. Along the search direction $d_k$, we generate a point $z_k = x_k + \alpha_k d_k$ by a suitable line search such that
$$F(z_k)^T (x_k - z_k) > 0. \qquad (6)$$
On the other hand, the monotonicity of $F$ implies that for any solution $x^*$ (i.e., $F(x^*) = 0$), the following inequality holds:
$$F(z_k)^T (z_k - x^*) \ge 0. \qquad (7)$$
By (6) and (7), the hyperplane
$$H_k = \{x \in \mathbb{R}^n : F(z_k)^T (x - z_k) = 0\} \qquad (8)$$
strictly separates the current point $x_k$ from the zero points $x^*$ of Equation (1). Following Solodov and Svaiter [13] and Zhang and Zhou [15], it is reasonable to take the projection of $x_k$ onto the hyperplane (8) as the next iterate. In detail, the next iterate $x_{k+1}$ is computed by
$$x_{k+1} = x_k - \frac{F(z_k)^T (x_k - z_k)}{\|F(z_k)\|^2}\, F(z_k). \qquad (9)$$
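To make the projection step concrete, the following minimal Python sketch (the function name is ours) implements the update (9) from $x_k$, $z_k$, and $F(z_k)$:

```python
import numpy as np

def hyperplane_projection(x_k, z_k, Fz_k):
    """Project x_k onto the hyperplane {x : Fz_k^T (x - z_k) = 0} of (8).

    This is exactly the update (9): since (6) gives Fz_k^T (x_k - z_k) > 0
    while (7) places any solution x* on the other side of the hyperplane,
    the step moves x_k strictly closer to x*.
    """
    lam = Fz_k @ (x_k - z_k) / (Fz_k @ Fz_k)
    return x_k - lam * Fz_k
```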
We now turn to the search direction, which plays a crucial role in an iterative algorithm. Our main starting point is to extend the search direction of Dai and Wen [47] to the nonlinear monotone equations problem (1). Similar to Dai and Wen [47] for unconstrained optimization, we define the search direction $d_k$ by (10) and (11), with the quantities involved specified in (12). For simplicity, we refer to the method defined by (10) and (11) as the NHZ method hereafter.

Further, in this paper, the function $F$ is assumed to satisfy the following assumptions, which are often utilized in the convergence analysis for nonlinear monotone equations (see [37-45,48]):
(i) $F$ is a monotone function, i.e.,
$$\langle F(x) - F(y),\, x - y \rangle \ge 0, \quad \forall x, y \in \mathbb{R}^n; \qquad (13)$$
(ii) $F$ is a Lipschitz continuous function, namely, there exists $L > 0$ such that
$$\|F(x) - F(y)\| \le L\, \|x - y\|, \quad \forall x, y \in \mathbb{R}^n. \qquad (14)$$
In what follows, we describe the proposed algorithm.
Algorithm 1 (NHZ derivative-free projection method).
Step 0: Choose an initial point $x_0 \in \mathbb{R}^n$ and parameters $\beta > 0$, $\rho \in (0, 1)$, $\sigma > 0$, $\mu > 1/4$. Set $k = 0$.
Step 1: If $\|F(x_k)\| = 0$ (in practice, if the stopping criterion is met), stop.
Step 2: Compute the search direction $d_k$ by (10)-(12).
Step 3: Calculate the search steplength $\alpha_k$ by (5). Let $z_k = x_k + \alpha_k d_k$.
Step 4: If $\|F(z_k)\| = 0$, stop; otherwise, compute the next iterate $x_{k+1}$ by the projection step (9).
Step 5: Set $k := k + 1$ and go to Step 1.
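For illustration, here is a compact Python sketch of the whole iteration. Since the displayed formulas (5) and (10)-(12) are not reproduced above, the sketch substitutes a Li-Li-type backtracking test (cf. (4)) and a generic Hager-Zhang-type direction; both are stand-ins under stated assumptions, not the paper's exact rules:

```python
import numpy as np

def nhz_like_solver(F, x0, beta=1.0, rho=0.5, sigma=1e-4, mu=2.0,
                    tol=1e-4, max_iter=2000):
    """Projection-based derivative-free iteration in the spirit of Algorithm 1.

    Stand-ins: the line-search test is Li-Li-type (cf. (4)), and the
    direction update is a generic Hager-Zhang-type formula (cf. (10)-(12)).
    """
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx                                    # Step 2 with d_0 = -F_0
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:          # Step 1: stopping test
            break
        alpha = beta                           # Step 3: backtracking
        while True:
            z = x + alpha * d
            Fz = F(z)
            if -Fz @ d >= sigma * alpha * np.linalg.norm(Fz) * (d @ d):
                break
            alpha *= rho
            if alpha < 1e-14:                  # safeguard against stagnation
                break
        if np.linalg.norm(Fz) <= tol:          # Step 4: z_k already solves (1)
            return z, Fz, k
        lam = Fz @ (x - z) / (Fz @ Fz)         # Step 4: projection step (9)
        x_new = x - lam * Fz
        F_new = F(x_new)
        y = F_new - Fx                         # Hager-Zhang-type direction
        dy = d @ y
        if abs(dy) > 1e-14:
            bk = (F_new @ y) / dy - mu * (y @ y) * (F_new @ d) / dy ** 2
        else:
            bk = 0.0
        d = -F_new + bk * d
        x, Fx = x_new, F_new
    return x, Fx, k
```

With $\mu > 1/4$, the stand-in direction satisfies a sufficient descent bound of the form (15) established next.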
In what follows, we show that the proposed NHZ derivative-free method enjoys the sufficient descent property, which plays an important role in the convergence proof. From now on, we write $F_k = F(x_k)$.

Theorem 1. The search direction $d_k$ generated by (10)-(12) satisfies the sufficient descent condition
$$F_k^T d_k \le -\Bigl(1 - \frac{1}{4\mu}\Bigr) \|F_k\|^2. \qquad (15)$$

Proof. When $k = 0$, we have $d_0 = -F_0$, and hence $F_0^T d_0 = -\|F_0\|^2$, so (15) is satisfied for $k = 0$. Now we show that the sufficient descent condition (15) holds for $k \ge 1$. From (10) and (11) we obtain the expansion (16) of $F_k^T d_k$. Applying to (16) the inequality $u^T v \le \frac{1}{2}(\|u\|^2 + \|v\|^2)$ with a suitable choice of $u$ and $v$, we conclude that (15) holds for $k \ge 1$. □
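To fill in the key step, here is the Hager-Zhang-type estimate on which such proofs rely, written for a generic direction $d_k = -F_k + \beta_k d_{k-1}$ with $\beta_k = \frac{F_k^T y_{k-1}}{d_{k-1}^T y_{k-1}} - \mu \frac{\|y_{k-1}\|^2 F_k^T d_{k-1}}{(d_{k-1}^T y_{k-1})^2}$; this is the generic family, assumed here for illustration, and the paper's (10)-(12) may differ in details:

```latex
% Apply u^T v <= (||u||^2 + ||v||^2)/2 with
%   u = F_k / sqrt(2 mu),
%   v = sqrt(2 mu) * (F_k^T d_{k-1} / d_{k-1}^T y_{k-1}) * y_{k-1}.
\begin{align*}
F_k^T d_k
  &= -\|F_k\|^2
     + \frac{(F_k^T y_{k-1})(F_k^T d_{k-1})}{d_{k-1}^T y_{k-1}}
     - \mu\,\frac{\|y_{k-1}\|^2 (F_k^T d_{k-1})^2}{(d_{k-1}^T y_{k-1})^2} \\
  &\le -\|F_k\|^2 + \frac{1}{4\mu}\|F_k\|^2
     + \mu\,\frac{\|y_{k-1}\|^2 (F_k^T d_{k-1})^2}{(d_{k-1}^T y_{k-1})^2}
     - \mu\,\frac{\|y_{k-1}\|^2 (F_k^T d_{k-1})^2}{(d_{k-1}^T y_{k-1})^2} \\
  &= -\Bigl(1 - \frac{1}{4\mu}\Bigr)\|F_k\|^2 .
\end{align*}
```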

Global Convergence Analysis
Now we investigate the global convergence of Algorithm 1. First, we give the following lemma, which shows that the line search strategy (5) is well defined whenever the search directions $\{d_k\}$ satisfy the sufficient descent property.

Lemma 1. Let the iterative sequences $\{x_k\}$ and $\{z_k\}$ be generated by Algorithm 1. Then there always exists a steplength $\alpha_k$ satisfying the line search (5).
Proof. Suppose, to the contrary, that at some iterate $k_0$ the line search strategy (5) fails for every nonnegative integer $i$ with trial steplength $\beta\rho^i$; that is, the reverse of (5) holds for all $i$. Letting $i \to \infty$, the continuity of $F$ and $\rho \in (0, 1)$ yield a limiting inequality that contradicts (15). The proof is completed. □
The next lemma indicates that the line search strategy (5) provides a lower bound for the steplength $\alpha_k$.

Lemma 2.
Let the iterative sequences $\{x_k\}$ and $\{z_k\}$ be generated by Algorithm 1. Then the steplength $\alpha_k$ satisfies the lower bound (19), where $\delta = 1 - \frac{1}{4\mu}$.
Proof. If $\alpha_k = \beta$, then (19) obviously holds. Suppose instead that $\alpha_k \ne \beta$. Then the trial steplength $\bar{\alpha}_k = \rho^{-1}\alpha_k$ did not satisfy the line search (5); that is, the reverse inequality (20) holds at $\bar{z}_k = x_k + \bar{\alpha}_k d_k$.
From the sufficient descent condition (15), we obtain the estimate (21). From the Lipschitz continuity of $F$ in (14), together with (20) and (21), we derive a chain of inequalities which, combined again with (21), implies the lower bound (19). This proves the lemma on the steplength $\alpha_k$. □
The next lemma was proved by Solodov and Svaiter (see Lemma 2.1 in [13]) and also holds for Algorithm 1. We state it without proof, since the argument is the same as in Solodov and Svaiter [13].

Lemma 3. Assume that the function $F$ is monotone and that the Lipschitz continuity condition (14) holds. Let the iterative sequence $\{x_k\}$ be generated by Algorithm 1. Then, for any $x^*$ such that $F(x^*) = 0$,
$$\|x_{k+1} - x^*\|^2 \le \|x_k - x^*\|^2 - \|x_{k+1} - x_k\|^2. \qquad (22)$$
In particular, the iterative sequence $\{x_k\}$ is bounded and
$$\sum_{k=0}^{\infty} \|x_{k+1} - x_k\|^2 < \infty. \qquad (23)$$

Remark 1. Lemma 3 confirms that the sequence $\{\|x_k - x^*\|\}$ decreases with $k$. In addition, (22) and (23) imply that $\lim_{k \to \infty} \|x_{k+1} - x_k\| = 0$.

Theorem 2. Let the iterative sequence $\{x_k\}$ be generated by Algorithm 1. Then
$$\lim_{k \to \infty} \alpha_k \|d_k\| = 0. \qquad (24)$$

Proof. From the line search (5) and the projection step (9), we obtain for any $k$ the lower bound (25) on $\|x_{k+1} - x_k\|$ in terms of $\alpha_k \|d_k\|$. In particular, it follows from (23) and (25) that (24) and the related estimate (26) hold. □

Lemma 4. Let the iterative sequence $\{x_k\}$ be generated by Algorithm 1, let $x^*$ satisfy $F(x^*) = 0$, and set $\bar{z}_k = x_k + \bar{\alpha}_k d_k$ with $\bar{\alpha}_k = \rho^{-1}\alpha_k$. Then $\{\|F(\bar{z}_k)\|\}$ and $\{\|F_k\|\}$ are bounded; that is, there is a constant $M > 0$ such that (27) holds.

Proof. By Lemma 2 and (26), there is a constant $M_1 > 0$ such that $\bar{\alpha}_k \|d_k\| \le M_1$; hence $\{\bar{z}_k\}$ is bounded (recall that $\{x_k\}$ is bounded by Lemma 3). Since the function $F$ is Lipschitz continuous, we easily obtain the two inequalities bounding $\|F(\bar{z}_k)\|$ and $\|F_k\|$, which together give (27). □
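For completeness, (22) follows in one line from the projection formula (9) and inequality (7). Writing $\lambda_k = \frac{F(z_k)^T (x_k - z_k)}{\|F(z_k)\|^2}$, so that $x_{k+1} = x_k - \lambda_k F(z_k)$, $\lambda_k > 0$ by (6), and $\|x_{k+1} - x_k\| = \lambda_k \|F(z_k)\|$, the standard Solodov-Svaiter argument reads:

```latex
\begin{align*}
\|x_{k+1} - x^*\|^2
  &= \|x_k - x^*\|^2 - 2\lambda_k F(z_k)^T (x_k - x^*) + \lambda_k^2 \|F(z_k)\|^2 \\
  &\le \|x_k - x^*\|^2 - 2\lambda_k F(z_k)^T (x_k - z_k) + \lambda_k^2 \|F(z_k)\|^2
     \quad \text{(using (7))} \\
  &= \|x_k - x^*\|^2 - \lambda_k^2 \|F(z_k)\|^2
   = \|x_k - x^*\|^2 - \|x_{k+1} - x_k\|^2 .
\end{align*}
```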
In what follows, we give the global convergence theorem for our proposed method.

Theorem 3.
Let the iterative sequence $\{x_k\}$ be generated by Algorithm 1. Then
$$\liminf_{k \to \infty} \|F_k\| = 0. \qquad (29)$$

Proof. We prove the theorem by contradiction. Assume that (29) is not true. Then there is a constant $\varepsilon > 0$ such that $\|F_k\| > \varepsilon$ for all $k$.
Since $F_k \ne 0$, we have from (15) that $d_k \ne 0$. The monotonicity of $F$, together with (10), implies the estimate (30), which, combined with the definition of $s_{k-1}$, yields a further bound. From (11), (12) and (30), the coefficient in (11) is bounded, and therefore, from (10) and (30), there exists a constant $C > 0$ such that $\|d_k\| \le C$ for all $k$. On the other hand, (15) and the Cauchy-Schwarz inequality give $\|d_k\| \ge (1 - \frac{1}{4\mu})\|F_k\| > \delta\varepsilon$. It then follows from Lemma 2, Lemma 3, $\|F_k\| > \varepsilon$ and $\|d_k\| > \delta\varepsilon$ that $\alpha_k \|d_k\|$ is bounded away from zero for all $k$ sufficiently large, which contradicts (24). Hence (29) holds, and the proof is complete. □

Numerical Experiments
We now present numerical experiments to assess the performance of the proposed method. We test the NHZ method (Algorithm 1) and compare its performance with the spectral gradient (SG) method [15] and the MPRP method in [39]. All codes were written in Matlab R2018a and run on a Lenovo PC with 4 GB of RAM.
To obtain better numerical performance, we select the initial steplength as in [39] and [43]. We set $\rho = 0.5$ and $\sigma = 2$, and we reset $\beta = 1$ whenever $\beta < 10^{-4}$. Following the MPRP method in [39], we terminate the iterative process when
$$\|F(x_k)\| \le \mathrm{atol} + \mathrm{rtol} \cdot \|F(x_0)\|,$$
with $\mathrm{rtol} = \mathrm{atol} = 10^{-4}$. The numerical performance of the SG, MPRP, and NHZ methods is tested on the following five nonlinear monotone equations problems with various sizes and initial points.
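A minimal Python harness for recording the three indicators could look as follows; the solver signature and the relative/absolute form of the stopping test are our reading of the setup above, not code from the paper:

```python
import time
import numpy as np

def run_experiment(solver, F, x0, atol=1e-4, rtol=1e-4):
    """Run one solver on one problem instance and record Time/Iter/Feval."""
    feval = [0]
    def F_counted(x):                  # wrap F to count function evaluations
        feval[0] += 1
        return F(x)
    F0_norm = np.linalg.norm(F(x0))
    stop = lambda Fx: np.linalg.norm(Fx) <= atol + rtol * F0_norm
    t0 = time.perf_counter()
    x, Fx, iters = solver(F_counted, x0, stop)   # hypothetical signature
    elapsed = time.perf_counter() - t0
    return {"Time": elapsed, "Iter": iters, "Feval": feval[0]}
```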

Problem 4 ([50]). The specific expression of the function $F(x)$ is as given in [50].

Problem 5. The function $F(x)$ is defined in terms of $|X| = (|x_1|, |x_2|, \ldots, |x_n|)^T$, the vector $B = (1, 1, \ldots, 1)^T$, and a given $n \times n$ coefficient matrix.

Tables 1-5 report the numerical results of the proposed algorithm, the spectral gradient (SG) method [15], and the MPRP method in [39] on the five tested problems, using three indicators of numerical performance: CPU time (Time), number of iterations (Iter), and number of function evaluations (Feval).
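To show the experimental pattern without reproducing the cited problem formulas, here is an illustrative componentwise monotone test function (our own example, not one of the paper's five problems):

```python
import numpy as np

def F_demo(x):
    """F_i(x) = 2*x_i - sin(x_i): the Jacobian is diag(2 - cos(x_i)),
    with entries in [1, 3], so F is monotone and Lipschitz (L <= 3)."""
    return 2.0 * x - np.sin(x)

# Typical setup: several problem sizes and initial points per problem.
for n in (1000, 10000, 100000):
    for x0 in (np.ones(n), -np.ones(n), np.full(n, 0.1)):
        pass  # run SG, MPRP, and NHZ on F_demo from x0; record Time/Iter/Feval
```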
Comparing CPU time across the tested algorithms, we note that our proposed algorithm consumes less time than both the spectral gradient (SG) method [15] and the MPRP method in [39] on all tested problems, and the difference is substantial, especially for large-scale problems. In addition, the MPRP method in [39] needs less CPU time than the SG method [15]. Assessing the number of iterations, we find that the NHZ method requires fewer iterations than the SG method [15] and the MPRP method in [39] on all tested problems. Our proposed algorithm also requires markedly fewer function evaluations on all tested problems.
In sum, the numerical results in Tables 1-5 show that the proposed algorithm outperforms the spectral gradient (SG) method [15] and the MPRP method in [39] on all three indicators, which implies that the modified Hestenes-Stiefel-based derivative-free method is computationally efficient for nonlinear monotone equations.

Conclusions
This paper presents a modified Hestenes-Stiefel method for solving nonlinear monotone equations, combining the hyperplane projection method [13] with the modified Hestenes-Stiefel method of Dai and Wen [47]. The search direction of the proposed method satisfies a sufficient descent condition, and a new line search is proposed for the derivative-free setting. Under appropriate conditions, the proposed method converges globally. The reported numerical results show that the presented method is more efficient than the spectral gradient method of Zhang and Zhou [15] and the MPRP method of Li and Li [39].
In addition, we expect that our proposed method and its further modifications could find new applications in related areas such as symmetric equations [51], image processing [52], and finance [53-55].