A Modified Sufficient Descent Polak–Ribière–Polyak Type Conjugate Gradient Method for Unconstrained Optimization Problems

In this paper, a modification to the Polak–Ribière–Polyak (PRP) nonlinear conjugate gradient method is presented. The proposed method always generates a sufficient descent direction, independent of the accuracy of the line search and the convexity of the objective function. Under appropriate conditions, the modified method is proved to be globally convergent under the Wolfe or Armijo-type line search. Moreover, the proposed methodology is extended to the Hestenes–Stiefel (HS) and Liu–Storey (LS) methods. Preliminary numerical experiments illustrate the efficiency of the proposed method.


Introduction
Conjugate gradient methods are among the most popular methods for solving optimization problems, especially large-scale problems, owing to the simplicity and low storage requirements of their iterative form [1].
Consider the following unconstrained optimization problem:

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $x_0$ be an arbitrary initial point. The conjugate gradient method then generates a sequence of iterates as follows:

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \tag{2}$$

where $x_k$ is the $k$th iterate, $\alpha_k > 0$ is a steplength obtained by carrying out some line search, and $d_k$ is a search direction defined by

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{3}$$

where $g_k = g(x_k)$ denotes the gradient of $f$ at $x_k$ and $\beta_k$ is a scalar whose choice determines the particular conjugate gradient method [2][3][4][5][6]. In this paper, we focus on the well-known Polak–Ribière–Polyak (PRP) [4,5], Hestenes–Stiefel (HS) [3] and Liu–Storey (LS) [6] methods, which share the same numerator $g_k^T y_{k-1}$ in $\beta_k$. The update parameters of these methods are, respectively, given by

$$\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{LS} = -\frac{g_k^T y_{k-1}}{g_{k-1}^T d_{k-1}}, \tag{4}$$

where $y_{k-1} = g_k - g_{k-1}$ and $\|\cdot\|$ denotes the Euclidean norm of vectors. Other nonlinear conjugate gradient methods and their global convergence can be found in [1,7].
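To make the three update parameters concrete, the following minimal Python sketch evaluates the PRP, HS and LS coefficients in their standard textbook forms and builds the next search direction $-g_k + \beta_k d_{k-1}$. The function names are hypothetical and chosen only for illustration; plain lists are used so that the sketch has no dependencies.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def gradient_difference(g: Sequence[float], g_prev: Sequence[float]) -> List[float]:
    """y_{k-1} = g_k - g_{k-1}."""
    return [a - b for a, b in zip(g, g_prev)]


def beta_prp(g: Sequence[float], g_prev: Sequence[float]) -> float:
    """PRP coefficient: g_k^T y_{k-1} / ||g_{k-1}||^2."""
    y = gradient_difference(g, g_prev)
    return dot(g, y) / dot(g_prev, g_prev)


def beta_hs(g: Sequence[float], g_prev: Sequence[float], d_prev: Sequence[float]) -> float:
    """HS coefficient: g_k^T y_{k-1} / (d_{k-1}^T y_{k-1})."""
    y = gradient_difference(g, g_prev)
    return dot(g, y) / dot(d_prev, y)


def beta_ls(g: Sequence[float], g_prev: Sequence[float], d_prev: Sequence[float]) -> float:
    """LS coefficient: g_k^T y_{k-1} / (-g_{k-1}^T d_{k-1})."""
    y = gradient_difference(g, g_prev)
    return dot(g, y) / (-dot(g_prev, d_prev))


def next_direction(g: Sequence[float], d_prev: Sequence[float], beta: float) -> List[float]:
    """Two-term conjugate gradient direction d_k = -g_k + beta_k d_{k-1}."""
    return [-gi + beta * di for gi, di in zip(g, d_prev)]
```

When $d_{k-1} = -g_{k-1}$ and the line search is exact (so that $g_k^T d_{k-1} = 0$), the three denominators coincide and the three coefficients take the same value; they differ only under inexact line searches.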
It is well known that the PRP, HS and LS methods are generally regarded as the most efficient conjugate gradient methods in practical computation. This can be attributed to property (*), which was derived by Gilbert and Nocedal [8]. Polak and Ribière [4] obtained the global convergence of the PRP method for strongly convex functions with exact line search. Yuan [9] also obtained the global convergence of the PRP method under the assumption that the search direction satisfies the descent condition

$$g_k^T d_k < 0, \quad \forall k \ge 0, \tag{5}$$

and the steplength satisfies the standard Wolfe line search

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{6}$$

where $0 < \delta \le \sigma < 1$. However, the convergence properties of these methods are less satisfactory in many situations. Powell [10] gave a counterexample showing that there exist nonconvex functions on which the PRP method does not converge globally even if the exact line search is used. Inspired by Powell's work, Gilbert and Nocedal [8] proved that the modified PRP method in which $\beta_k$ is given by

$$\beta_k^{PRP+} = \max\{\beta_k^{PRP}, 0\}$$

is globally convergent. The search direction effectively prevents jamming and satisfies the descent property (5) or the following sufficient descent condition:

$$g_k^T d_k \le -c \|g_k\|^2, \quad c > 0, \tag{7}$$

which is very important for establishing the global convergence of the method. In [11], Hager and Zhang proposed a modified HS formula for $\beta_k$ defined by

$$\beta_k^{HZ} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}} - 2 \frac{\|y_{k-1}\|^2}{(d_{k-1}^T y_{k-1})^2} g_k^T d_{k-1}.$$

For the resulting method, called CG-DESCENT, they showed that the sufficient descent property holds with $c = 7/8$. Afterwards, they also presented the following extension of $\beta_k^{HZ}$:

$$\beta_k = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}} - \theta_k \frac{\|y_{k-1}\|^2}{(d_{k-1}^T y_{k-1})^2} g_k^T d_{k-1},$$

where $\theta_k$ is a nonnegative parameter. If $\theta_k \ge \theta > 1/4$, then the method possesses the sufficient descent property with $c = 1 - 1/(4\theta)$. Cheng [12] developed a two-term PRP-based descent method satisfying (7) by using a projection technique for unconstrained optimization problems. Yu et al.
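[13] proposed a modified form of $\beta_k^{PRP}$ as follows:

$$\beta_k = \beta_k^{PRP} - \mu \frac{\|y_{k-1}\|^2}{\|g_{k-1}\|^4} g_k^T d_{k-1}.$$

If $\mu > \frac{1}{4}$, then condition (7) holds with $c = 1 - \frac{1}{4\mu}$. Yuan [14] presented a new PRP formula defined by

$$\beta_k^{MPRP} = \beta_k^{PRP} - \min\left\{\beta_k^{PRP}, \; \mu \frac{\|y_{k-1}\|^2}{\|g_{k-1}\|^4} g_k^T d_{k-1}\right\},$$

where $\mu > \frac{1}{4}$, which guarantees the descent property (7) and $\beta_k^{MPRP} \ge 0$. Livieris and Pintelas [15] proposed a new class of spectral conjugate gradient methods which ensures sufficient descent independent of the accuracy of the line search.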
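The link between the parameter $\theta$ and the descent constant $c = 1 - 1/(4\theta)$ quoted above for the Hager–Zhang family can be checked numerically. The sketch below assumes the commonly stated form of that family, $\beta_k = g_k^T y_{k-1}/(d_{k-1}^T y_{k-1}) - \theta \|y_{k-1}\|^2 g_k^T d_{k-1}/(d_{k-1}^T y_{k-1})^2$, with $\theta = 2$ giving $c = 7/8$; the function names are hypothetical and this is not the CG-DESCENT implementation itself.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def hz_beta(g: Sequence[float], g_prev: Sequence[float],
            d_prev: Sequence[float], theta: float = 2.0) -> float:
    """Hager-Zhang-type coefficient (assumed form, see the lead-in text)."""
    y = [a - b for a, b in zip(g, g_prev)]
    dy = dot(d_prev, y)
    if dy == 0.0:
        raise ZeroDivisionError("d_{k-1}^T y_{k-1} must be nonzero")
    return dot(g, y) / dy - theta * dot(y, y) * dot(g, d_prev) / (dy * dy)


def hz_direction(g: Sequence[float], g_prev: Sequence[float],
                 d_prev: Sequence[float], theta: float = 2.0) -> List[float]:
    """Direction d_k = -g_k + beta_k d_{k-1} with the coefficient above."""
    beta = hz_beta(g, g_prev, d_prev, theta)
    return [-gi + beta * di for gi, di in zip(g, d_prev)]


def descent_constant(theta: float) -> float:
    """Constant c = 1 - 1/(4 theta) in g_k^T d_k <= -c ||g_k||^2."""
    return 1.0 - 1.0 / (4.0 * theta)


def satisfies_sufficient_descent(g: Sequence[float], d: Sequence[float],
                                 c: float, tol: float = 1e-9) -> bool:
    """Check g^T d <= -c ||g||^2 up to a small floating-point tolerance."""
    return dot(g, d) <= -c * dot(g, g) + tol
```

For example, `satisfies_sufficient_descent(g, hz_direction(g, g_prev, d_prev), descent_constant(2.0))` should hold for any vectors with $d_{k-1}^T y_{k-1} \ne 0$, whatever line search produced them.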
Wei et al. [16] gave a variant of the PRP method called the VPRP method. The parameter $\beta_k$ in the VPRP method is given by

$$\beta_k^{VPRP} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} g_k^T g_{k-1}}{\|g_{k-1}\|^2}.$$

Based on the VPRP method, Zhang [17] made a slight modification and obtained the NPRP method as follows,

$$\beta_k^{NPRP} = \frac{\|g_k\|^2 - \frac{\|g_k\|}{\|g_{k-1}\|} |g_k^T g_{k-1}|}{\|g_{k-1}\|^2},$$

and established the sufficient descent property (7) of the NPRP method. Recently, Zhang [18] proposed a three-term conjugate gradient method, called the MPRP method, in which the direction $d_k$ takes the following form:

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k^{PRP} d_{k-1} - \theta_k y_{k-1}, & k \ge 1, \end{cases} \qquad \theta_k = \frac{g_k^T d_{k-1}}{\|g_{k-1}\|^2}, \tag{8}$$

which gives the MPRP method the sufficient descent property. This property always holds, independent of any line search and of the convexity of the objective function. Under the following line search

$$f(x_k + \alpha_k d_k) \le f(x_k) - \delta \alpha_k^2 \|d_k\|^2, \tag{9}$$

where $\alpha_k = \max\{\rho^i, i = 1, 2, \ldots\}$ with $0 < \rho, \delta < 1$, the global convergence of the MPRP method is established.

Note that the MPRP method in [18] reduces to the standard PRP method if exact line search is used, and converges globally under the line search (9). However, it fails to converge under the weak Wolfe line search (6). The main reason is that the MPRP method does not satisfy the trust region property (Lemma 1 in Section 2). Based on the methods in [12,18], Dong et al. [19] proposed a three-term PRP-type conjugate gradient method which always satisfies the sufficient descent condition independently of the line search employed.

Motivated by the above observations, we propose a modified three-term PRP formula based on (8), which possesses not only the sufficient descent property but also the trust region feature. In the following, we first reformulate the search direction (8) into a new form, which can be written as follows:

$$d_k = -g_k + \beta_k^{PRP} d_{k-1} - \beta_k^{PRP} \frac{g_k^T d_{k-1}}{g_k^T y_{k-1}} y_{k-1}, \quad k \ge 1. \tag{10}$$

Then, we can consider the following general iteration form:

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1} - \beta_k \dfrac{g_k^T d_{k-1}}{g_k^T y_{k-1}} y_{k-1}, & k \ge 1. \end{cases} \tag{11}$$

For any $\beta_k$, it is not difficult to deduce that the direction defined by (11) satisfies

$$g_k^T d_k = -\|g_k\|^2, \tag{12}$$

which is independent of any line search and of the convexity of the objective function.
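The claim that the three-term direction has the descent property regardless of the line search can be illustrated with a few lines of Python. This is a minimal sketch that assumes the MPRP direction of [18] has the form $d_k = -g_k + \beta_k^{PRP} d_{k-1} - \theta_k y_{k-1}$ with $\theta_k = g_k^T d_{k-1} / \|g_{k-1}\|^2$; the function names are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def mprp_direction(g: Sequence[float], g_prev: Sequence[float],
                   d_prev: Sequence[float]) -> List[float]:
    """Three-term direction -g + beta_prp * d_prev - theta * y (assumed form)."""
    y = [a - b for a, b in zip(g, g_prev)]
    denom = dot(g_prev, g_prev)           # ||g_{k-1}||^2
    beta = dot(g, y) / denom              # PRP coefficient
    theta = dot(g, d_prev) / denom        # weight of the third term
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_prev, y)]


def descent_residual(g: Sequence[float], d: Sequence[float]) -> float:
    """g^T d + ||g||^2, which vanishes when g^T d = -||g||^2."""
    return dot(g, d) + dot(g, g)
```

The two extra inner products cancel in $g_k^T d_k$, so `descent_residual` is zero up to rounding for arbitrary inputs. When $g_k^T d_{k-1} = 0$ (exact line search), $\theta_k = 0$ and the direction coincides with the standard PRP direction.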
In this paper, we further study the PRP method and suggest a new three-term PRP method to improve its numerical performance and theoretical properties. The remainder of this paper is organized as follows. In Section 2, we present a modified PRP method by using a new technique and establish its global convergence. In Section 3, the new technique is extended to the HS and LS methods. In the last section, numerical results are reported to show that the modified methods are efficient.

The Modified PRP Method and Its Properties
In order to ensure the sufficient descent condition while keeping a simple structure and good properties, we modify the denominator of the PRP formula, namely, where $\mu > 0$. For convenience, we call the iterative scheme defined by (2), (11) and (13) the ZPRP method. It is obvious that the ZPRP method reduces to the PRP method if Then, we give the modified PRP-type conjugate gradient method below (Algorithm 1).
Step 1: If $\|g_k\| \le \varepsilon$, then stop. Otherwise, go to Step 2.
Step 2: Find the step size $\alpha_k$ satisfying a suitable line search.
The following lemma shows that the direction $d_k$ determined by (11) satisfies a trust region property.

Lemma 1. Let $d_k$ be defined by (11) with $\beta_k^{ZPRP}$, then we have

Proof of Lemma 1. By (13), for all $k \ge 1$, we have From (11), (13) and (15), we obtain The proof is completed.

Global Convergence of the ZPRP Method
In this section, we show the global convergence of the proposed method. The following assumptions are often used in the literature to analyze the global convergence of conjugate gradient methods with inexact line searches.

Assumption 1. (i) The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded. (ii) In some neighborhood $N$ of $\Omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous, that is, there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in N.$$
We first prove that the ZPRP method is globally convergent under the Wolfe line search (6). Under Assumption 1, the following Zoutendijk condition [20] is useful.
Lemma 2. Suppose that Assumption 1 holds. Consider the method in the form of (2) and (3), where $d_k$ is a descent direction and $\alpha_k$ satisfies the Wolfe line search conditions (6). Then we have

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{17}$$

Obviously, the Zoutendijk condition (17) and (12) imply that

$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{18}$$

Theorem 1. Suppose that Assumption 1 holds. Consider the ZPRP method, where $\alpha_k$ is obtained by the Wolfe conditions (6). Then we have

$$\lim_{k \to \infty} \|g_k\| = 0. \tag{19}$$

Proof of Theorem 1. By Lemma 1, we have which implies Hence, (19) holds. The proof is completed.
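The way the Zoutendijk condition and the trust region property combine can be sketched as follows. This is only an outline under two stated assumptions: that (12) is the identity $g_k^T d_k = -\|g_k\|^2$, and that the bound of Lemma 1 has the form $\|d_k\| \le C \|g_k\|$ for some constant $C > 0$ (the symbol $C$ is introduced here for illustration; the exact constant of Lemma 1 is not restated).

$$
\begin{aligned}
\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2}
  &= \sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty, \\
\|d_k\| \le C \|g_k\|
  \;&\Longrightarrow\;
  \frac{\|g_k\|^4}{\|d_k\|^2} \ge \frac{\|g_k\|^2}{C^2}, \\
\sum_{k=0}^{\infty} \|g_k\|^2
  &\le C^2 \sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty
  \;\Longrightarrow\;
  \lim_{k \to \infty} \|g_k\| = 0.
\end{aligned}
$$

The trust region bound is what upgrades the usual $\liminf$ conclusion to a full limit: without it, the series $\sum \|g_k\|^4 / \|d_k\|^2$ can converge while $\|d_k\|$ grows and $\|g_k\|$ stays bounded away from zero along a subsequence.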
Next, we prove the global convergence of the ZPRP method under the Armijo-type line search (9).
Theorem 2. Suppose that Assumption 1 holds. Consider the ZPRP method, where $\alpha_k$ satisfies the Armijo-type line search (9). Then we have

$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{22}$$

Proof of Theorem 2. Suppose that the conclusion is not true. Then there exists a constant $\varepsilon > 0$ such that, for all $k \ge 0$,

$$\|g_k\| \ge \varepsilon. \tag{23}$$

From (9) and Assumption 1 (i), we have If $\liminf_{k \to \infty} \alpha_k > 0$, we get from (24) that $\liminf_{k \to \infty} \|d_k\| = 0$. From (12), we get $\liminf_{k \to \infty} \|g_k\| = 0$, which contradicts (23). Suppose instead that $\liminf_{k \to \infty} \alpha_k = 0$. Then there is an infinite index set $K$ such that

$$\lim_{k \in K, \, k \to \infty} \alpha_k = 0. \tag{25}$$

From (9), it follows that when $k \in K$ is sufficiently large, $\rho^{-1} \alpha_k$ satisfies the following inequality, By Assumption 1 (ii) and the mean value theorem, there is an $\eta_k \in (0, 1)$ such that By (20), (26) and (27), we can get that Together with (25), (28) implies $\lim_{k \in K, \, k \to \infty} \|g_k\| = 0$. This also yields a contradiction. The proof is completed.
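The backtracking rule behind the Armijo-type line search (9) can be sketched in Python as follows. The acceptance test $f(x_k + \alpha d_k) \le f(x_k) - \delta \alpha^2 \|d_k\|^2$ is an assumption about the form of (9), and the trial steps $\rho, \rho^2, \ldots$ follow the description $\alpha_k = \max\{\rho^i, i = 1, 2, \ldots\}$. The function name and the `max_backtracks` safeguard are hypothetical; the defaults $\rho = 0.3$ and $\delta = 10^{-4}$ are the values reported in the numerical experiments.

```python
from typing import Callable, Sequence


def armijo_type_step(f: Callable[[Sequence[float]], float],
                     x: Sequence[float], d: Sequence[float],
                     rho: float = 0.3, delta: float = 1e-4,
                     max_backtracks: int = 60) -> float:
    """Return the largest alpha in {rho, rho^2, ...} that satisfies
    f(x + alpha d) <= f(x) - delta * alpha^2 * ||d||^2 (assumed form of (9))."""
    fx = f(x)
    dd = sum(di * di for di in d)  # ||d||^2
    alpha = rho                    # first trial step is rho^1
    for _ in range(max_backtracks):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) <= fx - delta * alpha * alpha * dd:
            return alpha
        alpha *= rho               # shrink the step and try again
    raise RuntimeError("no acceptable step found; d may not be a descent direction")
```

When a step $\alpha_k < \rho$ is returned, the previous trial $\rho^{-1} \alpha_k$ was rejected, which is exactly the fact used in the second case of the proof above.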

Extension to the HS and LS Method
In this section, we extend the idea above to the HS and LS methods. The corresponding methods are called the ZHS method and the ZLS method, in which $\beta_k$ is respectively defined by where $\mu > 0$. It is obvious that Hence, we now only need to discuss the global convergence of the ZHS method.
The following theorem shows that the ZHS method converges globally with the Wolfe line search (6).

Theorem 3. Let Assumption 1 hold. Consider the ZHS method, where $\alpha_k$ is obtained by the Wolfe line search (6), then lim

Proof of Theorem 3. Suppose by contradiction that the conclusion is not true. Then there exists a constant $\varepsilon > 0$ such that $\|g_k\| > \varepsilon$, $\forall k \ge 1$. From (29), it follows that By (11) with $\beta_k^{ZHS}$, we can get that Hence, combining with (17), which leads to a contradiction. The proof is completed.
The following result shows that the ZHS method with the Armijo-type line search (9) is globally convergent.

Theorem 4. Let Assumption 1 hold. Consider the ZHS method, where $\alpha_k$ is obtained by the line search (9), then

Proof of Theorem 4. The proof is similar to that of the global convergence of the ZPRP method given in Theorem 2. We omit it here.

Numerical Experiments
In this section, we report numerical results on some of the unconstrained optimization problems in the CUTE [21] test problem library. We test the ZPRP method and the ZHS method, and compare the performance of these two methods with that of the MPRP method in [18]. The parameters are set to $\delta = 10^{-4}$, $\rho = 0.3$ and $\mu = 0.001$. All codes were written in MATLAB R2012a and run on a PC with a 3.00 GHz CPU and the Windows 7 operating system. We use the stopping criterion $\|g_k\|_\infty \le 10^{-6}$. The detailed numerical results are listed on the web site: http://mathxiuxiu.blog.sohu.com/326066259.html.
We first compare the performance of the ZPRP method with that of CG-DESCENT proposed by Hager and Zhang [11], with both methods using the Wolfe line search (6). Figures 1-3 show the performance of these methods with respect to the total number of iterations, the total number of function and gradient evaluations, and the CPU time, respectively, evaluated using the performance profiles of Dolan and Moré [22]. For each method, we plot the fraction P of problems for which the method is within a factor t of the smallest number of iterations, the smallest number of function and gradient evaluations, or the least CPU time, respectively. The left side of each figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that are successfully solved by each method. The top curve corresponds to the method that solved the most problems within a factor t of the best. Clearly, the ZPRP method performs better, since it has the highest probability of being the optimal solver. From Figure 1, the ZPRP method solves about 59.5% of the test problems with the least number of iterations, while CG-DESCENT solves about 56.5% of the test problems. Figure 2 shows that the ZPRP method solves 55.2% of the test problems with the least number of function and gradient evaluations, while CG-DESCENT solves about 52.2% of the test problems. Therefore, the ZPRP method outperforms CG-DESCENT.

Next, we compare the performance of the ZPRP method with that of the ZHS method and the MPRP method in [18], with all methods using the line search (9). Figures 4-6 show the performance of these methods with respect to the number of iterations, the total number of function and gradient evaluations, and the CPU time, respectively. From Figure 4, we observe that the ZPRP method outperforms the MPRP and ZHS methods. More precisely, the performance profile for the number of iterations shows that ZPRP solves 61% of the test problems with the least number of iterations, while MPRP and ZHS solve about 47.5% and 45.2% of the test problems, respectively. As regards the number of function and gradient evaluations, Figure 5 shows that ZPRP solves 80% of the test problems with the least number. Hence, the performance of the ZPRP method is slightly better than that of the MPRP and ZHS methods.
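For readers unfamiliar with performance profiles, the quantity plotted in such figures can be computed as in the following Python sketch. For each problem, the cost of a solver is divided by the best cost over all solvers, and the profile value at $t$ is the fraction of problems whose ratio is at most $t$. The function name and the data layout (one list of positive costs per solver, with infinity marking a failed run) are assumptions made for illustration, and no data from the experiments above are reproduced.

```python
import math
from typing import Dict, List, Sequence


def performance_profile(costs: Dict[str, Sequence[float]],
                        taus: Sequence[float]) -> Dict[str, List[float]]:
    """Dolan-More style profile: for each solver, the fraction of problems
    solved within a factor tau of the best solver, for every tau in taus.
    Costs must be positive; math.inf marks a failure."""
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    # Best (smallest) cost achieved on each problem by any solver.
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile: Dict[str, List[float]] = {}
    for s in solvers:
        ratios = []
        for p in range(n_problems):
            c = costs[s][p]
            # A failed run never counts as being within a factor tau.
            ratios.append(c / best[p] if math.isfinite(c) else math.inf)
        profile[s] = [sum(1 for r in ratios if r <= tau) / n_problems
                      for tau in taus]
    return profile
```

The value at $t = 1$ is the fraction of problems on which a solver is the fastest (ties count for every tied solver), and the value for large $t$ is the fraction of problems the solver solves at all, which matches the reading of the left and right sides of the figures described above.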

Conclusions
In this paper, we first proposed a modified PRP formula which provides sufficient descent directions for the objective function independent of any line search. We then applied the technique to the HS and LS conjugate gradient methods, which also ensures the sufficient descent property. The global convergence of the modified methods is established under the standard Wolfe line search or the Armijo-type line search. Moreover, numerical experiments show that the proposed methods are promising. Our future work will concentrate on combining our coefficient $\beta_k$ with the spectral conjugate gradient method [15], which ensures sufficient descent independent of the accuracy of the line search, and on studying the convergence properties of the resulting spectral conjugate gradient method.

Figure 1. The number of iterations.

Figure 2. The total number of function and gradient evaluations.

Figure 3. The total CPU time.

Figure 4. The number of iterations.

Figure 5. The total number of function and gradient evaluations.

Figure 6. The total CPU time.