A New Hybrid Three-Term Conjugate Gradient Algorithm for Large-Scale Unconstrained Problems

Three-term conjugate gradient methods have attracted much attention for large-scale unconstrained problems in recent years, since they combine attractive practical features: simple computation, low memory requirements, a strong descent property and strong global convergence. In this paper, a hybrid three-term conjugate gradient algorithm is proposed; it possesses the sufficient descent property independently of any line search technique. Under some mild conditions, the proposed method is globally convergent for uniformly convex objective functions. Moreover, by using a modified secant equation, the proposed method is globally convergent without any convexity assumption on the objective function. Numerical results indicate that the proposed algorithm is more efficient and reliable than the other methods tested on the benchmark problems.


Introduction
In this paper, we consider the following unconstrained problem:
$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and bounded below. There are many methods for solving (1), such as Levenberg-Marquardt methods [1], Newton methods [2] and quasi-Newton methods [3,4]. However, these methods are efficient only for small and medium-sized problems: for large-scale problems, storing a matrix of second-order information (or an approximation to it) is impractical. Conjugate gradient (CG) methods [4-12] are much more effective for unconstrained problems, especially large-scale ones, owing to their low memory requirements and strong convergence properties [6,8-11], etc. CG methods have also been applied to image restoration problems, optimal control problems and optimization problems in machine learning [13-15], etc. In this paper, we design a CG method for (1). The nonlinear CG method was first proposed by Hestenes and Stiefel [16] for the linear system $Ax = b$. In 1964, Fletcher and Reeves [17] extended the CG method of [16] to unconstrained optimization problems, and many researchers have proposed various CG methods since [6-10,12,18]. In a CG method, a sequence of iterates $\{x_k\}$ is generated from an initial point $x_0$ by
$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$
where $\alpha_k$ is the step size, determined by some line search technique, and $d_k$ is the search direction. In a traditional CG method, the direction is usually defined by
$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_{k+1} d_k. \qquad (3)$$
Different conjugate parameters $\beta_k$ generate different CG methods, which may differ significantly in theoretical properties and numerical performance. The Hestenes-Stiefel (HS) method [16] and the Polak-Ribière-Polyak (PRP) method [19,20] have good numerical performance; their conjugate parameters are
$$\beta^{HS}_{k+1} = \frac{g_{k+1}^T y_k}{d_k^T y_k}, \qquad \beta^{PRP}_{k+1} = \frac{g_{k+1}^T y_k}{\|g_k\|^2},$$
where $g_k = \nabla f(x_k)$, $g_{k+1} = \nabla f(x_{k+1})$ and $y_k = g_{k+1} - g_k$. Note that the HS method automatically satisfies the conjugacy condition $d_{k+1}^T y_k = 0$, independently of the line search technique.
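As a concrete illustration of the iteration (2)-(3), the short sketch below runs the PRP update on a strictly convex quadratic with an exact line search; the function names and the test problem are illustrative choices, not from the paper.

```python
import numpy as np

def prp_beta(g_new, g_old):
    # beta^PRP_{k+1} = g_{k+1}^T y_k / ||g_k||^2, with y_k = g_{k+1} - g_k
    y = g_new - g_old
    return float(g_new @ y) / float(g_old @ g_old)

def cg_quadratic(A, b, x0, tol=1e-10, max_iter=100):
    """Two-term CG iteration (2)-(3) on f(x) = 0.5 x^T A x - b^T x,
    using the exact step size available for a quadratic."""
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b                                    # gradient of the quadratic
    d = -g                                           # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -float(g @ d) / float(d @ A @ d)     # exact line search step
        x = x + alpha * d                            # iteration (2)
        g_new = A @ x - b
        d = -g_new + prp_beta(g_new, g) * d          # direction update (3)
        g = g_new
    return x
```

For a quadratic with exact line search, the PRP parameter reproduces linear CG, so the iteration converges in at most $n$ steps.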
Dai and Liao [18] extended the above conjugacy condition to
$$d_{k+1}^T y_k = -t\, g_{k+1}^T s_k, \qquad (4)$$
where $t \ge 0$ and $s_k = x_{k+1} - x_k$. The new condition (4) gives a more accurate approximation of the Hessian of the objective function. Based on condition (4), Dai and Liao [18] presented the conjugate parameter
$$\beta^{DL}_{k+1}(t) = \frac{g_{k+1}^T y_k}{d_k^T y_k} - t\, \frac{g_{k+1}^T s_k}{d_k^T y_k}.$$
To obtain global convergence, they restricted the conjugate parameter to non-negative values, i.e., $\beta^{DL+}_{k+1}(t) = \max\{0, \beta^{DL}_{k+1}(t)\}$, and established global convergence under some mild conditions. However, the choice of the parameter $t$ strongly affects numerical performance, so many scholars have focused on choices of $t$; see [21-26], etc.
Compared with traditional two-term CG methods, three-term CG methods [27-31] often have good numerical performance and nice theoretical properties, such as the sufficient descent property independently of the accuracy of the line search, i.e., it always holds that
$$g_k^T d_k \le -c\, \|g_k\|^2,$$
where $c > 0$. Specifically, Zhang et al. [29] proposed a descent three-term PRP CG method whose direction has the form
$$d^{TTPRP}_{k+1} = -g_{k+1} + \beta^{PRP}_{k+1} d_k - \frac{g_{k+1}^T d_k}{\|g_k\|^2}\, y_k,$$
Zhang et al. [28] presented a descent three-term HS CG method whose direction is
$$d^{TTHS}_{k+1} = -g_{k+1} + \beta^{HS}_{k+1} d_k - \frac{g_{k+1}^T d_k}{d_k^T y_k}\, y_k,$$
and Babaie-Kafaki and Ghanbari [31] gave a modified three-term HS/DL method whose direction has a similar three-term form. For these three directions, the sufficient descent property is always satisfied with $c = 1$, i.e., $g_k^T d_k \le -\|g_k\|^2$. Note that the sufficient descent property is stronger than the descent property and may greatly improve the numerical performance of the corresponding methods.
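For the three-term PRP direction of Zhang et al. [29], the identity $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$ holds exactly for arbitrary vectors, which is what makes the property independent of the line search; a minimal sketch:

```python
import numpy as np

def ttprp_direction(g_new, g_old, d_old):
    """Three-term PRP direction of Zhang et al. [29]:
    d_{k+1} = -g_{k+1} + beta^PRP d_k - theta y_k,
    with theta = g_{k+1}^T d_k / ||g_k||^2."""
    y = g_new - g_old
    denom = float(g_old @ g_old)
    beta = float(g_new @ y) / denom
    theta = float(g_new @ d_old) / denom
    return -g_new + beta * d_old - theta * y
```

The $\beta$ and $\theta$ terms cancel in the inner product $g_{k+1}^T d_{k+1}$, leaving exactly $-\|g_{k+1}\|^2$ regardless of the step size.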
Motivated by the above discussion, in this paper we propose a new descent hybrid three-term CG algorithm. Depending on the case, the direction in this algorithm reduces to the directions in [28,29,31], respectively, or to another new three-term direction which also satisfies the sufficient descent property with $c = 1$ (which is why we call our method a hybrid three-term CG method). The new method possesses the sufficient descent property independently of the accuracy of the line search technique. Under some mild conditions, global convergence is established for uniformly convex objective functions. For general functions without a convexity assumption, global convergence is also established by using the modified secant condition of [32].
Numerical results indicate that the proposed algorithm is effective and reliable.
The paper is organized as follows. In Section 2, we first present the motivation for the hybrid three-term CG method, then propose the new hybrid three-term direction, prove some of its properties and establish global convergence for uniformly convex objective functions. In Section 3, global convergence for general nonlinear functions is established with the help of the modified secant condition. Numerical tests are given in Section 4 to show the efficiency and reliability of the proposed algorithm. Finally, conclusions are presented in Section 5.

Motivation and Algorithm
In this section, we first present the motivation and give the form of the new direction. Note that if the exact line search technique is adopted, which implies $g_{k+1}^T d_k = 0$, then the HS, PRP and DL conjugate parameters coincide. If an inexact line search technique is adopted, these three methods may differ in theoretical properties and numerical performance, and the HS and DL methods may not be well defined (the denominator $d_k^T y_k$ may be 0). Zhang [33] presented a hybrid conjugate parameter $\beta^{hybrid}_{k+1}$ for the traditional two-term Dai-Liao CG method, and the numerical results for general nonlinear equations show that the resulting hybrid two-term conjugate residual method is effective and reliable.
Motivated by the above discussion and the nice properties of three-term CG methods, we propose a new hybrid descent three-term direction of the form $d_0 = -g_0$ and
$$d^N_{k+1} = -g_{k+1} + \beta^N_{k+1} d_k - \delta^N_{k+1} y_k, \qquad (9)$$
where the parameters $\beta^N_{k+1}$ and $\delta^N_{k+1}$ share the denominator $\max\{y_k^T s_k, \|g_k\|^2\}$. Note that the direction $d^N_{k+1}$ is well defined: the denominator can vanish only when $\|g_k\|^2 = 0$, and $\|g_k\| = 0$ means that the method stops with the optimal solution $x_k$.
We now give some remarks on the above direction. If $t = 0$ and $y_k^T s_k \ge \|g_k\|^2$ hold, the direction $d^N_{k+1}$ reduces to the direction $d^{TTHS}_{k+1}$ of [28]; if $t = 0$ and $y_k^T s_k \le \|g_k\|^2$ hold, it reduces to the direction $d^{TTPRP}_{k+1}$ of [29]. Note also that if $y_k^T s_k \ge \|g_k\|^2$ holds, the parameter $\beta^N_{k+1}$ reduces to the conjugate parameter $\beta^{DL}_{k+1}$ and the direction $d^N_{k+1}$ reduces to a modified version of the direction $d^{TTDL}_{k+1}$ of [31]. If $y_k^T s_k < \|g_k\|^2$ holds, the direction $d^N_{k+1}$ reduces to a new three-term direction which also satisfies the sufficient descent property with $c = 1$. Overall, we regard the direction $d^N_{k+1}$ as a hybrid of the HS, Dai-Liao and PRP directions.
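The displayed formula (9) did not survive extraction, so the sketch below uses an assumed form inferred from the reductions above: shared denominator $D_k = \max\{y_k^T s_k, \|g_k\|^2\}$, $\beta^N_{k+1} = (g_{k+1}^T y_k - t\, g_{k+1}^T s_k)/D_k$ and $\delta^N_{k+1} = g_{k+1}^T d_k / D_k$. This reproduces the stated limiting cases, but treat it as a hypothesis, not the paper's exact formula.

```python
import numpy as np

def hybrid_direction(g_new, g_old, d_old, s, t=0.1):
    """Assumed form of the hybrid direction d^N_{k+1}. Equation (9) of the
    paper is not reproduced here; this shape is inferred from the stated
    reductions to the TTHS, TTPRP and TTDL directions."""
    y = g_new - g_old
    D = max(float(y @ s), float(g_old @ g_old))           # hybrid denominator
    beta = (float(g_new @ y) - t * float(g_new @ s)) / D  # assumed beta^N
    delta = float(g_new @ d_old) / D                      # assumed delta^N
    return -g_new + beta * d_old - delta * y
```

With $s_k = \alpha_k d_k$ and $\alpha_k > 0$, a direct computation for this assumed form gives $g_{k+1}^T d^N_{k+1} = -\|g_{k+1}\|^2 - t\,\alpha_k (g_{k+1}^T d_k)^2 / D_k \le -\|g_{k+1}\|^2$, i.e., sufficient descent with $c = 1$.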

Algorithm for Uniformly Convex Functions
Now, based on the above analyses, we state the steps of our algorithm as follows:

Algorithm 1: New hybrid three-term conjugate gradient method (HTTCG).
Step 0. Select the initial point $x_0 \in \mathbb{R}^n$. Compute $g(x_0)$ and set $d_0 = -g_0$. Let $k := 0$.
Step 1. If $\|g_k\| \le \varepsilon$, then stop; otherwise go to the next step.
Step 2. Compute the step size $\alpha_k$ along the direction $d_k$ by some line search technique.
Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$.
Step 4. Compute the search direction $d_{k+1}$ by (9).
Step 5. Set $k := k + 1$ and go to Step 1.
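The five steps can be sketched as a runnable loop. Since the paper leaves the line search open and the display for (9) is not reproduced, both the Armijo backtracking rule and the direction formula below are illustrative assumptions.

```python
import numpy as np

def httcg(f, grad, x0, t=0.1, eps=1e-6, max_iter=20000):
    """Sketch of Algorithm 1 (HTTCG) with illustrative choices: Armijo
    backtracking line search and an assumed hybrid direction formula."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g                                            # Step 0: d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                  # Step 1: stopping test
            break
        alpha, rho, c1 = 1.0, 0.5, 1e-4               # Step 2: backtracking
        while alpha > 1e-20 and f(x + alpha * d) > f(x) + c1 * alpha * float(g @ d):
            alpha *= rho
        s = alpha * d                                 # Step 3: next iterate
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        D = max(float(y @ s), float(g @ g))           # Step 4: hybrid direction
        beta = (float(g_new @ y) - t * float(g_new @ s)) / D
        delta = float(g_new @ d) / D
        d = -g_new + beta * d - delta * y
        x, g = x_new, g_new                           # Step 5: k := k + 1
    return x
```

On a small strictly convex quadratic, this sketch drives the gradient norm below the tolerance.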

Remark 1.
Note that in Algorithm 1 the line search technique is not explicitly specified; in fact, any line search technique is acceptable.
In the following, we show that Algorithm 1 possesses the sufficient descent property independently of any line search technique.

Lemma 1.
For any line search technique, the sequence $\{d^N_k\}$ generated by Algorithm 1 always satisfies
$$g_k^T d^N_k \le -\|g_k\|^2. \qquad (10)$$
Proof. If $k = 0$, we have $d_0 = -g_0$, so $g_0^T d_0 = -\|g_0\|^2$. For $k \ge 0$, the definition of $d^N_{k+1}$ gives $g_{k+1}^T d^N_{k+1} \le -\|g_{k+1}\|^2$, where the last inequality holds by $t \ge 0$. Then, (10) holds. This completes the proof.
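Under the assumed form of the direction (shared denominator $D_k = \max\{y_k^T s_k, \|g_k\|^2\}$ and coefficients $\beta^N_{k+1}$, $g_{k+1}^T d_k / D_k$), the computation behind the proof unfolds as follows; this is a sketch consistent with the final appeal to $t \ge 0$, not a verbatim reproduction of the paper's display:

```latex
g_{k+1}^T d_{k+1}^N
  = -\|g_{k+1}\|^2
    + \frac{\bigl(g_{k+1}^T y_k - t\, g_{k+1}^T s_k\bigr)\bigl(g_{k+1}^T d_k\bigr)}{D_k}
    - \frac{\bigl(g_{k+1}^T d_k\bigr)\bigl(g_{k+1}^T y_k\bigr)}{D_k}
  = -\|g_{k+1}\|^2 - \frac{t\,\alpha_k\,\bigl(g_{k+1}^T d_k\bigr)^2}{D_k}
  \le -\|g_{k+1}\|^2,
```

using $s_k = \alpha_k d_k$ with $\alpha_k > 0$, so that $(g_{k+1}^T s_k)(g_{k+1}^T d_k) = \alpha_k (g_{k+1}^T d_k)^2 \ge 0$.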
Lemma 1 shows that the new direction satisfies the sufficient descent property independently of the line search technique. A conjugacy condition also plays an important role in numerical performance: the HS method automatically satisfies $d_{k+1}^T y_k = 0$, while the Dai-Liao method always satisfies the modified condition $d_{k+1}^T y_k = -t\, g_{k+1}^T s_k$. In our case, by the design of the direction $d^N_{k+1}$, the new direction satisfies the DL conjugacy condition (4) in an extended form, i.e., $(d^N_{k+1})^T y_k \le -t_1\, g_{k+1}^T s_k$, where
$$t_1 = \frac{t\, y_k^T s_k + \|y_k\|^2}{\max\{y_k^T s_k, \|g_k\|^2\}}.$$
In fact, if we adopt a line search technique which guarantees $y_k^T s_k \ge 0$, then $t_1 \ge 0$.

Convergence for Uniformly Convex Functions
In the following, we present the global convergence analysis of the HTTCG method under the following assumptions.
Assumption 1. The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

Assumption 2. In some neighborhood $N$ of the level set $\Omega$, the gradient $g(x) = \nabla f(x)$ is Lipschitz continuous, i.e., there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|, \qquad \forall\, x, y \in N.$$
Note that, based on Assumptions 1 and 2, there exists a positive constant $G$ such that
$$\|g(x)\| \le G, \qquad \forall\, x \in N.$$
In the following, we show that the sequence $\{d^N_k\}$ generated by Algorithm 1 is bounded.

Lemma 2.
Assume $0 < t \le T$ for some constant $T$ and that Assumptions 1 and 2 hold. For any line search technique, let the sequence $\{d^N_k\}$ be generated by Algorithm 1. If the objective function $f$ is uniformly convex on the set $N$, then $\{d^N_k\}$ is bounded.
Proof. Since $f$ is uniformly convex on the set $N$, for any $x, y \in N$ we have
$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \|x - y\|^2,$$
where $\mu > 0$ is the uniform convexity parameter. In particular, taking $x = x_{k+1}$ and $y = x_k$ yields
$$y_k^T s_k \ge \mu \|s_k\|^2. \qquad (15)$$
Next, by their definitions together with (15), the parameters $\beta^N_{k+1}$ and $\delta^N_{k+1}$ are bounded, and the definition of $d^N_{k+1}$ then yields a bound on $\|d^N_{k+1}\|$, the last inequality holding by (15). This completes the proof.
The following lemma plays an essential role in the global convergence theorem of our method. It is Lemma 3.1 of [34]; hence, we only state it here and omit its proof.
Lemma 3. Suppose that Assumptions 1 and 2 hold. Consider any iterative method of the form (2) in which $d_k$ satisfies the sufficient descent property and $\alpha_k$ is computed by the Wolfe line search technique (20). If
$$\sum_{k \ge 0} \frac{1}{\|d_k\|^2} = +\infty \qquad (21)$$
holds, then the method converges globally in the sense that
$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (22)$$
Now, we establish the global convergence of Algorithm 1 for uniformly convex objective functions.

Theorem 1. Suppose that Assumptions 1 and 2 hold. Consider Algorithm 1 with step size $\alpha_k$ computed by the line search technique (20). If the objective function $f$ is uniformly convex on the set $N$, then Algorithm 1 converges globally in the sense that
$$\lim_{k \to \infty} \|g_k\| = 0. \qquad (23)$$
Proof. From Lemma 1, the direction $d^N_{k+1}$ satisfies the sufficient descent property with $c = 1$. By the first inequality of the line search technique (20), the sequence $\{f(x_k)\}_{k \ge 0}$ is non-increasing and $\{x_k\}_{k \ge 0} \subseteq N$. By the boundedness of $\{d^N_k\}$ from Lemma 2, (21) holds, and hence (22) holds. Since $f$ is uniformly convex, (23) follows. This completes the proof.

Convergence for General Nonlinear Functions
In order to achieve global convergence without a convexity assumption on the general function, we adopt the modified secant condition of [32] (similar modified secant conditions can also be found in [35,36], etc.). Concretely, $y_k$ is replaced by
$$z_k = y_k + h_k \|g_k\|^p s_k, \qquad h_k = C + \max\Big\{0,\; \frac{-y_k^T s_k}{\|s_k\|^2}\Big\} \|g_k\|^{-p},$$
where $p > 0$ and $C > 0$. Based on the modified secant condition, we present the direction $d_0 = -g_0$ and $d^{NN}_{k+1}$, defined by (25), which is (9) with $y_k$ replaced by $z_k$. Now, based on the above discussion, we state our algorithm as follows:

Algorithm 2: Hybrid three-term CG method using the modified secant condition (HTTCGSC).
Step 0. Select $x_0 \in \mathbb{R}^n$ and constants $C > 0$ and $r > 0$. Compute $g(x_0)$ and set $d_0 = -g_0$. Let $k := 0$.
Step 1. If $\|g_k\| \le \varepsilon$, then stop; otherwise go to the next step.
Step 2. Compute the step size $\alpha_k$ along the direction $d_k$ by some line search technique.
Step 3. Let $x_{k+1} = x_k + \alpha_k d_k$.
Step 4. Compute the search direction $d_{k+1}$ by (25).
Step 5. Set $k := k + 1$ and go to Step 1.
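The two cases of the modified secant vector collapse into a single formula for $z_k$; the helper below is a sketch of that computation (the exponent is written $p$, as in the text, although Step 0 of Algorithm 2 names the constant $r$).

```python
import numpy as np

def modified_y(g, s, y, C=0.1, p=1.0):
    """z_k = y_k + h_k ||g_k||^p s_k, with
    h_k = C + max{0, -y_k^T s_k / ||s_k||^2} ||g_k||^{-p},
    matching the two cases in the proof of Lemma 4."""
    gn = np.linalg.norm(g)
    h = C + max(0.0, -float(y @ s) / float(s @ s)) * gn ** (-p)
    return y + h * gn ** p * s
```

For either sign of $y_k^T s_k$ this construction gives $z_k^T s_k \ge C \|g_k\|^p \|s_k\|^2$, which is the key inequality (27) below.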
Note also that in Algorithm 2 the line search technique is not explicitly specified. Similarly to Lemma 1, the direction $d^{NN}_{k+1}$ satisfies the sufficient descent property for any line search technique; we omit the proof. The next lemma collects two properties of $z_k$.

Lemma 4.
For any line search technique and the sequence generated by Algorithm 2, it always holds that
$$z_k^T s_k \ge C \|g_k\|^p \|s_k\|^2 \qquad (27)$$
and
$$\|z_k\| \le (2L + C G^p) \|s_k\|. \qquad (28)$$
Proof. For any line search technique, we consider two cases.
Case (i): $y_k^T s_k \ge 0$. In this case, $h_k = C$ and $z_k = y_k + C \|g_k\|^p s_k$. Then
$$z_k^T s_k = y_k^T s_k + C \|g_k\|^p \|s_k\|^2 \ge C \|g_k\|^p \|s_k\|^2.$$
Case (ii): $y_k^T s_k < 0$. In this case, $h_k = C - \frac{y_k^T s_k}{\|s_k\|^2} \|g_k\|^{-p}$ and $z_k = y_k + C \|g_k\|^p s_k - \frac{y_k^T s_k}{\|s_k\|^2} s_k$. Then
$$z_k^T s_k = y_k^T s_k + C \|g_k\|^p \|s_k\|^2 - y_k^T s_k = C \|g_k\|^p \|s_k\|^2.$$
Based on the above discussion, for any line search technique, (27) always holds.
By the definition of $z_k$ and Assumption 2,
$$\|z_k\| \le \|y_k\| + C \|g_k\|^p \|s_k\| + \frac{|y_k^T s_k|}{\|s_k\|^2} \|s_k\| \le (2L + C G^p) \|s_k\|,$$
where the last inequality uses $\|y_k\| \le L \|s_k\|$ and $\|g_k\| \le G$. Then, (28) holds. This completes the proof.
In the following, we assume that the limit (22) does not hold (otherwise Algorithm 2 converges); that is, there exists a positive constant $\eta$ such that
$$\|g_k\| \ge \eta, \qquad \forall\, k \ge 0. \qquad (31)$$
We can now give the global convergence of Algorithm 2 for general nonlinear problems without a convexity assumption.

Theorem 2. Suppose that Assumptions 1 and 2 hold. Consider Algorithm 2 with step size $\alpha_k$ computed by the line search technique (20). Then, Algorithm 2 converges globally in the sense that (22) holds.

Proof.
We proceed by contradiction: suppose that (22) does not hold, so inequality (31) is satisfied. We first prove that the sequence $\{d^{NN}_k\}_{k \ge 0}$ is bounded. Following the analysis in (17) and (18), we obtain a bound on $\|d^{NN}_{k+1}\|$ in which the third inequality holds by Assumption 1 and (28), and the last inequality holds by $\|g_k\| \ge \eta$. The boundedness of $\{d^{NN}_k\}$ implies that (21) holds, so Lemma 3 yields (22), contradicting (31). This completes the proof.

Numerical Results
In this section, we first present the numerical performance of Algorithm 2 and compare it with the other methods in [28,31]. Then, an acceleration technique is applied to our method.

Numerical Performance of Algorithm 2
In this subsection, we focus on the numerical performance of Algorithm 2 and compare it with the MTTDLCG method of [31] and the MTTHSCG method of [28], using the parameter settings of their original papers. For the parameter $t$, the authors of [37] point out that $t = \|y_k\|^2 / (y_k^T s_k)$ is a good choice for the Dai-Liao method, and the authors of [18] suggest that $t = 0.1$ is a good choice. In this paper, we take
$$t = \max\Big\{0.1,\; \frac{\|z_k\|^2}{\max\{z_k^T s_k, \|g_k\|^2\}}\Big\}.$$
We executed the tests on a personal computer with the Windows 10 operating system, an AMD CPU @2.1 GHz and 16.00 GB of RAM; the corresponding codes were written in MATLAB R2016b. The parameters are $\sigma_1 = 0.20$, $\sigma_2 = 0.85$, $C = 0.1$, and $r = 1$ if $\|s_k\|^2 < 1$, $r = 3$ otherwise. The stopping rule (Himmelblau rule) is: testing stops if $\|g(x)\| < \varepsilon$ or Tem $< \varepsilon_2$ holds, where Tem $= |f(x_{k+1}) - f(x_k)| / |f(x_k)|$ if $|f(x_k)| > \varepsilon_1$, and Tem $= |f(x_{k+1}) - f(x_k)|$ otherwise. The parameter values are $\varepsilon_1 = \varepsilon_2 = 10^{-5}$ and $\varepsilon = 10^{-6}$. Testing also stops if the total iteration number exceeds 10,000. For the step size, $\alpha_k$ is accepted when the search number of the WWP line search exceeds 6. The test problems and initial points are taken from [38] and are listed in Table 1; for each problem, ten large-scale dimensions with 1500, 3000, 6000, 7500, 9000, 15,000, 30,000, 60,000, 75,000 and 90,000 variables are considered. To assess the numerical performance, the performance profile introduced by Dolan and Moré [39] is adopted: for each method, we plot the fraction $P$ of the test problems for which the method is within a factor $\tau$ of the best result. The left side of each figure shows the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems successfully solved by each method. Figure 1 presents the performance profile of the three methods with respect to iterations.
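The Dolan-Moré profile used in Figures 1-3 can be computed from a problems-by-solvers table of costs; the data in the usage example below is illustrative only.

```python
import numpy as np

def performance_profile(T, taus):
    """T: (n_problems, n_solvers) array of positive costs (np.inf = failure).
    Returns P[i, j] = fraction of problems that solver j solves within a
    factor taus[i] of the best solver's cost (Dolan & More profile)."""
    best = T.min(axis=1, keepdims=True)       # best cost on each problem
    ratios = T / best                         # performance ratios r_{p,s}
    return np.array([[float(np.mean(ratios[:, j] <= tau))
                      for j in range(T.shape[1])]
                     for tau in np.asarray(taus, dtype=float)])
```

For example, with costs `[[1, 2], [3, 3], [2, 1]]`, each solver is best (ratio 1) on two of the three problems, and both reach fraction 1 by $\tau = 2$.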
From Figure 1, we see that Algorithm 2 (the HTTCG method) solves 53% of the test problems with the fewest iterations, versus 35% for the MTTDLCG method and 41% for the MTTHSCG method. This indicates that Algorithm 2 performs best on this measure. Figure 2 presents the performance profile of the three methods in terms of the number of function and gradient evaluations: Algorithm 2 solves 69% of the test problems with the fewest evaluations, versus 14% for the MTTDLCG method and 22% for the MTTHSCG method, which again indicates that Algorithm 2 performs best. Figure 3 presents the performance profile of the three methods in terms of CPU time consumed.
From Figure 3, we see that Algorithm 2 solves 12% of the test problems in the least CPU time, versus 31% for the MTTDLCG method and 46% for the MTTHSCG method, so the MTTHSCG method performs best in terms of CPU time consumed. Overall, Figures 1-3 show that Algorithm 2 is effective and comparable with the MTTDLCG and MTTHSCG methods on the test problems in Table 1.

Accelerated Strategy for Algorithm 2
In order to improve the numerical performance of Algorithm 2, in this subsection we utilize the acceleration strategy of [40], which modifies the step length in a multiplicative manner along the iterations: the iterate update (2) becomes $x_{k+1} = x_k + \xi_k \alpha_k d_k$, where the acceleration factor $\xi_k$ is computed from $\theta_k = \alpha_k g_k^T d_k$ and a curvature term along $d_k$, following [40]. We test the problems in Table 1 again and compare Algorithm 2 with the acceleration strategy against plain Algorithm 2; the corresponding parameters remain unchanged. The performance profiles can be found in Figures 4-6. From Figure 4, the Algorithm 2 with the acceleration strategy (HTTCG-A method) solves 73% of the test problems with the fewest iterations, versus 36% for the HTTCG method. From Figure 5, the HTTCG-A method solves 72% of the test problems with the fewest function and gradient evaluations, versus 35% for the HTTCG method. From Figure 6, the HTTCG-A method solves 59% of the test problems in the least CPU time, versus 28% for the HTTCG method. These results all indicate that the acceleration strategy works: it reduces the number of iterations, the number of function and gradient evaluations and the time consumed.
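The displayed formula for the acceleration factor did not survive extraction. The sketch below follows the multiplicative acceleration scheme of [40] as commonly stated, with $a_k = \alpha_k g_k^T d_k$ and $b_k = \alpha_k (g_k - g_z)^T d_k$, where $g_z$ is the gradient at the trial point; treat the exact signs and names as an assumption rather than the paper's formula.

```python
import numpy as np

def accelerated_step(grad, x, d, alpha):
    """One accelerated step in the spirit of [40]: replace x + alpha*d by
    x + xi*alpha*d, where xi is a secant/Newton step for the 1-D function
    phi(eta) = f(x + eta*alpha*d), built from two gradient evaluations."""
    g = grad(x)
    z = x + alpha * d                   # trial point of the line search
    gz = grad(z)
    a = alpha * float(g @ d)            # phi'(0)
    b = alpha * float((g - gz) @ d)     # -(phi'(1) - phi'(0)): curvature proxy
    if b != 0.0:
        xi = a / b                      # Newton step for phi from eta = 0
        return x + xi * alpha * d
    return z                            # fall back to the unaccelerated step
```

On a quadratic, this lands exactly on the one-dimensional minimizer along $d_k$, so the new gradient is orthogonal to $d_k$.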

Conclusions
Unconstrained smooth optimization problems arise in many applications, such as optimal control problems and machine learning problems. In this paper, a hybrid three-term descent conjugate gradient algorithm is proposed. This algorithm possesses the sufficient descent property independently of any line search technique, and it also satisfies the Dai-Liao conjugacy condition in an extended form. Under some mild conditions, the algorithm is globally convergent for uniformly convex functions; for general nonlinear functions, it is also globally convergent by using a modified secant condition. Numerical results indicate that the hybrid method is effective and reliable, and an acceleration strategy is adopted to further improve its numerical performance. In the future, we will apply our conjugate gradient methods to non-smooth problems via smoothing strategies and the Moreau-Yosida regularization technique, and to image restoration problems.