A Class of Three-Dimensional Subspace Conjugate Gradient Algorithms for Unconstrained Optimization

In this paper, a three-parameter subspace conjugate gradient method is proposed for solving large-scale unconstrained optimization problems. By minimizing a quadratic approximation model of the objective function on a new special three-dimensional subspace, the embedded parameters are determined and the corresponding algorithm is obtained. The global convergence of the given method for general nonlinear functions is established under mild assumptions. In numerical experiments, the proposed algorithm is compared with SMCG_NLS and SMCG_Conic, which shows that the given algorithm is robust and efficient.


Introduction
The conjugate gradient method is one of the most important methods for solving large-scale unconstrained optimization problems because of its simple structure, low computation and storage costs, fast convergence, etc. The general unconstrained optimization problem is

$$\min_{x \in \mathbb{R}^n} f(x),$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The function value of $f(x)$ at $x_k$ is denoted by $f_k$, and its gradient by $g_k$. Let $\alpha_k$ be the step size; the conjugate gradient method uses the iteration

$$x_{k+1} = x_k + \alpha_k d_k,$$

where $d_k$ is the search direction, which has the form

$$d_{k+1} = -g_{k+1} + \beta_k d_k,$$

where $\beta_k \in \mathbb{R}$ is referred to as the conjugate parameter. Different selections of $\beta_k$ yield several well-known nonlinear conjugate gradient methods [1-6]:

$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{d_k^T y_k}, \qquad \beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad \beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2},$$

where $\|\cdot\|$ denotes the Euclidean norm and $y_k = g_{k+1} - g_k$.

The step size $\alpha_k$ can be obtained in different ways. Zhang and Hager [7] proposed an effective non-monotone Wolfe line search,

$$f(x_k + \alpha_k d_k) \le C_k + \delta \alpha_k g_k^T d_k, \qquad g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \qquad (4)$$

where $0 < \delta < \sigma < 1$ and $C_k$ in (4) is a convex combination of $f_0, f_1, \cdots, f_k$. With $C_0 = f_0$ and $Q_0 = 1$, $C_{k+1}$ and $Q_{k+1}$ are updated by the rule

$$Q_{k+1} = \eta_k Q_k + 1, \qquad C_{k+1} = \frac{\eta_k Q_k C_k + f_{k+1}}{Q_{k+1}}, \qquad (5)$$

where $0 \le \eta_{\min} \le \eta_k \le \eta_{\max} \le 1$. The generalized non-monotone line search in [8] combines the update rule (5) with modified acceptance conditions, again with $0 < \delta < \sigma < 1$.

For large-scale optimization problems, researchers have been looking for more efficient algorithms. In 1995, Yuan and Stoer [9] first proposed embedding a subspace technique into the conjugate gradient framework, namely the two-dimensional subspace minimization conjugate gradient method (SMCG for short). The search direction is calculated by minimizing a quadratic approximation model on the two-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k\}$, namely

$$d_{k+1} = \mu_k g_{k+1} + \nu_k s_k,$$

where $\mu_k$ and $\nu_k$ are parameters and $s_k = x_{k+1} - x_k$. Analogously, the calculation of the search direction can be extended directly to $\mathrm{Span}\{g_{k+1}, s_k, s_{k-1}\}$. In this way, solving the sub-problem in the full space is avoided, which reduces computation and storage costs immensely.
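As an aside on implementation, the non-monotone update rule (5) above amounts to only a couple of lines. The following is a minimal Python sketch; the function name and interface are ours, and the acceptance test (4) itself is omitted.

```python
def zhang_hager_update(C_k, Q_k, f_next, eta_k):
    """One step of the Zhang-Hager reference-value update (5):
    Q_{k+1} = eta_k * Q_k + 1,
    C_{k+1} = (eta_k * Q_k * C_k + f_{k+1}) / Q_{k+1},
    with 0 <= eta_min <= eta_k <= eta_max <= 1, C_0 = f_0, Q_0 = 1,
    so C_{k+1} remains a convex combination of f_0, ..., f_{k+1}."""
    Q_next = eta_k * Q_k + 1.0
    C_next = (eta_k * Q_k * C_k + f_next) / Q_next
    return C_next, Q_next
```

Setting $\eta_k = 0$ recovers the monotone case $C_{k+1} = f_{k+1}$, while $\eta_k = 1$ averages all past function values.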
Inspired by SMCG, some researchers began to investigate conjugate gradient methods combined with subspace techniques. Dai et al. [10] analyzed the subspace minimization conjugate gradient method proposed by Yuan and Stoer [9] and integrated SMCG with the Barzilai-Borwein method [11], obtaining a new Barzilai-Borwein conjugate gradient method (BBCG for short). In the subspace Ω_{k+1} = Span{g_{k+1}, s_k}, Li et al. [12] discussed the case where the search direction is generated by minimizing a conic model when the objective function is strongly non-quadratic. Wang et al. [13] replaced the conic model of [12] with a tensor model, and Zhao et al. [14] discussed the case of a regularization model. Andrei [15] further expanded the search direction to a three-dimensional subspace Ω_{k+1} = Span{−g_{k+1}, s_k, y_k} and proposed a new SMCG method (TTS). Inspired by Andrei, Yang et al. [16] carried out a similar study: they applied the subspace minimization technique to another special three-dimensional subspace Ω_{k+1} = Span{g_{k+1}, s_k, s_{k−1}} and obtained a new SMCG method (STT). On the same subspace, Li et al. [17] further developed Yang's results, analyzed a more complex three-parameter direction, and proposed a new subspace minimization conjugate gradient method (SMCG_NLS). Yao et al. [18] proposed a new three-dimensional subspace Ω_{k+1} = Span{g_{k+1}, s_k, g_k} and obtained the TCGS method by using a modified secant equation.
According to [19], the key to embedding subspace techniques into the conjugate gradient method is to construct an appropriate subspace, select an approximation model, and estimate the terms involving the Hessian matrix. The subspace Ω_{k+1} = Span{−g_{k+1}, s_k, y_k} contains the gradients at two iteration points. If the difference between these two gradients is too large, the gradient-change direction y_k and its magnitude are adversely affected. We therefore consider making an appropriate correction to the gradient change between the two iteration points, combining it with the current gradient and the previous step to form a new three-dimensional subspace, and asking whether this subspace is worth investigating. This question is the starting point of this paper.
In this paper, to avoid the situation in which the gradient change $y_k = g_{k+1} - g_k$ dominates the subspace $\mathrm{Span}\{g_{k+1}, d_k, y_k\}$, inspired by [20], we construct a similar subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$, in which

$$y_k^* = g_{k+1} - \frac{\|g_{k+1}\|}{\|g_k\|}\, g_k.$$

The corresponding parameters and algorithm are then obtained by solving for the optimal solution of an approximation model of the objective function on the given subspace. It can be shown that the obtained method is globally convergent and has nice numerical performance.
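As an illustration, the modified gradient change is cheap to form; the following minimal Python sketch (the function name is ours) computes $y_k^*$:

```python
import numpy as np

def modified_gradient_change(g_next, g):
    """y*_k = g_{k+1} - (||g_{k+1}|| / ||g_k||) * g_k.
    The rescaled term has the same norm as g_{k+1}, so y*_k reflects the
    change in gradient direction rather than the change in magnitude."""
    return g_next - (np.linalg.norm(g_next) / np.linalg.norm(g)) * g
```

Note that when $\|g_{k+1}\| = \|g_k\|$, $y_k^*$ coincides with $y_k$, so the modification only intervenes when the gradient magnitudes differ.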
The rest of this paper is organized as follows: in Section 2, the search direction constructed on a new special three-dimensional subspace Ω_{k+1} is presented, and the estimations of the matrix-vector products are given. In Section 3, the proposed algorithm and its descent properties under two necessary assumptions are described in detail. In Section 4, we establish the global convergence of the proposed algorithm under mild conditions. In Section 5, we compare the proposed method numerically with the algorithms SMCG_NLS [17] and SMCG_Conic [12]. Finally, in Section 6, we conclude this paper and highlight future work.

Search Direction and Step Size
This section introduces the four search direction models and the selection of the initial step size on the newly spanned three-dimensional subspace.

Direction Choice Model
Inspired by [20], in this paper the gradient change $y_k = g_{k+1} - g_k$ is replaced by

$$y_k^* = g_{k+1} - \frac{\|g_{k+1}\|}{\|g_k\|}\, g_k.$$

The search directions are then constructed in the three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$. From [10], we know that the approximation model plays two roles: one is to approximate the original objective function on the subspace $\Omega_{k+1}$; the other is to make the search direction $d_{k+1}$ obtained from the approximation model a descent direction, so that the original objective function decreases along it. On the proposed subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$, we consider the approximation model of the objective function

$$\min_{d \in \Omega_{k+1}} \; g_{k+1}^T d + \frac{1}{2}\, d^T B_{k+1} d, \qquad (9)$$

where $B_{k+1}$ is a symmetric positive definite approximation of the Hessian matrix satisfying the secant condition $B_{k+1} s_k = y_k$. The subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$ under consideration has at most three dimensions.
When the dimension is 3, it is easy to see that $g_{k+1}$, $s_k$, and $y_k^*$ are not collinear. The search direction then takes the form

$$d_{k+1} = \mu_k g_{k+1} + \nu_k s_k + \gamma_k y_k^*, \qquad (10)$$

where $\mu_k$, $\nu_k$, and $\gamma_k$ are undetermined parameters. Substituting (10) into (9) and simplifying yields a three-dimensional quadratic sub-problem (11) in $(\mu, \nu, \gamma)$, whose symmetric coefficient matrix $D_{k+1}$ involves the matrix-vector products $\rho_{k+1} = g_{k+1}^T B_{k+1} g_{k+1}$, $w_k$, and $\ell_k$, in addition to quantities computable exactly from the secant condition $B_{k+1} s_k = y_k$. Under some mild conditions, we can prove that $D_{k+1}$ is positive definite, which will be discussed in Lemma 1. When $D_{k+1}$ is positive definite, calculation and simplification show that (11) has a unique solution, which determines the parameters in (12). In order to avoid matrix-vector multiplications, we need to estimate $\rho_{k+1}$, $\ell_k$, and $w_k$. Before estimating $\rho_{k+1}$, we first estimate $\ell_k$ and $w_k$.
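To make the structure of the sub-problem (11) concrete, the following Python sketch assembles the reduced $3 \times 3$ system. The identifications $w_k \approx g_{k+1}^T B_{k+1} y_k^*$ and $\ell_k \approx (y_k^*)^T B_{k+1} y_k^*$ are our reconstruction of the symbols; they are supplied as estimates, while every term involving $B_{k+1} s_k$ is exact through the secant condition. The safeguards that guarantee positive definiteness of $D_{k+1}$ are omitted here.

```python
import numpy as np

def three_dim_direction(g, s, y, y_star, rho, w, ell):
    """Solve the reduced sub-problem obtained by substituting
    d = mu*g + nu*s + gamma*y_star into m(d) = g^T d + 0.5 d^T B d,
    using B s = y to evaluate all products with s exactly."""
    D = np.array([
        [rho,           np.dot(g, y),      w],
        [np.dot(g, y),  np.dot(s, y),      np.dot(y_star, y)],
        [w,             np.dot(y_star, y), ell],
    ])
    rhs = -np.array([np.dot(g, g), np.dot(g, s), np.dot(g, y_star)])
    mu, nu, gamma = np.linalg.solve(D, rhs)  # unique when D is positive definite
    return mu * g + nu * s + gamma * y_star
```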
For $\ell_k$, based on the analysis of [9], a desirable value of $\cos^2 \angle\!\left(B_{k+1}^{1/2} y_k^*,\, B_{k+1}^{1/2} s_k\right)$ leads to the estimation (13) of $\ell_k$. In order for the experiments to have a better numerical effect, we amend $\ell_k$ as in (14), where obviously $\ell_k^0 > 0$. From (13) and $\ell_k^0 > 0$, the $2 \times 2$ sub-matrix (15) of $D_{k+1}$ is positive definite. Inspired by the BBCG method [11], we take $w_k$ as in (16), where $\zeta_k$ is given adaptively by (17), with $\zeta_0 = 1.5$ and $\zeta_k \in [1.2, 1.75)$. Now we estimate $\rho_{k+1}$. Since $D_{k+1}$ is positive definite, it is easy to see that $\Delta_{k+1} > 0$, which yields a lower bound (18) on $\rho_{k+1}$; the right-hand side of (18) is denoted by $n_k$ and is obtained by combining (12), (14), and (16). According to (14), we know that $m_k \ge 1/2$. In order to ensure that (18) holds, we estimate the parameter $\rho_{k+1}$ by (19), where $\rho_0 \in (0, 1)$. Through debugging of the algorithm, we found that the adaptive value (17) of $\zeta_k$ gives a better numerical effect than a fixed value, so $\zeta_k$ in (19) also uses (17).
In summary, when conditions (21)-(23) are all satisfied, the search direction $d_{k+1}$ is calculated by (10) and (12).
Otherwise, we turn to the two-dimensional subspace $\mathrm{Span}\{g_{k+1}, s_k\}$. In this case, the search direction $d_{k+1}$ has the form

$$d_{k+1} = \mu_k g_{k+1} + \nu_k s_k, \qquad (24)$$

where $\mu_k$, $\nu_k$ are undetermined parameters. Substituting (24) into (9), we get

$$\min_{(\mu, \nu)} \; \mu \|g_{k+1}\|^2 + \nu\, g_{k+1}^T s_k + \frac{1}{2}\left(\mu^2 \rho_{k+1} + 2\mu\nu\, g_{k+1}^T y_k + \nu^2\, s_k^T y_k\right), \qquad (25)$$

where $\rho_{k+1} = g_{k+1}^T B_{k+1} g_{k+1}$. If $\Delta_k = \rho_{k+1}\, s_k^T y_k - (g_{k+1}^T y_k)^2 > 0$, then the unique solution of the problem (25) is

$$\mu_k = \frac{g_{k+1}^T y_k \cdot g_{k+1}^T s_k - \|g_{k+1}\|^2\, s_k^T y_k}{\Delta_k}, \qquad \nu_k = \frac{g_{k+1}^T y_k \cdot \|g_{k+1}\|^2 - \rho_{k+1}\, g_{k+1}^T s_k}{\Delta_k}. \qquad (26)$$

Combined with BBCG, $\rho_{k+1}$ is taken as

$$\rho_{k+1} = \zeta_k \frac{\|y_k\|^2}{s_k^T y_k}\, \|g_{k+1}\|^2, \qquad (27)$$

where $\zeta_k$ in (27) is the same as in (17).
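A minimal sketch of the two-dimensional direction, using the closed-form solution (26) with the BBCG-style estimate (27); the function name and the fixed $\zeta$ default are ours:

```python
import numpy as np

def two_dim_direction(g, s, y, zeta=1.5):
    """Minimizer of the model (25) over Span{g_{k+1}, s_k}:
    solves [[rho, g^T y], [g^T y, s^T y]] [mu, nu]^T = -[||g||^2, g^T s]^T
    with rho = zeta * ||y||^2 / (s^T y) * ||g||^2 as in (27)."""
    gty, sty = np.dot(g, y), np.dot(s, y)
    gts, gg = np.dot(g, s), np.dot(g, g)
    rho = zeta * np.dot(y, y) / sty * gg
    delta = rho * sty - gty ** 2   # > 0 when zeta > 1, by Cauchy-Schwarz
    mu = (gty * gts - gg * sty) / delta
    nu = (gty * gg - rho * gts) / delta
    return mu * g + nu * s
```

By the Cauchy inequality, $\Delta_k = \zeta_k \|y_k\|^2 \|g_{k+1}\|^2 - (g_{k+1}^T y_k)^2 \ge (\zeta_k - 1)\|y_k\|^2 \|g_{k+1}\|^2 > 0$ whenever $s_k^T y_k > 0$ and $\zeta_k > 1$, so the system is solvable.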
Obviously, under certain conditions, the HS direction can be regarded as a special case of formula (24). Taking into account the finite termination property of the HS method, in order to endow our algorithm with good properties, when conditions (28) and (29) hold, where $\xi_4 \in [0, 1)$, we consider

$$d_{k+1} = -g_{k+1} + \beta_k^{HS} d_k. \qquad (30)$$

In summary, for the case where the dimension of the subspace is 2: if only condition (22) holds, $d_{k+1}$ is calculated by (24) and (26); when inequalities (28) and (29) hold, $d_{k+1}$ is calculated by (30).
When none of the conditions (21), (22), and (23) holds, we choose the negative gradient as the search direction, i.e.,

$$d_{k+1} = -g_{k+1}. \qquad (31)$$
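The overall direction selection can thus be summarized by a simple dispatch. In the following sketch, the boolean flags stand for the paper's tests (21)-(23), (22) alone, and (28)-(29), whose explicit forms we do not restate here, and the candidate directions are computed upstream:

```python
def choose_direction(g, cond_3d, cond_2d, cond_hs, d_3d=None, d_2d=None, d_hs=None):
    """Case analysis for the search direction of this section."""
    if cond_3d:
        return d_3d   # trinomial direction (10) with parameters (12)
    if cond_2d:
        return d_2d   # binomial direction (24) with solution (26)
    if cond_hs:
        return d_hs   # HS-type direction (30)
    return -g         # negative gradient (31)
```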

Selection of Initial Step Size
Considering that the selection of the initial step size also has an impact on the algorithm, we adopt the initial step size strategy of SMCG_NLS [17], which is also a subspace algorithm.
According to [21], one can measure how close the objective function is to a quadratic on the line segment between the current iteration point and the previous one; see (32). Based on [22], the condition (33), with $0 < \lambda_1 < \lambda_2$, indicates that the objective function is close to a quadratic function. Case I: when the search direction is calculated by (10), (24), or (30), the initial step size is $\alpha_k^{(0)} = 1$. Case II: when $d_{k+1} = -g_{k+1}$, the initial step size is given by (34).
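A sketch of this initial-step strategy; the Barzilai-Borwein value for Case II is our assumption, used as a stand-in since the explicit formula (34) is not restated here:

```python
import numpy as np

def initial_step(d_is_negative_gradient, s=None, y=None):
    """Case I: unit initial step for the model-based directions (10),
    (24), (30).  Case II (d = -g): a Barzilai-Borwein-type step, a
    hypothetical stand-in for formula (34)."""
    if not d_is_negative_gradient:
        return 1.0
    return float(np.dot(s, s) / np.dot(s, y))  # BB step s^T s / s^T y
```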

The Obtained Algorithm and Descent Property
This section describes the obtained algorithm and its descent properties under two necessary assumptions in detail.

The Obtained Algorithm
This subsection introduces our proposed algorithm and gives two necessary assumptions. Before stating the algorithm, we first describe the restart strategy used to restart the proposed algorithm, and then present the algorithm in detail.
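Since the full algorithm involves the tests and estimates described above, we give only an illustrative skeleton of the iteration. The simplifications are ours, not the method itself: this sketch uses the two-dimensional direction (24) with the BBCG-style estimate (27) plus a steepest-descent fallback, and a plain Armijo backtracking search in place of the non-monotone Wolfe search (4)-(5); the restart strategy is omitted.

```python
import numpy as np

def tscg_sketch(f, grad, x0, tol=1e-6, max_iter=10000):
    """Simplified subspace-CG loop in the spirit of TSCG (a sketch only)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) <= tol:
            break
        alpha, fx, gtd = 1.0, f(x), np.dot(g, d)
        while f(x + alpha * d) > fx + 1e-4 * alpha * gtd:   # Armijo backtracking
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sty = np.dot(s, y)
        if sty > 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
            gg = np.dot(g_new, g_new)
            rho = 1.5 * np.dot(y, y) / sty * gg              # BBCG-style estimate
            gty, gts = np.dot(g_new, y), np.dot(g_new, s)
            delta = rho * sty - gty ** 2
            d = ((gty * gts - gg * sty) * g_new + (gty * gg - rho * gts) * s) / delta
            if np.dot(g_new, d) > -1e-10 * gg:               # safeguard: keep descent
                d = -g_new
        else:
            d = -g_new                                       # fallback (31)
        x, g = x_new, g_new
    return x

# e.g. tscg_sketch(lambda x: x @ x, lambda x: 2 * x, np.ones(10))
```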

Descent Properties of Search Direction
In this subsection, we discuss the descent properties of the given algorithm; it will be proved that the proposed algorithm (TSCG) satisfies the sufficient descent condition in all cases. We first introduce some common assumptions on the objective function.

Assumption 1.
The objective function $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and bounded below on $\mathbb{R}^n$.

Assumption 2. The gradient function $g$ is Lipschitz continuous on the bounded level set $\mathcal{L} = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$; that is, there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \le L \|x - y\|$ for all $x, y \in \mathcal{L}$. In particular, $\|y_k\| \le L \|s_k\|$.

Proof of Lemma 2.
We only need to discuss the cases of the search direction that involve $\rho_{k+1}$.
Case I: if $d_{k+1}$ is generated by (24), the proof is similar to that in [22] and is omitted.
Case II: when $d_{k+1}$ is calculated by (10) and (12), we can express the relevant quantity through $\varphi(x, y)$, a bivariate quadratic function whose variables are normalized inner products of $g_{k+1}$ with $y_k^*$ and $s_k$.

From Lemma 1, the required bound is easily obtained. Therefore, the desired estimate follows, which completes the proof.
Proof of Lemma 3. We discuss the four forms of the search direction separately and show that the sufficient descent condition

$$g_{k+1}^T d_{k+1} \le -c_1 \|g_{k+1}\|^2 \qquad (36)$$

holds in each case.

Case I: if $d_{k+1} = -g_{k+1}$, then $g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2$, so (36) holds with $c_1 = \frac{1}{2}$.

Case II: when the search direction is binomial, we first discuss the situation given by (30). When $d_{k+1}$ is determined by (30), combining (28) and (29) with $\beta_k = \beta_k^{HS}$ yields the required bound.

Case III: now we discuss the other binomial case, where the search direction is generated by (24). Combining (22) and (27), obviously

$$\rho_{k+1} \le \zeta_k \frac{\|g_{k+1}\|^2 \|y_k\|^2}{s_k^T y_k} \le 2 \xi_2 \|g_{k+1}\|^2.$$

In combination with the above formula and (35), the required estimate is obtained.

Case IV: when the search direction is trinomial, that is, $d_{k+1}$ is given by (10) and (12), we consider Lemma 2 and use (20), (21), and $\chi > 0$. We first prove that $\rho_{k+1}$ has an upper bound, where the quantity under the root in the second inequality is $\ell_k$. Combining conditions (22) and (23), it is easy to see that $K_1$ in (20) has an upper bound. Finally, applying Lemma 2 yields the required estimate.

In summary, a value of $c_1$ satisfying (36) in all four cases is the minimum of the constants obtained above. Then the proof is complete.
The corresponding upper bound (38) on the search direction is verified case by case as well; the first two cases are straightforward, and we detail the remaining two.

Case III: if $d_{k+1}$ is calculated by (24) and (26), then, combining (22) and (27) with the Cauchy inequality, we can bound the coefficients $\mu_k$ and $\nu_k$. Combining these bounds, (27), the Cauchy inequality, and the triangle inequality, we can get

$$\|d_{k+1}\| \le \frac{10\, s_k^T y_k}{\xi_1 \|s_k\| \|y_k\|}\, \|g_{k+1}\| \le \frac{10}{\xi_1}\, \|g_{k+1}\|.$$

Case IV: when the search direction is trinomial, that is, calculated by (10) and (12), similarly to Case III we first derive a lower bound for $\Delta_{k+1}$. According to (16), (18), and (20), writing $\chi_1 = \ell_k\, (s_k^T y_k)$ and combining this with $m_k \ge \frac{1}{2}$, we have $\chi = \chi_1 m_k \ge \frac{1}{2} \chi_1 > 0$. Therefore, using (20), (22), and (23), through calculation and simplification we obtain upper bounds on the coefficients in (12), from which an upper bound on $\|d_{k+1}\|$ in terms of $\|g_{k+1}\|$ can be further deduced.

According to the four cases of the above analysis, the value of $c_2$ satisfying (38) is the maximum of the bounds obtained above. The proof is ended.

Convergence Analysis
The global convergence of the algorithm for general functions is proven in this section.
Theorem 1. Suppose Assumptions 1 and 2 are satisfied, and the sequence $\{x_k\}$ is generated by Algorithm TSCG. Then

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

Numerical Experiments

In the numerical experiments, the proposed algorithm TSCG is compared with SMCG_NLS [17] and SMCG_Conic [12]. Figure 4 shows that TSCG and SMCG_NLS are better than SMCG_Conic in terms of the robustness and stability of running time. In general, TSCG is almost always better than SMCG_NLS in terms of robustness and stability; only for performance factors $2.6328 \le \tau \le 7.6472$ is TSCG inferior to SMCG_NLS.
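The factor $\tau$ here suggests a Dolan-Moré performance profile; the following minimal sketch of that metric reflects our assumption about the plotting methodology, not code from the paper:

```python
import numpy as np

def performance_ratio_curve(T, solver, tau):
    """Dolan-More performance profile value rho_s(tau): the fraction of
    problems a solver handles within a factor tau of the best solver.
    T is an (n_problems, n_solvers) array of costs such as running time,
    with np.inf marking failures."""
    best = T.min(axis=1)              # best cost on each problem
    ratios = T[:, solver] / best      # performance ratios r_{p,s}
    return float(np.mean(ratios <= tau))

# Example: on 3 problems with 2 solvers, solver 0 is fastest twice and
# fails once, so its profile value at tau = 2 is 2/3.
# T = np.array([[1.0, 2.0], [3.0, 3.5], [np.inf, 4.0]])
# performance_ratio_curve(T, solver=0, tau=2.0)  # -> 0.666...
```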

Conclusions and Prospect
In order to obtain a more efficient and robust conjugate gradient algorithm for unconstrained optimization, we constructed a new three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$ and, by solving the sub-problem of minimizing a quadratic approximation model of the objective function on this subspace, obtained a new three-term conjugate gradient method (TSCG). The global convergence of the TSCG method for general functions is established under mild conditions. Numerical results show that, on the given test set, TSCG performs better than SMCG_NLS and SMCG_Conic, both of which are subspace algorithms.
In future work on subspace algorithms, we will continue to study the estimation of the Hessian-related terms, discuss whether the method can be extended to constrained optimization, and consider applying the algorithm to engineering problems such as image restoration, image segmentation, and path planning.