Article

A Class of Three-Dimensional Subspace Conjugate Gradient Algorithms for Unconstrained Optimization

1 Guangxi (ASEAN) Financial Research Center, Guangxi University of Finance and Economics, Nanning 530007, China
2 School of Mathematics and Information Science, Guangxi University, Nanning 530004, China
3 Guangxi Key Laboratory Cultivation Base of Cross-Border E-Commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530007, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Symmetry 2022, 14(1), 80; https://doi.org/10.3390/sym14010080
Submission received: 19 November 2021 / Revised: 11 December 2021 / Accepted: 16 December 2021 / Published: 5 January 2022
(This article belongs to the Special Issue Nonlinear PDEs and Symmetry)

Abstract: In this paper, a three-parameter subspace conjugate gradient method is proposed for solving large-scale unconstrained optimization problems. By minimizing a quadratic approximation model of the objective function on a new special three-dimensional subspace, the embedded parameters are determined and the corresponding algorithm is obtained. The global convergence of the proposed method for general nonlinear functions is established under mild assumptions. In numerical experiments, the proposed algorithm is compared with SMCG_NLS and SMCG_Conic; the results show that the given algorithm is robust and efficient.

1. Introduction

The conjugate gradient method is one of the most important methods for solving large-scale unconstrained optimization problems because of its simple structure, low computational and storage costs, and fast convergence. The general unconstrained optimization problem is as follows:
$\min_{x \in \mathbb{R}^n} f(x), \quad (1)$
where $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The function value of $f(x)$ at $x_k$ is denoted by $f_k$, and its gradient by $g_k$. Let $\alpha_k$ be the step size; the conjugate gradient method uses the iteration
$x_{k+1} = x_k + \alpha_k d_k, \quad k \ge 0, \quad (2)$
where $d_k$ is the search direction, which has the form
$d_{k+1} = \begin{cases} -g_{k+1}, & \text{if } k = 0, \\ -g_{k+1} + \beta_k d_k, & \text{if } k \ge 1, \end{cases} \quad (3)$
where $\beta_k \in \mathbb{R}$ is referred to as the conjugate parameter. For different selections of $\beta_k$, there are several well-known nonlinear conjugate gradient methods [1,2,3,4,5,6]:
$\beta_k^{HS} = \dfrac{g_{k+1}^T y_k}{d_k^T y_k}, \quad \beta_k^{FR} = \dfrac{\|g_{k+1}\|^2}{\|g_k\|^2}, \quad \beta_k^{PRP} = \dfrac{g_{k+1}^T y_k}{\|g_k\|^2},$
$\beta_k^{DY} = \dfrac{\|g_{k+1}\|^2}{d_k^T y_k}, \quad \beta_k^{LS} = -\dfrac{g_{k+1}^T y_k}{d_k^T g_k}, \quad \beta_k^{CD} = -\dfrac{\|g_{k+1}\|^2}{d_k^T g_k},$
where $\|\cdot\|$ denotes the Euclidean norm and $y_k = g_{k+1} - g_k$.
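As a quick illustration of the formulas above, the following Python sketch (my own example, not taken from the paper) evaluates the six classical conjugate parameters for given gradients and a previous direction; the function name and interface are assumptions.

    import numpy as np

    def classic_betas(g_new, g_old, d_old):
        # Classical conjugate parameters listed above, with y_k = g_{k+1} - g_k.
        y = g_new - g_old
        return {
            "HS": (g_new @ y) / (d_old @ y),
            "FR": (g_new @ g_new) / (g_old @ g_old),
            "PRP": (g_new @ y) / (g_old @ g_old),
            "DY": (g_new @ g_new) / (d_old @ y),
            "LS": -(g_new @ y) / (d_old @ g_old),
            "CD": -(g_new @ g_new) / (d_old @ g_old),
        }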
The step size $\alpha_k$ can be obtained in different ways. Zhang and Hager [7] proposed an effective nonmonotone Wolfe line search as follows:
$f(x_k + \alpha_k d_k) \le C_k + \delta \alpha_k g_k^T d_k, \quad (4)$
$g_{k+1}^T d_k \ge \sigma g_k^T d_k. \quad (5)$
Here $0 < \delta < \sigma < 1$, and $C_k$ in (4) is a convex combination of $f_0, f_1, \ldots, f_k$. With $C_0 = f_0$ and $Q_0 = 1$, the quantities $C_{k+1}$ and $Q_{k+1}$ are updated by the rule
$C_{k+1} = \dfrac{\eta_k Q_k C_k + f_{k+1}}{Q_{k+1}}, \quad Q_{k+1} = \eta_k Q_k + 1, \quad (6)$
where $0 \le \eta_{min} \le \eta_k \le \eta_{max} \le 1$.
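A minimal Python sketch of the update (6) (the function name is mine); it makes explicit that $C_{k+1}$ is a convex combination of the previous reference value and the new function value.

    def update_nonmonotone_reference(C_k, Q_k, f_new, eta_k):
        # Update rule (6) of the Zhang-Hager nonmonotone line search.
        Q_next = eta_k * Q_k + 1.0
        C_next = (eta_k * Q_k * C_k + f_new) / Q_next
        return C_next, Q_next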
The generalized non-monotone line search in [8] is composed of Formula (5) and
$f(x_k + \alpha_k d_k) \le f(x_k) + \eta_k + \delta \alpha_k g_k^T d_k, \quad (7)$
where $0 < \delta < \sigma < 1$, and
$\eta_k = \begin{cases} 0, & \text{if } k = 0, \\ \min\left\{ \dfrac{1}{k \lg(k/n + 12)}, \; C_k - f(x_k) \right\}, & \text{if } k \ge 1. \end{cases} \quad (8)$
For large-scale optimization problems, researchers have been looking for more efficient algorithms. In 1995, Yuan and Stoer [9] first proposed embedding subspace techniques into the conjugate gradient framework, namely the two-dimensional subspace minimization conjugate gradient method (SMCG for short). The search direction is calculated by minimizing a quadratic approximation model on the two-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k\}$, namely
$d_{k+1} = \mu_k g_{k+1} + \nu_k s_k,$
where $\mu_k$ and $\nu_k$ are parameters and $s_k = x_{k+1} - x_k$. Analogously, the calculation of the search direction can be extended directly to $\mathrm{Span}\{g_{k+1}, s_k, s_{k-1}\}$. In this way, solving the subproblem over the whole space is avoided, which reduces the computation and storage costs immensely.
Inspired by SMCG, some researchers began to investigate conjugate gradient methods combined with subspace techniques. Dai et al. [10] analyzed the subspace minimization conjugate gradient method of Yuan and Stoer [9] and integrated SMCG with the Barzilai–Borwein method [11], obtaining a new Barzilai–Borwein conjugate gradient method (BBCG for short). On the subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k\}$, Li et al. [12] discussed the case in which the search direction is generated by minimizing a conic model when the objective function is strongly non-quadratic. Wang et al. [13] replaced the conic model of [12] with a tensor model, and Zhao et al. [14] discussed the case of a regularization model. Andrei [15] expanded the search direction further to the three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k\}$ and proposed a new SMCG method (TTS). Inspired by Andrei, Yang et al. [16] carried out a similar study: they applied the subspace minimization technique to another special three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, s_{k-1}\}$ and obtained a new SMCG method (STT). On the same subspace, Li et al. [17] further developed Yang's results, analyzed a more complex three-parameter model, and proposed a new subspace minimization conjugate gradient method (SMCG_NLS). Yao et al. [18] proposed a new three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, g_k\}$ and obtained the TCGS method by using a modified secant equation.
According to [19], the key to embedding subspace techniques into the conjugate gradient method is to construct an appropriate subspace, select an approximation model, and estimate the terms involving the Hessian matrix. The subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k\}$ contains the gradients of two iteration points. If the difference between these two gradients is too large, the gradient-change direction $y_k$ and its magnitude are strongly affected. We therefore consider making an appropriate correction to the gradient change of the two iteration points and combining it with the current gradient and the previous step to form a new three-dimensional subspace; whether this subspace is worth studying is the starting point of this paper.
In this paper, to avoid the situation in which the gradient change $y_k = g_{k+1} - g_k$ dominates the subspace $\mathrm{Span}\{g_{k+1}, d_k, y_k\}$, inspired by [20], we construct a similar subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$, where $y_k^* = g_{k+1} - \frac{\|g_{k+1}\|}{\|g_k\|} g_k$. The corresponding parameters and algorithm are then obtained by minimizing an approximation model of the objective function on this subspace. It is shown that the resulting method is globally convergent and has good numerical performance.
The rest of this paper is organized as follows: in Section 2, the search direction constructed on the new special three-dimensional subspace $\Omega_{k+1}$ is presented, and the estimations of the matrix–vector products are given. In Section 3, the proposed algorithm and its properties under two necessary assumptions are described in detail. In Section 4, we establish the global convergence of the proposed algorithm under mild conditions. In Section 5, we compare the proposed method numerically with the algorithms SMCG_NLS [17] and SMCG_Conic [12]. Finally, in Section 6, we conclude the paper and highlight future work.

2. Search Direction and Step Size

This section introduces the four search-direction models and the selection of the initial step size on the newly spanned three-dimensional subspace.

2.1. Direction Choice Model

Inspired by [20], in this paper the gradient change $y_k = g_{k+1} - g_k$ is replaced by $y_k^* = g_{k+1} - \frac{\|g_{k+1}\|}{\|g_k\|} g_k$. The search directions are then constructed in the three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$.
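For concreteness, here is a small Python sketch of the rescaled gradient change $y_k^*$ (the helper name is mine, not the paper's):

    import numpy as np

    def modified_gradient_difference(g_new, g_old):
        # y_k^* = g_{k+1} - (||g_{k+1}|| / ||g_k||) g_k, the rescaled gradient
        # change spanning the third direction of Omega_{k+1}.
        return g_new - (np.linalg.norm(g_new) / np.linalg.norm(g_old)) * g_old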
From [10], we know that the approximation model plays two roles: one is to approximate the original objective function on the subspace $\Omega_{k+1}$; the other is to make the search direction $d_{k+1}$ obtained from the model a descent direction, so that the objective function decreases along it. On the proposed subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$, we consider the quadratic approximation model
$\min_{d \in \Omega_{k+1}} \phi_{k+1}(d) = g_{k+1}^T d + \frac{1}{2} d^T B_{k+1} d, \quad (9)$
where $B_{k+1}$ is a symmetric positive definite approximation of the Hessian matrix satisfying the secant condition $B_{k+1} s_k = y_k$.
The subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$ under consideration can have dimension 3, 2, or 1, and we treat these three situations separately.
Situation I: 
dim($\Omega_{k+1}$) = 3.
When the dimension is 3, the vectors $g_{k+1}$, $s_k$, $y_k^*$ are linearly independent. The search direction takes the form
$d_{k+1} = \mu_k g_{k+1} + \nu_k s_k + \gamma_k y_k^*, \quad (10)$
where $\mu_k$, $\nu_k$, and $\gamma_k$ are parameters to be determined. Substituting (10) into (9) and simplifying, we have
$\min_{(\mu,\nu,\gamma)} \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \\ g_{k+1}^T y_k^* \end{pmatrix}^T \begin{pmatrix} \mu \\ \nu \\ \gamma \end{pmatrix} + \frac{1}{2} \begin{pmatrix} \mu \\ \nu \\ \gamma \end{pmatrix}^T \begin{pmatrix} \rho_{k+1} & g_{k+1}^T y_k & w_k \\ g_{k+1}^T y_k & s_k^T y_k & y_k^T y_k^* \\ w_k & y_k^T y_k^* & \varrho_k \end{pmatrix} \begin{pmatrix} \mu \\ \nu \\ \gamma \end{pmatrix}, \quad (11)$
where $\rho_{k+1} = g_{k+1}^T B_{k+1} g_{k+1}$, $w_k = g_{k+1}^T B_{k+1} y_k^*$, and $\varrho_k = (y_k^*)^T B_{k+1} y_k^*$; the remaining entries follow from the secant condition $B_{k+1} s_k = y_k$. Set
$D_{k+1} = \begin{pmatrix} \rho_{k+1} & g_{k+1}^T y_k & w_k \\ g_{k+1}^T y_k & s_k^T y_k & y_k^T y_k^* \\ w_k & y_k^T y_k^* & \varrho_k \end{pmatrix}.$
Thus, (11) can be summarized as
$\min_{(\mu,\nu,\gamma)} \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \\ g_{k+1}^T y_k^* \end{pmatrix}^T \begin{pmatrix} \mu \\ \nu \\ \gamma \end{pmatrix} + \frac{1}{2} \begin{pmatrix} \mu \\ \nu \\ \gamma \end{pmatrix}^T D_{k+1} \begin{pmatrix} \mu \\ \nu \\ \gamma \end{pmatrix}.$
Under some mild conditions, $D_{k+1}$ can be proved to be positive definite, which will be discussed in Lemma 1. When $D_{k+1}$ is positive definite, by calculation and simplification, the unique solution of (11) is
$\begin{pmatrix} \mu_k \\ \nu_k \\ \gamma_k \end{pmatrix} = -\frac{1}{\triangle_{k+1}} \begin{pmatrix} \chi & \theta_1 & \theta_2 \\ \theta_1 & \theta & \theta_3 \\ \theta_2 & \theta_3 & \omega \end{pmatrix} \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \\ g_{k+1}^T y_k^* \end{pmatrix}, \quad (12)$
where
$\triangle_{k+1} = |D_{k+1}| = \rho_{k+1} \chi + w_k \theta_2 + g_{k+1}^T y_k \theta_1, \quad \chi = \varrho_k (s_k^T y_k) - (y_k^T y_k^*)^2, \quad \omega = \rho_{k+1} (s_k^T y_k) - (g_{k+1}^T y_k)^2,$
$\theta_1 = w_k (y_k^T y_k^*) - \varrho_k (g_{k+1}^T y_k), \quad \theta = \varrho_k \rho_{k+1} - w_k^2, \quad \theta_2 = (y_k^T y_k^*)(g_{k+1}^T y_k) - w_k (s_k^T y_k), \quad \theta_3 = w_k (g_{k+1}^T y_k) - \rho_{k+1} (y_k^T y_k^*).$
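To make the computation in (10)–(12) concrete, here is a small Python sketch (my own illustration, not the authors' code) that assembles $D_{k+1}$ from the estimated scalars and solves the three-dimensional subproblem directly; when $D_{k+1}$ is positive definite this is equivalent to the closed adjugate formula (12).

    import numpy as np

    def three_term_direction(g_new, s, y, y_star, rho, w, varrho):
        # rho = rho_{k+1}, w = w_k, varrho = varrho_k are the scalar estimates
        # introduced later in this section ((14), (16), (20)).
        D = np.array([[rho,        g_new @ y,  w         ],
                      [g_new @ y,  s @ y,      y @ y_star],
                      [w,          y @ y_star, varrho    ]])
        rhs = np.array([g_new @ g_new, g_new @ s, g_new @ y_star])
        mu, nu, gamma = np.linalg.solve(D, -rhs)   # minimizer of (11)
        return mu * g_new + nu * s + gamma * y_star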
In order to avoid matrix–vector multiplications, we need to estimate $\rho_{k+1}$, $\varrho_k$, and $w_k$. Before estimating $\rho_{k+1}$, we first estimate $\varrho_k$ and $w_k$.
For $\varrho_k$, we get
$\varrho_k = (y_k^*)^T B_{k+1} y_k^* = \dfrac{(y_k^*)^T B_{k+1} y_k^* \; s_k^T B_{k+1} s_k}{\left((y_k^*)^T B_{k+1} s_k\right)^2} \cdot \dfrac{\left((y_k^*)^T B_{k+1} s_k\right)^2}{s_k^T B_{k+1} s_k} = \dfrac{\|B_{k+1}^{1/2} y_k^*\|^2 \|B_{k+1}^{1/2} s_k\|^2}{\left((B_{k+1}^{1/2} y_k^*)^T B_{k+1}^{1/2} s_k\right)^2} \cdot \dfrac{(y_k^T y_k^*)^2}{s_k^T y_k} = \dfrac{1}{\cos^2 \langle B_{k+1}^{1/2} y_k^*, B_{k+1}^{1/2} s_k \rangle} \cdot \dfrac{(y_k^T y_k^*)^2}{s_k^T y_k}.$
Based on the analysis of [9], taking $\cos^2 \langle B_{k+1}^{1/2} y_k^*, B_{k+1}^{1/2} s_k \rangle = \frac{1}{2}$ is desirable, which suggests the following estimate of $\varrho_k$:
$\varrho_k = \dfrac{2 (y_k^T y_k^*)^2}{s_k^T y_k},$
which also means that
$\varrho_k \, s_k^T y_k - (y_k^T y_k^*)^2 = (y_k^T y_k^*)^2. \quad (13)$
In order to obtain a better numerical effect in the experiments, we amend $\varrho_k$ as
$\varrho_k = \dfrac{(y_k^T y_k^*)^2}{s_k^T y_k} + \lambda_k, \quad (14)$
where
$\lambda_k = \max\left\{ \dfrac{(y_k^T y_k^*)^2}{s_k^T y_k}, \; 0.1 \|y_k^*\|^2 \right\}.$
Obviously, $\varrho_k > 0$. Consider the following matrix:
$\begin{pmatrix} s_k^T y_k & y_k^T y_k^* \\ y_k^T y_k^* & \varrho_k \end{pmatrix}. \quad (15)$
From (13) and $\varrho_k > 0$, it can be seen that the sub-matrix (15) is positive definite. Inspired by the BBCG method [11], for $w_k$ we take
$w_k = \zeta_k \dfrac{(g_{k+1}^T y_k^*) \|y_k\|^2}{s_k^T y_k}, \quad (16)$
where
$\zeta_k = \begin{cases} \max\{0.9 \zeta_{k-1}, 1.2\}, & \text{if } \alpha_k > 1, \\ \min\{1.1 \zeta_{k-1}, 1.75\}, & \text{otherwise}, \end{cases} \quad (17)$
with $\zeta_0 = 1.5$, so that $\zeta_k \in [1.2, 1.75)$.
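The estimates (14), (16), and (17) only involve inner products, as the following Python sketch illustrates (variable and function names are mine):

    import numpy as np

    def estimate_parameters(s, y, y_star, g_new, zeta_prev, alpha_k):
        sy = s @ y
        yys = y @ y_star
        # (14): varrho_k = (y^T y*)^2 / (s^T y) + lambda_k
        lam = max(yys**2 / sy, 0.1 * (y_star @ y_star))
        varrho = yys**2 / sy + lam
        # (17): adaptive factor zeta_k, kept in [1.2, 1.75)
        zeta = max(0.9 * zeta_prev, 1.2) if alpha_k > 1 else min(1.1 * zeta_prev, 1.75)
        # (16): BBCG-inspired estimate of w_k = g_{k+1}^T B_{k+1} y_k^*
        w = zeta * (g_new @ y_star) * (y @ y) / sy
        return varrho, zeta, w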
Now we estimate $\rho_{k+1}$. Since $D_{k+1}$ is positive definite, it is easy to see that $\triangle_{k+1} > 0$; therefore
$\rho_{k+1} > \dfrac{-w_k \theta_2 - g_{k+1}^T y_k \theta_1}{\chi}. \quad (18)$
Denote the right-hand side of (18) by $n_k$. Combining (12), (14), and (16), we have
$n_k = \dfrac{1}{m_k} \left[ \dfrac{w_k^2}{\varrho_k} + \dfrac{(g_{k+1}^T y_k)^2}{s_k^T y_k} - \dfrac{2 w_k (g_{k+1}^T y_k)(y_k^T y_k^*)}{\varrho_k (s_k^T y_k)} \right], \quad \text{with } m_k = 1 - \dfrac{(y_k^T y_k^*)^2}{\varrho_k (s_k^T y_k)}. \quad (19)$
According to (14), we know that $m_k \ge 1/2$.
In order to ensure that (18) holds, we estimate the parameter $\rho_{k+1}$ by taking
$\rho_{k+1} = \zeta_k \max\{n_k, K\}, \quad (20)$
where $K = K_1 \|g_{k+1}\|^2$, $K_1 = \max\left\{ \dfrac{\|y_k\|^2}{s_k^T y_k}, \; \dfrac{4 \|y_k\|^4 \|y_k^*\|^2}{\varrho_k (s_k^T y_k)^2} \right\}$, and $\rho_0 \in (0,1)$. While tuning the algorithm, we found that the adaptive value of $\zeta_k$ in (17) gives better numerical results than a fixed value, so $\zeta_k$ again follows (17).
In summary, the search direction $d_{k+1}$ is calculated by (10) and (12) when the following conditions are satisfied:
$\xi_3 \|s_k\|^2 \le \|g_{k+1}\|^2, \quad (21)$
$\xi_1 \le \dfrac{s_k^T y_k}{\|s_k\|^2}, \qquad \dfrac{\|y_k\|^2}{s_k^T y_k} \le \xi_2, \quad (22)$
$\xi_1 \le \dfrac{\varrho_k}{\|y_k^*\|^2}, \qquad \dfrac{4 \|y_k\|^4 \|y_k^*\|^2}{\varrho_k (s_k^T y_k)^2} \le \xi_2, \quad (23)$
where $\xi_1$, $\xi_2$, $\xi_3$ are positive constants.
Now, let us prove that $D_{k+1}$ is positive definite.
Lemma 1.
Assume that conditions (21)–(23) hold and that $\rho_{k+1}$ is calculated by (20). Then the matrix $D_{k+1}$ in (11) is positive definite.
Proof of Lemma 1. 
By (20), it is easy to see that $\rho_0 \in (0,1)$ and $\rho_{k+1} \ge \zeta_k K > 0$; moreover, $\rho_{k+1} \ge K \ge \dfrac{\|g_{k+1}\|^2 \|y_k\|^2}{s_k^T y_k} \ge \dfrac{(g_{k+1}^T y_k)^2}{s_k^T y_k}$, so the leading $2 \times 2$ principal minor of $D_{k+1}$ is positive. Finally, since $\chi > 0$ by (13) and (14), and $\rho_{k+1} = \zeta_k \max\{n_k, K\} > n_k$ with $n_k$ the right-hand side of (18), we obtain
$\triangle_{k+1} = |D_{k+1}| = \rho_{k+1} \chi + w_k \theta_2 + g_{k+1}^T y_k \theta_1 = \chi (\rho_{k+1} - n_k) > 0.$
Hence $D_{k+1}$ is positive definite, and the proof is complete.    □
Situation II: 
dim($\Omega_{k+1}$) = 2.
In this case, the search direction $d_{k+1}$ has the form
$d_{k+1} = \mu_k g_{k+1} + \nu_k s_k, \quad (24)$
where $\mu_k$, $\nu_k$ are parameters to be determined. Substituting (24) into (9), we get
$\min_{(\mu,\nu)} \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \end{pmatrix}^T \begin{pmatrix} \mu \\ \nu \end{pmatrix} + \frac{1}{2} \begin{pmatrix} \mu \\ \nu \end{pmatrix}^T \begin{pmatrix} \rho_{k+1} & g_{k+1}^T y_k \\ g_{k+1}^T y_k & s_k^T y_k \end{pmatrix} \begin{pmatrix} \mu \\ \nu \end{pmatrix}, \quad (25)$
where $\rho_{k+1} = g_{k+1}^T B_{k+1} g_{k+1}$. If
$\triangle_{k+1} = \rho_{k+1} (s_k^T y_k) - (g_{k+1}^T y_k)^2 > 0, \quad (26)$
then the unique solution of problem (25) is
$\begin{pmatrix} \mu_k \\ \nu_k \end{pmatrix} = \dfrac{1}{\triangle_{k+1}} \begin{pmatrix} (g_{k+1}^T y_k)(g_{k+1}^T s_k) - (s_k^T y_k) \|g_{k+1}\|^2 \\ (g_{k+1}^T y_k) \|g_{k+1}\|^2 - \rho_{k+1} (g_{k+1}^T s_k) \end{pmatrix}.$
Combined with BBCG, we take
$\rho_{k+1} = \zeta_k \dfrac{\|g_{k+1}\|^2 \|y_k\|^2}{s_k^T y_k}, \quad (27)$
where $\zeta_k$ in (27) is the same as in (17).
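A Python sketch of Situation II under these estimates (again my own illustration): it evaluates $\rho_{k+1}$ by (27) and then the closed-form solution of (25).

    import numpy as np

    def two_term_direction(g_new, s, y, zeta):
        sy, gy, gs, gg = s @ y, g_new @ y, g_new @ s, g_new @ g_new
        rho = zeta * gg * (y @ y) / sy        # estimate (27)
        delta = rho * sy - gy**2              # (26): assumed positive here
        mu = (gy * gs - sy * gg) / delta
        nu = (gy * gg - rho * gs) / delta
        return mu * g_new + nu * s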
Obviously, under certain conditions the HS direction can be regarded as a special case of Formula (24). Taking into account the finite termination property of the HS method, and in order to give our algorithm good properties, when the following conditions hold,
$\xi_1 \le \dfrac{s_k^T y_k}{\|s_k\|^2}, \quad (28)$
$\dfrac{|g_{k+1}^T y_k| \, |g_{k+1}^T d_k|}{d_k^T y_k \, \|g_{k+1}\|^2} \le \xi_4, \quad (29)$
where $\xi_4 \in [0, 1)$, we consider
$d_{k+1} = -g_{k+1} + \beta_k d_k. \quad (30)$
In summary, for the case in which the dimension of the subspace is 2: if only condition (22) holds, $d_{k+1}$ is calculated by (24); when conditions (28) and (29) hold, $d_{k+1}$ is calculated by (30).
Situation III: 
dim($\Omega_{k+1}$) = 1.
When none of the conditions (21), (22), and (23) is satisfied, we choose the negative gradient as the search direction, i.e.,
$d_{k+1} = -g_{k+1}. \quad (31)$
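The following Python fragment summarizes, purely as an illustration, how the four direction formulas of this section are selected; the condition names mirror the equation numbers, and the dictionary interface is my own.

    def choose_direction_case(c):
        # c is a dict of booleans for conditions (21)-(23), (28), (29).
        if c["(21)"] and c["(22)"] and c["(23)"]:
            return "three-term direction (10) with parameters (12)"
        if c["(22)"]:
            return "two-term subspace direction (24) with parameters (26)"
        if c["(28)"] and c["(29)"]:
            return "HS-type direction (30)"
        return "negative gradient direction (31)"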

2.2. Selection of Initial Step Size

Considering that the selections of initial step sizes will also have an impact on the algorithm, we choose the initial step size selection method of SMCG_NLS [17], which is also a subspace algorithm.
According to [21], we know that
$t_k = \dfrac{2 (f_k - f_{k+1} + g_{k+1}^T s_k)}{s_k^T y_k} - 1, \quad (32)$
which indicates how close the objective function is to a quadratic function on the line segment between the current and the previous iteration points. Based on [22], the following condition indicates that the objective function is close to a quadratic function:
$t_k \le \lambda_1 \quad \text{or} \quad \max\{t_k, t_{k-1}\} \le \lambda_2, \quad (33)$
where $0 < \lambda_1 < \lambda_2$.
Case I: when the search direction is calculated by (10), (24), or (30), the initial step size is $\alpha_k^{(0)} = 1$.
Case II: when $d_{k+1} = -g_{k+1}$, the initial step size is
$\tilde{\alpha}_k = \begin{cases} \max\{\min\{\alpha_k^{(BB2)}, \vartheta_{max}\}, \vartheta_{min}\}, & \text{if } g_{k+1}^T s_k \ge 0, \\ \max\{\min\{\alpha_k^{(BB1)}, \vartheta_{max}\}, \vartheta_{min}\}, & \text{if } g_{k+1}^T s_k < 0, \end{cases} \quad (34)$
where $\alpha_k^{(BB1)} = \dfrac{\|s_k\|^2}{s_k^T y_k}$ and $\alpha_k^{(BB2)} = \dfrac{s_k^T y_k}{\|y_k\|^2}$.
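A minimal Python sketch of the safeguarded choice (34); the function name and default safeguards are mine.

    import numpy as np

    def initial_stepsize_negative_gradient(s, y, g_new, theta_min=1e-30, theta_max=1e30):
        bb1 = (s @ s) / (s @ y)      # alpha_k^(BB1)
        bb2 = (s @ y) / (y @ y)      # alpha_k^(BB2)
        bb = bb2 if g_new @ s >= 0 else bb1
        return max(min(bb, theta_max), theta_min)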

3. The Obtained Algorithm and Descent Property

This section describes the obtained algorithm in detail and its descent properties under two necessary assumptions.

3.1. The Obtained Algorithm

This subsection introduces the proposed algorithm. Before stating it, we first describe the restart strategy used to restart the algorithm, and then give the algorithm in detail.
According to [23], set
$r_k = \dfrac{2 (f_{k+1} - f_k)}{g_{k+1}^T s_k + g_k^T s_k}.$
If $r_k$ is close to 1, then the one-dimensional line-search function is close to a quadratic. Similar to [21], if $|r_k - 1| \le \xi_5$ holds in several consecutive iterations, we restart the search direction along $-g_{k+1}$. In addition, we restart the algorithm if the number of consecutive conjugate gradient directions reaches the MaxRestart threshold.
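As an illustration of this restart test, here is a small Python sketch; the counter bookkeeping is my simplification of the IterQuad/IterRestart logic in Algorithm 1 below.

    def update_quad_counter(f_old, f_new, g_old_s, g_new_s, iter_quad, xi5):
        # r_k from [23]; count consecutive "nearly quadratic" iterations.
        r = 2.0 * (f_new - f_old) / (g_new_s + g_old_s)
        return iter_quad + 1 if abs(r - 1.0) <= xi5 else 0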
Now, the details of the three-term subspace conjugate gradient method (TSCG for short) are given as follows:
Algorithm 1: TSCG Algorithm
Step 1. Given $x_0 \in \mathbb{R}^n$, $\alpha_0^{(0)}$, $\varepsilon > 0$, $0 < \delta < \sigma < 1$, $\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \xi_6$, $\lambda_1, \lambda_2 \in [0, 1)$, MaxRestart, MinQuad. Let $C_0 = f_0$, $Q_0 = 1$, IterRestart := 0, Numgrad := 0, IterQuad := 0, Numcongrad := 0, and $k := 0$.
Step 2. If $\|g_k\| \le \varepsilon$, stop; otherwise, let $d_0 := -g_0$ and Numgrad = 1.
Step 3. If $k = 0$, go to Step 4. If $d_{k+1} = -g_{k+1}$, compute $\alpha_k^{(0)}$ via (34); otherwise, set $\alpha_k^{(0)} = 1$.
Step 4. Calculate the step size $\alpha_k$ by (7) and (5).
Step 5. Update $x_{k+1}$ by (2). If $\|g_{k+1}\| \le \varepsilon$, stop; otherwise, let IterRestart := IterRestart + 1. If $|r_k - 1| \le \xi_5$ or $|f_{k+1} - f_k| \le \xi_6 \, |g_{k+1}^T s_k + g_k^T s_k| / 6$, set IterQuad := IterQuad + 1; otherwise, set IterQuad := 0.
Step 6. Calculate the search direction $d_{k+1}$:
  (a) if conditions (21), (22), and (23) hold, calculate $d_{k+1}$ by (10) and (12), then go to Step 7;
  (b) if condition (22) holds, calculate $d_{k+1}$ by (24) and (26), then go to Step 7;
  (c) if conditions (28) and (29) hold, calculate $d_{k+1}$ by (30), then go to Step 7;
  (d) otherwise, calculate $d_{k+1}$ by (31), then go to Step 7.
Step 7. Update $Q_{k+1}$ and $C_{k+1}$ by (6).
Step 8. Calculate $\eta_{k+1}$ by (8), set $k := k + 1$, and go to Step 3.
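To show how the pieces fit together, here is a heavily simplified, runnable Python sketch of the Algorithm 1 loop. It keeps only the negative-gradient branch (31), a safeguarded BB initial step as in (34), and a backtracking enforcement of condition (7); the subspace branches (10), (24), (30), the curvature test (5), and the restart bookkeeping are omitted, and the Q-update weight 0.85 is my own choice rather than the paper's.

    import numpy as np

    def tscg_skeleton(f, grad, x0, eps=1e-6, delta=1e-3, max_iter=10000):
        x = np.asarray(x0, dtype=float)
        fx, g = f(x), np.asarray(grad(x), dtype=float)
        C, Q, eta = fx, 1.0, 0.0
        s = y = None
        for k in range(max_iter):
            if np.linalg.norm(g) <= eps:
                break
            d = -g                                        # branch (31) only
            if s is None:
                alpha = min(1.0, 1.0 / max(np.linalg.norm(g, np.inf), 1e-16))
            else:
                bb = (s @ y) / (y @ y) if g @ s >= 0 else (s @ s) / (s @ y)
                alpha = max(min(bb, 1e30), 1e-30)         # safeguarded step, cf. (34)
            while f(x + alpha * d) > fx + eta + delta * alpha * (g @ d):
                alpha *= 0.5                              # backtrack until (7) holds
            x_new = x + alpha * d
            f_new = f(x_new)
            g_new = np.asarray(grad(x_new), dtype=float)
            s, y = x_new - x, g_new - g
            Q_new = 0.85 * Q + 1.0                        # update (6) with a fixed weight
            C = (0.85 * Q * C + f_new) / Q_new
            Q = Q_new
            eta = max(0.0, min(1.0 / ((k + 1) * np.log10((k + 1) / x.size + 12.0)), C - f_new))
            x, fx, g = x_new, f_new, g_new
        return x

This behaves like a safeguarded gradient method with a nonmonotone acceptance test and is meant only as a scaffold onto which the subspace directions of Section 2 would be plugged in.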

3.2. Descent Properties of Search Direction

In this subsection, we discuss the descent properties of the given algorithm, in which it will be proved that the proposed algorithm (TSCG) fulfills sufficient descent conditions in all cases. Now, we first introduce some common assumptions on the objective function.
Assumption 1.
The objective function $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and bounded below on $\mathbb{R}^n$.
Assumption 2.
The gradient function $g$ is Lipschitz continuous on the bounded level set $\Theta = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$; that is, there exists a constant $L > 0$ such that
$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in \Theta.$
In particular, $\|y_k\| \le L \|s_k\|$.
Lemma 2.
If the search direction $d_{k+1}$ is generated by (10) or (24), then
$g_{k+1}^T d_{k+1} \le -\dfrac{\|g_{k+1}\|^4}{\rho_{k+1}} \quad (35)$
holds.
Proof of Lemma 2. 
We only need to discuss the two cases in which the search direction involves $\rho_{k+1}$.
Case I: if $d_{k+1}$ is generated by (24), the proof is similar to that in [22] and is therefore omitted.
Case II: when $d_{k+1}$ is calculated by (10) and (12), we have
$g_{k+1}^T d_{k+1} = \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \\ g_{k+1}^T y_k^* \end{pmatrix}^T \begin{pmatrix} \mu_k \\ \nu_k \\ \gamma_k \end{pmatrix} = -\dfrac{1}{\triangle_{k+1}} \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \\ g_{k+1}^T y_k^* \end{pmatrix}^T \begin{pmatrix} \chi & \theta_1 & \theta_2 \\ \theta_1 & \theta & \theta_3 \\ \theta_2 & \theta_3 & \omega \end{pmatrix} \begin{pmatrix} \|g_{k+1}\|^2 \\ g_{k+1}^T s_k \\ g_{k+1}^T y_k^* \end{pmatrix} = -\dfrac{\|g_{k+1}\|^4}{\triangle_{k+1}} \varphi(x, y),$
where $\varphi(x, y)$ is the quadratic function of the two variables $x = \dfrac{g_{k+1}^T y_k^*}{\|g_{k+1}\|^2}$ and $y = \dfrac{g_{k+1}^T s_k}{\|g_{k+1}\|^2}$, whose simplified form is
$\varphi(x, y) = \omega x^2 + 2 \theta_3 x y + \theta y^2 + 2 \theta_2 x + 2 \theta_1 y + \chi.$
From Lemma 1, it is easy to get $\omega > 0$ and $\omega \theta - \theta_3^2 = \triangle_{k+1} \rho_{k+1} > 0$; that is,
$\varphi(x, y)_{min} = \dfrac{\triangle_{k+1}}{\rho_{k+1}}.$
Therefore, we have
$g_{k+1}^T d_{k+1} \le -\dfrac{\|g_{k+1}\|^4}{\triangle_{k+1}} \varphi(x, y)_{min} = -\dfrac{\|g_{k+1}\|^4}{\rho_{k+1}}.$
This completes the proof. □
Lemma 3.
Assume that $d_{k+1}$ is generated by the algorithm TSCG. Then there exists a constant $c_1 > 0$ such that
$g_{k+1}^T d_{k+1} \le -c_1 \|g_{k+1}\|^2. \quad (36)$
Proof of Lemma 3. 
We discuss the four forms of the search direction separately.
Case I: if $d_{k+1} = -g_{k+1}$, let $c_1 = \frac{1}{2}$; then (36) holds.
Case II: we first discuss the two-term direction given by (30). When $d_{k+1}$ is determined by (30), combining (28) and (29) with $\beta_k = \beta_k^{HS}$, we have
$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 + \beta_k g_{k+1}^T d_k \le -\|g_{k+1}\|^2 + \dfrac{|g_{k+1}^T y_k| \, |g_{k+1}^T d_k|}{d_k^T y_k} \le -\|g_{k+1}\|^2 + \xi_4 \|g_{k+1}\|^2 = -(1 - \xi_4) \|g_{k+1}\|^2.$
Case III: now we discuss the other two-term direction, namely the one generated by (24). Combining (22) and (27), obviously
$\rho_{k+1} = \zeta_k \dfrac{\|g_{k+1}\|^2 \|y_k\|^2}{s_k^T y_k} \le 2 \xi_2 \|g_{k+1}\|^2.$
In combination with the above formula and (35), it can be obtained that
$g_{k+1}^T d_{k+1} \le -\dfrac{1}{2 \xi_2} \|g_{k+1}\|^2.$
Case IV: when the search direction is the three-term one, that is, $d_{k+1}$ is given by (10) and (12). Considering Lemma 2 and using (20), (21), and $\chi > 0$, we first prove that $\rho_{k+1}$ has an upper bound:
$|n_k| = \dfrac{1}{m_k} \left| \dfrac{w_k^2}{\varrho_k} + \dfrac{(g_{k+1}^T y_k)^2}{s_k^T y_k} - \dfrac{2 w_k (g_{k+1}^T y_k)(y_k^T y_k^*)}{\varrho_k (s_k^T y_k)} \right| \le \dfrac{1}{m_k} \left[ \left( \dfrac{4 \|y_k\|^4 \|y_k^*\|^2}{\varrho_k (s_k^T y_k)^2} + \dfrac{\|y_k\|^2}{s_k^T y_k} \right) \|g_{k+1}\|^2 + \dfrac{2 |w_k|}{\sqrt{\varrho_k}} \cdot \dfrac{|g_{k+1}^T y_k|}{\sqrt{s_k^T y_k}} \right] \le \dfrac{2 K_1 \|g_{k+1}\|^2 + 2 \sqrt{K} \sqrt{K}}{m_k} \le \dfrac{2K + 2K}{m_k} \le 8K.$
Combining conditions (22) and (23), it is easy to see that $K_1$ in (20) has an upper bound, namely
$K_1 = \max\left\{ \dfrac{\|y_k\|^2}{s_k^T y_k}, \; \dfrac{4 \|y_k\|^4 \|y_k^*\|^2}{\varrho_k (s_k^T y_k)^2} \right\} \le \xi_2.$
Therefore,
$\rho_{k+1} = \zeta_k \max\{n_k, K\} \le 16 K = 16 K_1 \|g_{k+1}\|^2 \le 16 \xi_2 \|g_{k+1}\|^2.$
Finally, according to Lemma 2, we have
$g_{k+1}^T d_{k+1} \le -\dfrac{\|g_{k+1}\|^4}{\rho_{k+1}} \le -\dfrac{1}{16 \xi_2} \|g_{k+1}\|^2.$
In summary, the value of $c_1$ satisfying (36) is
$c_1 = \min\left\{ \dfrac{1}{2}, \; \dfrac{1}{2 \xi_2}, \; 1 - \xi_4, \; \dfrac{1}{16 \xi_2} \right\}.$
Then the proof is complete. □
Lemma 4.
If the search direction $d_{k+1}$ is calculated by TSCG, then there exists a constant $c_2 > 0$ such that
$\|d_{k+1}\| \le c_2 \|g_{k+1}\|. \quad (37)$
Proof of Lemma 4. 
Similar to Lemma 3, we discuss the four cases of the search direction separately.
Case I: if $d_{k+1} = -g_{k+1}$, let $c_2 = 1$; then (37) is true.
Case II: if $d_{k+1}$ is calculated by (30), according to Assumption 2 and condition (28), we have
$\|d_{k+1}\| = \|-g_{k+1} + \beta_k^{HS} d_k\| \le \|g_{k+1}\| + \dfrac{\|g_{k+1}\| \|y_k\| \|d_k\|}{d_k^T y_k} \le \left(1 + \dfrac{L}{\xi_1}\right) \|g_{k+1}\|.$
Case III: if $d_{k+1}$ is calculated by (24) and (26), then, combining (22) and (27) with the Cauchy inequality, we deduce
$\triangle_{k+1} = \rho_{k+1} (s_k^T y_k) - (g_{k+1}^T y_k)^2 \ge \zeta_k \|g_{k+1}\|^2 \|y_k\|^2 - \|g_{k+1}\|^2 \|y_k\|^2 \ge \dfrac{1}{5} \|g_{k+1}\|^2 \|y_k\|^2 \ge \dfrac{\xi_1 \|s_k\|^2 \|g_{k+1}\|^2 \|y_k\|^2}{5 \, s_k^T y_k}.$
Combining the above inequality, (27), the Cauchy inequality, and the triangle inequality, we can get
$\|d_{k+1}\| = \|\mu_k g_{k+1} + \nu_k s_k\| \le \dfrac{1}{\triangle_{k+1}} \left( \left| (g_{k+1}^T y_k)(g_{k+1}^T s_k) - (s_k^T y_k) \|g_{k+1}\|^2 \right| \|g_{k+1}\| + \left( |g_{k+1}^T y_k| \, \|g_{k+1}\|^2 + \rho_{k+1} |g_{k+1}^T s_k| \right) \|s_k\| \right) \le \dfrac{\left( 2 \|s_k\| \|y_k\| + \rho_{k+1} \|s_k\|^2 / \|g_{k+1}\|^2 \right) \|g_{k+1}\|^3}{\xi_1 \|s_k\|^2 \|g_{k+1}\|^2 \|y_k\|^2 / (5 \, s_k^T y_k)} \le \dfrac{10 \, s_k^T y_k}{\xi_1 \|s_k\| \|y_k\|} \|g_{k+1}\| + \dfrac{10}{\xi_1} \|g_{k+1}\| \le \dfrac{20}{\xi_1} \|g_{k+1}\|.$
Case IV: when the search direction is the three-term one, that is, calculated by (10) and (12). Similar to Case III, let us first derive a lower bound for $\triangle_{k+1}$. According to (16), (18), and (20), we have
$\triangle_{k+1} = \rho_{k+1} \chi + w_k \theta_2 + g_{k+1}^T y_k \theta_1 = \chi \left( \rho_{k+1} - \dfrac{-w_k \theta_2 - g_{k+1}^T y_k \theta_1}{\chi} \right) = \chi (\rho_{k+1} - n_k) \ge \chi (1.2 \max\{n_k, K\} - n_k) \ge \dfrac{1}{5} \chi K.$
Writing $\chi_1 = \varrho_k (s_k^T y_k)$ and combining this with $m_k \ge \frac{1}{2}$, we have $\chi = \chi_1 m_k \ge \frac{1}{2} \chi_1 > 0$; therefore
$\triangle_{k+1} \ge \dfrac{1}{10} \chi_1 K.$
Then
$\|d_{k+1}\| = \|\mu_k g_{k+1} + \nu_k s_k + \gamma_k y_k^*\| \le \dfrac{1}{\triangle_{k+1}} \begin{pmatrix} \|g_{k+1}\|^2 \\ |g_{k+1}^T s_k| \\ |g_{k+1}^T y_k^*| \end{pmatrix}^T \begin{pmatrix} |\chi| & |\theta_1| & |\theta_2| \\ |\theta_1| & |\theta| & |\theta_3| \\ |\theta_2| & |\theta_3| & |\omega| \end{pmatrix} \begin{pmatrix} \|g_{k+1}\| \\ \|s_k\| \\ \|y_k^*\| \end{pmatrix},$
where we denote $a_k = \varrho_k \|s_k\| + (s_k^T y_k) \|y_k^*\|$, $b_k = \|s_k\| \|y_k^*\|$, and $c_k = \varrho_k \|s_k\|^2 + (s_k^T y_k) \|y_k^*\|^2$. Bounding the entries of the matrix by means of (16), (20), (22), and (23), and using $\triangle_{k+1} \ge \frac{1}{10} \chi_1 K$ together with the estimates
$\xi_1 \le \dfrac{\|y_k\|^2}{s_k^T y_k} \le K_1, \qquad \dfrac{a_k}{\chi_1} \le \dfrac{2}{\xi_1}, \qquad \dfrac{b_k}{\chi_1} \le \dfrac{1}{\xi_1}, \qquad \dfrac{c_k}{\chi_1} \le \dfrac{2}{\xi_1},$
which follow from (20), (22), and (23) after calculation and simplification, it can be further deduced that
$\|d_{k+1}\| \le \|g_{k+1}\| \left( \dfrac{10}{K_1} + \dfrac{40 a_k}{\chi_1} + \dfrac{340 b_k}{\chi_1} + \dfrac{160 c_k}{\chi_1} \right) \le \dfrac{750}{\xi_1} \|g_{k+1}\|.$
According to the four cases analyzed above, the value of $c_2$ satisfying (37) is
$c_2 = \max\left\{ 1, \; 1 + \dfrac{L}{\xi_1}, \; \dfrac{20}{\xi_1}, \; \dfrac{750}{\xi_1} \right\}.$
The proof is complete. □

4. Convergence Analysis

The global convergence of the algorithm for general functions is proven in this section.
Lemma 5.
Suppose that $\alpha_k$ is generated by the line search (7) and (5) and that Assumption 2 holds; then
$\alpha_k \ge \dfrac{(1 - \sigma) |g_k^T d_k|}{L \|d_k\|^2}. \quad (39)$
Proof of Lemma 5. 
By the line search conditions (7) and (5), we have
$(\sigma - 1) g_k^T d_k \le (g_{k+1} - g_k)^T d_k = y_k^T d_k \le \|y_k\| \|d_k\| \le \alpha_k L \|d_k\|^2.$
Noticing that $\sigma < 1$ and $g_k^T d_k < 0$, (39) follows immediately. □
Theorem 1.
Suppose that Assumptions 1 and 2 are satisfied and that the sequence $\{x_k\}$ is generated by the algorithm TSCG; then
$\liminf_{k \to \infty} \|g_k\| = 0. \quad (40)$
Proof of Theorem 1. 
According to (39) and Lemmas 3 and 4, we have
$-\delta \alpha_k g_k^T d_k \ge \dfrac{(1 - \sigma) \delta}{L} \cdot \dfrac{(g_k^T d_k)^2}{\|d_k\|^2} \ge \dfrac{(1 - \sigma) \delta c_1^2}{L c_2^2} \|g_k\|^2. \quad (41)$
Combining (41) and (7), we get
$\dfrac{(1 - \sigma) \delta c_1^2}{L c_2^2} \|g_k\|^2 \le f_k + \eta_k - f_{k+1}. \quad (42)$
Clearly,
$f_k + \eta_k - f_{k+1} \ge 0.$
Together with $\lim_{k \to \infty} \eta_k = 0$, this gives
$\liminf_{k \to \infty} (f_k - f_{k+1}) \ge \liminf_{k \to \infty} (f_k + \eta_k - f_{k+1}) + \liminf_{k \to \infty} (-\eta_k) = l \ge 0.$
Suppose $l > 0$. Then, for $\varepsilon = \min\{1, \frac{l}{2}\} > 0$, there exists $N > 0$ such that
$f_{k+1} < f_k - \varepsilon, \quad \forall k > N.$
Combining this with $\sum_{k=N+1}^{+\infty} 1 = +\infty$, we obtain $\lim_{k \to \infty} f_k = -\infty$, which contradicts Assumption 1, namely that $f$ is bounded below on $\mathbb{R}^n$. Therefore,
$\liminf_{k \to \infty} (f_k + \eta_k - f_{k+1}) = l = 0.$
In combination with (42) and $\lim_{k \to \infty} \eta_k = 0$, (40) is true. The proof is complete. □

5. Numerical Results

In this section, we compare the numerical performance of the TSCG algorithm with the SMCG_NLS [17] and SMCG_Conic [12] algorithms, both of which are subspace minimization algorithms, to demonstrate the effectiveness of the proposed TSCG algorithm. The performance profiles of Dolan and Moré [24] were used to evaluate the methods. The test set consists of 67 functions from [25], listed in Table 1. The codes were run on a Windows 10 PC with a 1.80 GHz CPU, 16.00 GB of memory, and a 64-bit operating system. The termination criteria were $\|g_k\| \le \varepsilon$ or more than 200,000 iterations, whichever occurred first. The dimensions of the variables of the test functions were 10,000 and 12,000, respectively.
The selection of parameters, some notation, and the numerical experiments are given below. The initial step size of the first iteration uses the adaptive strategy of [26]:
$\alpha_0^{(0)} = \begin{cases} 1.0, & \text{if } \|x_0\|_\infty < 10^{-30} \text{ and } |f_0| < 10^{-30}, \\ \dfrac{2 |f_0|}{\|g_0\|^2}, & \text{if } \|x_0\|_\infty < 10^{-30} \text{ and } |f_0| \ge 10^{-30}, \\ \min\left\{1.0, \dfrac{\|x_0\|_\infty}{\|g_0\|_\infty}\right\}, & \text{if } \|x_0\|_\infty \ge 10^{-30} \text{ and } \|g_0\|_\infty < 10^{-7}, \\ \min\left\{1.0, \max\left\{\dfrac{\|x_0\|_\infty}{\|g_0\|_\infty}, \dfrac{1}{\|g_0\|_\infty}\right\}\right\}, & \text{if } \|x_0\|_\infty \ge 10^{-30} \text{ and } \|g_0\|_\infty \ge 10^{-7}, \end{cases}$
where $\|\cdot\|_\infty$ denotes the infinity norm.
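A direct Python transcription of this first-iteration rule as reconstructed above; the $2|f_0|/\|g_0\|^2$ branch and the exact choice of norms are assumptions where the source formatting was ambiguous.

    import numpy as np

    def first_initial_stepsize(x0, f0, g0):
        x0 = np.asarray(x0, dtype=float)
        g0 = np.asarray(g0, dtype=float)
        xn = np.linalg.norm(x0, np.inf)
        gn = np.linalg.norm(g0, np.inf)
        if xn < 1e-30:
            return 1.0 if abs(f0) < 1e-30 else 2.0 * abs(f0) / (g0 @ g0)
        if gn < 1e-7:
            return min(1.0, xn / gn)
        return min(1.0, max(xn / gn, 1.0 / gn))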
The parameters of the SMCG_NLS and SMCG_Conic algorithms are selected according to the values used in the original papers. The parameters of the proposed TSCG algorithm are selected as follows:
$\varepsilon = 10^{-6}, \quad \delta = 0.001, \quad \sigma = 0.9999, \quad \xi_1 = 10^{-7}, \quad \xi_2 = 10^{6}, \quad \xi_3 = 10^{-4}, \quad \xi_4 = 0.875,$
$\xi_5 = 10^{-8}, \quad \xi_6 = 10^{-12}, \quad \lambda_1 = 10^{-5}, \quad \lambda_2 = 0.06, \quad \vartheta_{min} = 10^{-30}, \quad \vartheta_{max} = 10^{30}.$
The relevant tags of this article are as follows:
  • Ni: the number of iterations.
  • Nf: the number of function evaluations.
  • Ng: the number of gradient evaluations.
  • CPU: the running time of the algorithm (in seconds).
Now we compare the algorithm TSCG with SMCG_NLS [17] and SMCG_Conic [12] by means of performance profiles. The comparisons for Ni, Ng, Nf, and CPU correspond to Figure 1, Figure 2, Figure 3, and Figure 4, respectively. According to Figure 1 and Figure 2, the robustness and stability of TSCG are significantly better than those of SMCG_NLS and SMCG_Conic in terms of the number of iterations and the number of gradient evaluations; overall, SMCG_NLS is more robust and stable than SMCG_Conic. From Figure 3, we see that both TSCG and SMCG_NLS are superior to SMCG_Conic, and that TSCG becomes better than SMCG_NLS when $\tau \ge 14.3231$; for $\tau < 14.3231$, TSCG does not perform as well as SMCG_NLS in terms of the number of function evaluations. Figure 4 shows that TSCG and SMCG_NLS are better than SMCG_Conic in terms of the robustness and stability of the running time. In general, TSCG is almost always better than SMCG_NLS in robustness and stability; only for $2.6328 \le \tau \le 7.6472$ is TSCG inferior to SMCG_NLS.

6. Conclusions and Prospect

In order to obtain a more efficient and robust conjugate gradient algorithm for unconstrained optimization, we constructed a new three-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k, y_k^*\}$ and, by solving the subproblem of a quadratic approximation model of the objective function on this subspace, obtained a new three-term conjugate gradient method (TSCG). The global convergence of the TSCG method for general functions is established under mild conditions. Numerical results show that, on the given test set, TSCG performs better than SMCG_NLS and SMCG_Conic, both of which are subspace algorithms.
In future work on subspace algorithms, we will continue to study the estimation of the terms involving the Hessian matrix, discuss whether the method can be extended to constrained optimization, and consider applications of the algorithm to image restoration, image segmentation, and path planning in engineering problems.

Author Contributions

Conceptualization, J.H. and S.Y.; methodology, J.Y.; software and validation, J.Y. and G.W.; visualization and formal analysis, J.Y.; writing—original draft preparation, J.Y.; supervision, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Natural Science Foundation of China no. 71862003, Natural Science Foundation of Guangxi Province (CN) no. 2020GXNSFAA159014, the Program for the Innovative Team of Guangxi University of Finance and Economics, and the Special Funds for Local Science and Technology Development guided by the central government, grant number ZY20198003.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hestenes, M.R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436.
  2. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154.
  3. Polyak, B.T. The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 1969, 9, 94–112.
  4. Liu, Y.; Storey, C. Efficient generalized conjugate gradient algorithms, part 1: Theory. J. Optim. Theory Appl. 1991, 69, 129–137.
  5. Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182.
  6. Fletcher, R. Volume 1 Unconstrained Optimization. In Practical Methods of Optimization; Wiley-Interscience: New York, NY, USA, 1980; p. 120.
  7. Zhang, H.; Hager, W.W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 2004, 14, 1043–1056.
  8. Liu, H.; Liu, Z. An efficient Barzilai–Borwein conjugate gradient method for unconstrained optimization. J. Optim. Theory Appl. 2019, 180, 879–906.
  9. Yuan, Y.X.; Stoer, J. A subspace study on conjugate gradient algorithms. ZAMM Z. Angew. Math. Mech. 1995, 75, 69–77.
  10. Dai, Y.H.; Kou, C.X. A Barzilai–Borwein conjugate gradient method. Sci. China Math. 2016, 59, 1511–1524.
  11. Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148.
  12. Li, Y.; Liu, Z.; Liu, H. A subspace minimization conjugate gradient method based on conic model for unconstrained optimization. Comput. Appl. Math. 2019, 38, 16.
  13. Wang, T.; Liu, Z.; Liu, H. A new subspace minimization conjugate gradient method based on tensor model for unconstrained optimization. Int. J. Comput. Math. 2019, 96, 1924–1942.
  14. Zhao, T.; Liu, H.; Liu, Z. New subspace minimization conjugate gradient methods based on regularization model for unconstrained optimization. Numer. Algorithms 2020, 87, 1501–1534.
  15. Andrei, N. An accelerated subspace minimization three-term conjugate gradient algorithm for unconstrained optimization. Numer. Algorithms 2014, 65, 859–874.
  16. Yang, Y.; Chen, Y.; Lu, Y. A subspace conjugate gradient algorithm for large-scale unconstrained optimization. Numer. Algorithms 2017, 76, 813–828.
  17. Li, M.; Liu, H.; Liu, Z. A new subspace minimization conjugate gradient method with nonmonotone line search for unconstrained optimization. Numer. Algorithms 2018, 79, 195–219.
  18. Yao, S.; Wu, Y.; Yang, J.; Xu, J. A three-term gradient descent method with subspace techniques. Math. Probl. Eng. 2021, 2021, 8867309.
  19. Yuan, Y. A review on subspace methods for nonlinear optimization. Proc. Int. Congr. Math. 2014, 807–827.
  20. Wei, Z.; Yao, S.; Liu, L. The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006, 183, 1341–1350.
  21. Yuan, Y. A modified BFGS algorithm for unconstrained optimization. IMA J. Numer. Anal. 1991, 11, 325–332.
  22. Dai, Y.; Yuan, J.; Yuan, Y.X. Modified two-point stepsize gradient methods for unconstrained optimization. Comput. Optim. Appl. 2002, 22, 103–109.
  23. Dai, Y.H.; Kou, C.X. A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 2013, 23, 296–320.
  24. Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
  25. Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161.
  26. Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192.
Figure 1. Performance profiles of the number of iterations (Ni).
Figure 2. Performance profiles of the gradient evaluations (Ng).
Figure 3. Performance profiles of the function evaluations (Nf).
Figure 4. Performance profiles of CPU time (CPU).
Table 1. The test problems.

No. | Test Problems                                   | No. | Test Problems
1   | Extended Freudenstein & Roth function           | 35  | NONDIA function (CUTE)
2   | Extended Trigonometric function                 | 36  | DQDRTIC function (CUTE)
3   | Extended Rosenbrock function                    | 37  | EG2 function (CUTE)
4   | Generalized Rosenbrock function                 | 38  | DIXMAANA–DIXMAANL functions
5   | Extended White & Holst function                 | 39  | Partial Perturbed Quadratic function
6   | Extended Beale function                         | 40  | Broyden Tridiagonal function
7   | Extended Penalty function                       | 41  | Almost Perturbed Quadratic function
8   | Perturbed Quadratic function                    | 42  | Perturbed Tridiagonal Quadratic function
9   | Diagonal 1 function                             | 43  | Staircase 1 function
10  | Diagonal 3 function                             | 44  | Staircase 2 function
11  | Extended Tridiagonal 1 function                 | 45  | LIARWHD function (CUTE)
12  | Full Hessian FH3 function                       | 46  | POWER function (CUTE)
13  | Generalized Tridiagonal 2 function              | 47  | ENGVAL1 function (CUTE)
14  | Diagonal 5 function                             | 48  | CRAGGLVY function (CUTE)
15  | Extended Himmelblau function                    | 49  | EDENSCH function (CUTE)
16  | Generalized White & Holst function              | 50  | CUBE function (CUTE)
17  | Extended PSC1 function                          | 51  | BDEXP function (CUTE)
18  | Extended Powell function                        | 52  | NONSCOMP function (CUTE)
19  | Full Hessian FH2 function                       | 53  | VARDIM function (CUTE)
20  | Extended Maratos function                       | 54  | SINQUAD function (CUTE)
21  | Extended Cliff function                         | 55  | Extended DENSCHNB function (CUTE)
22  | Perturbed quadratic diagonal function 2         | 56  | Extended DENSCHNF function (CUTE)
23  | Extended Wood function                          | 57  | LIARWHD function (CUTE)
24  | Extended Hiebert function                       | 58  | COSINE function (CUTE)
25  | Quadratic QF1 function                          | 59  | SINE function
26  | Extended quadratic penalty QP1 function         | 60  | Generalized Quartic function
27  | Extended quadratic penalty QP2 function         | 61  | Diagonal 7 function
28  | Quadratic QF2 function                          | 62  | Diagonal 8 function
29  | Extended quadratic exponential EP1 function     | 63  | Extended TET function (three exponential terms)
30  | Extended Tridiagonal 2 function                 | 64  | SINCOS function
31  | FLETCHCR function (CUTE)                        | 65  | Diagonal 9 function
32  | TRIDIA function (CUTE)                          | 66  | HIMMELBG function (CUTE)
33  | ARGLINB function (CUTE)                         | 67  | HIMMELH function (CUTE)
34  | ARWHEAD function (CUTE)                         |     |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Huo, J.; Yang, J.; Wang, G.; Yao, S. A Class of Three-Dimensional Subspace Conjugate Gradient Algorithms for Unconstrained Optimization. Symmetry 2022, 14, 80. https://doi.org/10.3390/sym14010080
