Abstract
In this paper, a three-parameter subspace conjugate gradient method is proposed for solving large-scale unconstrained optimization problems. By minimizing the quadratic approximate model of the objective function over a new special three-dimensional subspace, the embedded parameters are determined and the corresponding algorithm is obtained. The global convergence of the proposed method for general nonlinear functions is established under mild assumptions. In numerical experiments, the proposed algorithm is compared with SMCG_NLS and SMCG_Conic, and the results show that it is robust and efficient.
1. Introduction
The conjugate gradient method is one of the most important methods for solving large-scale unconstrained optimization problems because of its simple structure, low computational and storage requirements, and fast convergence. The general unconstrained optimization problem is as follows:
where $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is continuously differentiable. The function value of $f$ at the iterate $x_k$ is denoted by $f_k$, and its gradient by $g_k = \nabla f(x_k)$. Let $\alpha_k > 0$ be the step size; the conjugate gradient method then generates iterates by $x_{k+1} = x_k + \alpha_k d_k$, $k = 0, 1, 2, \ldots$,
where $d_k$ is the search direction, which has the form $d_0 = -g_0$ and $d_k = -g_k + \beta_k d_{k-1}$ for $k \ge 1$,
where the scalar $\beta_k$ is referred to as the conjugate parameter. Different selections of $\beta_k$ give rise to several well-known nonlinear conjugate gradient methods [1,2,3,4,5,6], for example $\beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}$, $\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}$, $\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}$, and $\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}$,
where $\|\cdot\|$ represents the Euclidean norm and $y_{k-1} = g_k - g_{k-1}$.
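To make the general framework concrete, the following minimal Python sketch runs the iteration $x_{k+1} = x_k + \alpha_k d_k$ with $d_k = -g_k + \beta_k d_{k-1}$; it uses the classical PRP or HS choice of $\beta_k$ and a plain Armijo backtracking step as a stand-in for the Wolfe-type line searches discussed below, so it only illustrates the structure of a nonlinear conjugate gradient method, not the method proposed in this paper.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, beta_rule="PRP", tol=1e-6, max_iter=10000):
    """Generic nonlinear CG loop: x_{k+1} = x_k + alpha_k * d_k,
    d_k = -g_k + beta_k * d_{k-1}.  Illustrative only."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        # Simple Armijo backtracking (placeholder for a Wolfe-type search).
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5
            if alpha < 1e-16:
                break
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g                       # y_{k-1} = g_k - g_{k-1}
        if beta_rule == "PRP":
            beta = g_new.dot(y) / g.dot(g)  # Polak-Ribiere-Polyak
        else:                               # "HS": Hestenes-Stiefel
            beta = g_new.dot(y) / d.dot(y)
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```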
The step size can be obtained in different ways. Zhang and Hager [7] proposed an effective non-monotone Wolfe line search as follows:
The quantity $C_k$ in Formula (4) is a convex combination of the function values $f_0, f_1, \ldots, f_k$; $C_k$ and the auxiliary quantity $Q_k$ are updated by the following rule:
where .
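For reference, the bookkeeping behind this non-monotone line search, as we recall it from Zhang and Hager [7], is reproduced below in the notation of [7]; it may differ slightly from the notation of (4) and (5).

```latex
% Non-monotone line search quantities of Zhang and Hager [7]
Q_0 = 1, \quad C_0 = f(x_0), \qquad
Q_{k+1} = \eta_k Q_k + 1, \qquad
C_{k+1} = \frac{\eta_k Q_k C_k + f(x_{k+1})}{Q_{k+1}}, \qquad
\eta_k \in [\eta_{\min}, \eta_{\max}] \subseteq [0, 1].
```

Here $C_k$ is a convex combination of $f(x_0), \ldots, f(x_k)$, and the sufficient decrease test $f(x_k + \alpha_k d_k) \le C_k + \delta \alpha_k g_k^T d_k$ replaces the usual monotone Armijo condition.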
For large-scale optimization problems, researchers have been seeking more efficient algorithms. In 1995, Yuan and Stoer [9] first proposed embedding subspace techniques into the conjugate gradient framework, namely the two-dimensional subspace minimization conjugate gradient method (SMCG for short). The search direction is calculated by minimizing the quadratic approximation model on the two-dimensional subspace , namely
where and are parameters, and . Analogously, the calculation of the search direction can be directly extended to . In this way, we avoid solving the subproblem in the whole space, which greatly reduces the computational and storage costs.
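As an illustration of the subspace technique, consider the two-dimensional case: writing $d = \mu g_k + \nu s_{k-1}$ and substituting into the quadratic model $\min_d\, g_k^T d + \frac{1}{2} d^T B_k d$ reduces the subproblem to a $2 \times 2$ linear system in $(\mu, \nu)$. The Python sketch below solves this system, assuming the curvature scalars $g_k^T B_k g_k$, $g_k^T B_k s_{k-1}$, and $s_{k-1}^T B_k s_{k-1}$ have already been estimated; how to estimate them without forming $B_k$ is precisely what the SMCG literature addresses.

```python
import numpy as np

def smcg_2d_direction(g, s, gBg, gBs, sBs):
    """Minimize g^T d + 0.5 d^T B d over d = mu*g + nu*s.
    gBg, gBs, sBs are (estimated) scalars g^T B g, g^T B s, s^T B s."""
    A = np.array([[gBg, gBs],
                  [gBs, sBs]])
    b = -np.array([g.dot(g), g.dot(s)])
    mu, nu = np.linalg.solve(A, b)   # assumes A is positive definite
    return mu * g + nu * s
```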
Inspired by SMCG, researchers began to investigate conjugate gradient methods combined with subspace techniques. Dai et al. [10] analyzed the subspace minimization conjugate gradient method proposed by Yuan and Stoer [9] and integrated SMCG with the Barzilai–Borwein method [11], obtaining a new Barzilai–Borwein conjugate gradient method (BBCG for short). In this subspace, Li et al. [12] studied the case in which the search direction is generated by minimizing a conic model when the objective function is strongly non-quadratic. Wang et al. [13] replaced the conic model of [12] with a tensor model, and Zhao et al. [14] discussed the case of a regularization model. Andrei [15] further extended the search direction to a three-dimensional subspace and proposed a new SMCG method (TTS). Inspired by Andrei, Yang et al. [16] carried out a similar study: they applied the subspace minimization technique to another special three-dimensional subspace and obtained a new SMCG method (STT). On the same subspace, Li et al. [17] further developed Yang's results, analyzed three more complex parameters, and proposed a new subspace minimization conjugate gradient method (SMCG_NLS). Yao et al. [18] proposed a new three-dimensional subspace and obtained the TCGS method by using a modified secant equation.
According to [19], the key to embedding subspace techniques into the conjugate gradient method is to construct an appropriate subspace, select an approximate model, and estimate the terms involving the Hessian matrix. The subspace contains the gradients at two consecutive iteration points. If the difference between these two gradients is too large, the gradient change may dominate the subspace both in direction and in magnitude. We therefore consider applying an appropriate correction to the gradient change of the two iteration points, and then combining it with the gradient at the current iterate and the previous search direction to form a new three-dimensional subspace; whether this subspace is worth investigating is the starting point of this paper.
In this paper, to avoid the situation in which the change of the gradient dominates the subspace, inspired by [20], we construct a similar subspace in which the gradient change is suitably modified; the embedded parameters and the corresponding algorithm are then obtained by minimizing an approximate model of the objective function over this subspace. It can be shown that the obtained method is globally convergent and has good numerical performance.
The rest of this paper is organized as follows: in Section 2, the search direction constructed on a new special three-dimensional subspace is presented, and the estimations of the matrix–vector products are given. In Section 3, the proposed algorithm and its properties under two necessary assumptions are described in detail. In Section 4, we establish the global convergence of the proposed algorithm under mild conditions. In Section 5, we compare the proposed method numerically with the SMCG_NLS [17] and SMCG_Conic [12] algorithms. Finally, in Section 6, we conclude this paper and highlight future work.
2. Search Direction and Step Size
This section introduces the four search direction models on the newly spanned three-dimensional subspace and the selection of the initial step size.
2.1. Direction Choice Model
Inspired by [20], in this paper, the gradient change is replaced by . Then, the search directions are constructed in the three-dimensional subspace .
From [10], we know that the approximate model plays two roles: one is to approximate the original objective function on the subspace ; the other is to ensure that the search direction obtained from the approximate model is a descent direction, so that the original objective function decreases along it. On our proposed subspace , we consider the approximate model of the objective function as
where is a symmetric positive definite approximation matrix of the Hessian matrix, satisfying .
Obviously, there are three possible cases for the dimension of the subspace under consideration, and we discuss them separately.
Situation I:
dim() = 3.
When the dimension is 3, the three spanning vectors are linearly independent. The search direction then takes the following form:
where and are undetermined parameters. Substituting (10) into (9) and simplifying, we have
where , , . Set
Thus, (11) can be summarized as
Under some mild conditions, we can prove that the matrix is positive definite, which will be discussed in Lemma 1. When it is positive definite, by calculation and simplification, the unique solution of (11) is
where
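Since the displayed formulas are omitted above, the following hedged Python sketch only illustrates the computational pattern of Situation I: once the six curvature scalars that make up the $3 \times 3$ matrix in (11) have been estimated, the three parameters are obtained from a small symmetric positive definite linear system. The vector names g, s, w and the scalar names are ours, not the paper's notation.

```python
import numpy as np

def three_param_direction(g, s, w, gBg, gBs, gBw, sBs, sBw, wBw):
    """Minimize g^T d + 0.5 d^T B d over d = t1*g + t2*s + t3*w.
    The six scalar arguments are (estimated) curvature terms such as g^T B g;
    the vector names g, s, w and these scalars are illustrative only."""
    M = np.array([[gBg, gBs, gBw],
                  [gBs, sBs, sBw],
                  [gBw, sBw, wBw]])
    rhs = -np.array([g.dot(g), s.dot(g), w.dot(g)])
    # Cholesky factorization doubles as a positive-definiteness check (cf. Lemma 1).
    L = np.linalg.cholesky(M)
    t = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
    return t[0] * g + t[1] * s + t[2] * w
```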
In order to avoid the matrix-vector multiplication, we need to estimate , and . Before estimating , we first estimate .
For , we get
Based on the analysis of [9], is desirable, which shows that can have the following estimation:
which also means that
To obtain a better numerical effect in the experiments, we amend the estimate as
where
Obviously, . Consider the following matrix,
From (13) and , it can be seen that the sub-matrix (15) is positive definite. Inspired by the BBCG method [11], for , we take
where
where , .
Now, we estimate . Since is positive definite, it is easy to know , therefore
According to (14), we know that .
In order to ensure that (18) holds, we estimate the parameter by taking
where , , and . Through numerical tuning of the algorithm, we find that using the adaptive value in (17) gives better numerical results than using a fixed value, so (17) is also used here.
In summary, we find that when the following conditions are satisfied, the search direction is calculated by (10) and (12).
where are positive constants.
Now, let us prove that is positive definite.
Lemma 1.
Proof of Lemma 1.
Using mathematical induction, it is easy to know from (18); for therefore
This completes the proof. □
Situation II:
dim() = 2.
In this case, the form of the search direction is as follows:
where are undetermined parameters. Substituting (24) into (9), we get
where , if it satisfies
Then the unique solution of the problem (25) is
Obviously, under certain conditions, the HS direction can be regarded as a special case of Formula (24). Taking into account the finite termination property of the HS method, and in order to endow our algorithm with good properties, we proceed as follows when the following conditions hold,
where . We consider
In summary, for the case where the dimension of the subspace is 2: if only condition (22) holds, the search direction is calculated by (24); when inequalities (28) and (29) hold, it is calculated by (30).
Situation III:
dim() = 1.
In this case, the subspace reduces to the span of the current gradient, and the steepest descent direction $d_k = -g_k$ is taken as the search direction.
2.2. Selection of Initial Step Size
Considering that the selections of initial step sizes will also have an impact on the algorithm, we choose the initial step size selection method of SMCG_NLS [17], which is also a subspace algorithm.
According to [21], we know that
which indicates how close the objective function is to a quadratic function on the line segment between the current and the previous iteration points. Based on [22], we know that the following condition indicates that the objective function is close to a quadratic function:
where .
Case I: when the search direction is calculated by (10), (24), or (30), the initial step size is .
Case II: when , the initial step size is
where .
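As a hedged illustration of the idea behind such initial step-size rules (the exact formulas are those of SMCG_NLS [17] and are not reproduced here), the sketch below uses a Barzilai–Borwein trial step by default and reuses the previous step when a quadratic-closeness ratio is near one; the ratio r and the threshold c1 are our own illustrative choices, not the rule actually used in this paper.

```python
def initial_stepsize(f_prev, f_curr, g, s, y, alpha_prev, c1=1e-2):
    """Trial step for the next line search.  Uses a BB-type step by default
    and reuses the previous step when the objective looks locally quadratic.
    The closeness ratio r and the threshold c1 are illustrative choices."""
    sty = s.dot(y)
    if sty <= 0:
        return 1.0                       # fall back to a unit step
    alpha_bb = s.dot(s) / sty            # Barzilai-Borwein step
    r = 2.0 * (f_prev - f_curr + g.dot(s)) / sty   # equals 1 exactly for quadratics
    if abs(r - 1.0) <= c1:
        return alpha_prev                # objective close to quadratic: reuse step
    return alpha_bb
```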
3. The Obtained Algorithm and Descent Property
This section describes the obtained algorithm in detail and discusses its descent properties under two necessary assumptions.
3.1. The Obtained Algorithm
This subsection introduces the proposed algorithm and gives two necessary assumptions. Before stating the algorithm, we first describe the restart strategy it uses.
According to [23], set
If this quantity is close to 1, then the one-dimensional line-search function is close to a quadratic function. Similar to [21], if the corresponding condition holds in multiple consecutive iterations, we restart the search direction along the negative gradient $-g_k$. In addition, we restart our algorithm if the number of consecutive uses of CG directions reaches the MaxRestart threshold.
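The restart bookkeeping can be summarized by the following sketch; the counter logic mirrors the description above, while the thresholds max_run and max_restart and the exact test on the closeness measure are placeholders for the paper's actual parameter values.

```python
def should_restart(r_history, cg_count, criterion, max_run=3, max_restart=60):
    """Restart with d_k = -g_k when a restart criterion on the quadratic-
    closeness measure r_k has held for several consecutive iterations, or when
    the number of consecutive CG directions reaches MaxRestart.
    `criterion` is a boolean test on r_k; the thresholds are placeholders."""
    run = 0
    for r in reversed(r_history):   # length of the trailing run satisfying the criterion
        if criterion(r):
            run += 1
        else:
            break
    return run >= max_run or cg_count >= max_restart
```

For instance, `should_restart(rs, n_cg, criterion=lambda r: abs(r - 1.0) <= 0.01)` counts consecutive iterations in which the measure stays close to one; the precise test used by TSCG is the one specified in the text.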
Now, the details of the three-term subspace conjugate gradient method (TSCG for short) are given as follows:
Algorithm 1: TSCG Algorithm
3.2. Descent Properties of Search Direction
In this subsection, we discuss the descent properties of the given algorithm and prove that the proposed algorithm (TSCG) satisfies the sufficient descent condition in all cases. We first introduce some common assumptions on the objective function.
Assumption 1.
The objective function is continuously differentiable and has a lower bound on .
Assumption 2.
The gradient function g is Lipschitz continuous on the bounded level set ; that is, there exists a constant $L > 0$ such that
$\|g(x) - g(y)\| \le L \|x - y\|$ for all $x$ and $y$ in the level set.
Proof of Lemma 2.
We only need to discuss the situation of search direction in relation to .
Case I: if the search direction is generated by (24), the proof is similar to that in [22] and is therefore omitted.
Case II: when is calculated by (10) and (12), we have
where the expression is a quadratic function of two variables, which we denote by x and y; its simplified form is
From Lemma 1, it is easy to get , ; that is
Therefore, we have
This completes the proof. □
Lemma 3.
Assume that the search direction is generated by the TSCG algorithm. Then there exists a constant , such that
Proof of Lemma 3.
We discuss the four forms of the search direction separately.
Case I: if , let , then (36) holds.
Case II: when the search direction consists of two terms, we first discuss the situation given by (30). When the direction is determined by (30), combining (28) and (29), for , we have
Case III: we now discuss the other two-term case, in which the search direction is generated by (24). Combining (22) and (27), we obviously have
Combining the above formula with (35), we obtain
Case IV: when the search direction consists of three terms; that is, when it is given by (10) and (12). Using Lemma 2 together with (20), (21), and , we first prove that has an upper bound:
The in the root of the above second inequality is .
It follows that
Finally, according to Lemma 2, we have
In summary, the value of , satisfying (36), is
Then the proof is complete. □
Lemma 4.
If the search direction is calculated by TSCG, then there exists a constant , such that
Proof of Lemma 4.
Similar to Lemma 3, we discuss the four cases of the search direction separately.
Case I: if , let , then (37) is true.
Case III: if the search direction is calculated by (24) and (26), then, combining (22), (27), and the Cauchy–Schwarz inequality, we deduce
Combining the above equation, (27), the Cauchy–Schwarz inequality, and the triangle inequality, we obtain
Case IV: when the search direction consists of three terms; that is, it is calculated by (10) and (12). Similar to Case III, we first derive the lower bound of . According to (16), (18), and (20), we have
Let us write , and combine that with , we have , therefore
According to the above results, it can be further deduced
According to the four cases analyzed above, the value of the constant that satisfies (37) is
The proof is ended. □
4. Convergence Analysis
The global convergence of the algorithm for general functions is proven in this section.
Proof of Lemma 5.
We notice that , , and (39) holds immediately. □
Theorem 1.
Suppose Assumptions 1 and 2 are satisfied and the sequence is generated by the TSCG algorithm; then
Proof of Theorem 1.
Clearly
The above equation with is equivalent to
Suppose , for , for any , there exists , then
Combining this with (44), it follows that . This contradicts Assumption 1, which states that the objective function has a lower bound on . Therefore,
5. Numerical Results
In this section, we compare the numerical performance of the TSCG algorithm with that of the SMCG_NLS [17] and SMCG_Conic [12] algorithms, both of which are subspace minimization algorithms, to demonstrate the effectiveness of the proposed TSCG algorithm. The performance profiles of Dolan and Moré [24] were used to evaluate the methods. The test functions were derived from 67 functions in [25], as shown in Table 1. The codes were programmed and run on a 64-bit Windows 10 PC with a 1.80 GHz CPU and 16.00 GB of memory. The termination criteria were the gradient stopping condition or the number of iterations exceeding 200,000; the program exited when either was satisfied. The dimensions of the test problems were 10,000 and 12,000, respectively.
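For completeness, the curves in Figures 1–4 follow the standard performance-profile construction of Dolan and Moré [24], which the sketch below reproduces; `T[p, s]` holds the cost (Ni, Nf, Ng, or CPU time) of solver s on problem p, with failures recorded as infinity. The function name and array layout are our own.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.
    T: (n_problems, n_solvers) array of costs (Ni, Nf, Ng, or CPU),
       with np.inf marking a failure; each problem is assumed to be
       solved by at least one solver.
    Returns rho[s, i] = fraction of problems that solver s solves within
    a factor taus[i] of the best solver on that problem."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost on each problem
    ratios = T / best                     # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])
```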
Table 1.
The test problems.
The following describes the selection of parameters, some notation, and the numerical results. The initial step size of the first iteration in this paper uses the adaptive strategy of [26].
where $\|\cdot\|_\infty$ represents the infinity norm.
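To our understanding, the adaptive strategy of [26] for the very first trial step has roughly the following form; the branches and the scaling constant psi below are a hedged paraphrase of that strategy, not necessarily the exact rule used in our experiments.

```python
import numpy as np

def first_stepsize(x0, f0, g0, psi=0.01):
    """Adaptive first trial step in the spirit of Hager-Zhang [26].
    psi is the small scaling constant used by that strategy; the exact
    value and the branches here are a hedged paraphrase."""
    g_inf = np.linalg.norm(g0, np.inf)
    if np.linalg.norm(x0, np.inf) > 0:
        return psi * np.linalg.norm(x0, np.inf) / g_inf
    if f0 != 0:
        return psi * abs(f0) / g0.dot(g0)
    return 1.0
```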
The parameters of the SMCG_NLS and SMCG_Conic algorithms are set to the values recommended in the original papers. The parameters of the proposed TSCG algorithm are set as follows.
The relevant notation is as follows:
- Ni: the number of iterations.
- Nf: the number of function evaluations.
- Ng: the number of gradient evaluations.
- CPU: the running time of the algorithm (in seconds).
We now compare the TSCG algorithm with SMCG_NLS [17] and SMCG_Conic [12] using performance profile diagrams. The performance comparisons for Ni, Ng, Nf, and CPU time correspond to Figure 1, Figure 2, Figure 3, and Figure 4, respectively. According to Figure 1 and Figure 2, the robustness and stability of TSCG are significantly better than those of SMCG_NLS and SMCG_Conic in terms of the number of iterations and the total number of gradient evaluations. Overall, SMCG_NLS is more robust and stable than SMCG_Conic. Figure 3 shows that both TSCG and SMCG_NLS are superior to SMCG_Conic; TSCG begins to outperform SMCG_NLS once the performance factor exceeds a certain value, although for small values of the factor the number of function evaluations of TSCG does not perform as well as that of SMCG_NLS. Figure 4 shows that TSCG and SMCG_NLS are better than SMCG_Conic in terms of the robustness and stability of the running time. In general, TSCG is better than SMCG_NLS in terms of robustness and stability, and is inferior to SMCG_NLS only for small values of the performance factor.
Figure 1.
Performance profiles of the number of iterations (Ni).
Figure 2.
Performance profiles of the number of gradient evaluations (Ng).
Figure 3.
Performance profiles of the number of function evaluations (Nf).
Figure 4.
Performance profiles of CPU time (CPU).
6. Conclusions and Prospect
In order to obtain a more efficient and robust conjugate gradient algorithm for solving unconstrained optimization problems, we constructed a new three-dimensional subspace and, by solving the subproblem of a quadratic approximate model of the objective function over this subspace, obtained a new three-term subspace conjugate gradient method (TSCG). The global convergence of the TSCG method for general functions is established under mild conditions. Numerical results show that, on the given test set, TSCG performs better than SMCG_NLS and SMCG_Conic, both of which are subspace algorithms.
As for future research on subspace algorithms, our main work will be to continue studying the estimation of the terms involving the Hessian matrix, to discuss whether the method can be extended to constrained optimization, and to consider applying the algorithm to engineering problems such as image restoration, image segmentation, and path planning.
Author Contributions
Conceptualization, J.H. and S.Y.; methodology, J.Y.; software and validation, J.Y., G.W., visualization and formal analysis, J.Y.; writing—original draft preparation, J.Y.; supervision, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Natural Science Foundation of China no. 71862003, Natural Science Foundation of Guangxi Province (CN) no. 2020GXNSFAA159014, the Program for the Innovative Team of Guangxi University of Finance and Economics, and the Special Funds for Local Science and Technology Development guided by the central government, grant number ZY20198003.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Hestenes, M.R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436.
- Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154.
- Polyak, B.T. The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 1969, 9, 94–112.
- Liu, Y.; Storey, C. Efficient generalized conjugate gradient algorithms, part 1: Theory. J. Optim. Theory Appl. 1991, 69, 129–137.
- Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182.
- Fletcher, R. Practical Methods of Optimization, Vol. 1: Unconstrained Optimization; Wiley-Interscience: New York, NY, USA, 1980; p. 120.
- Zhang, H.; Hager, W.W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 2004, 14, 1043–1056.
- Liu, H.; Liu, Z. An efficient Barzilai–Borwein conjugate gradient method for unconstrained optimization. J. Optim. Theory Appl. 2019, 180, 879–906.
- Yuan, Y.X.; Stoer, J. A subspace study on conjugate gradient algorithms. ZAMM Z. Angew. Math. Mech. 1995, 75, 69–77.
- Dai, Y.H.; Kou, C.X. A Barzilai–Borwein conjugate gradient method. Sci. China Math. 2016, 59, 1511–1524.
- Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148.
- Li, Y.; Liu, Z.; Liu, H. A subspace minimization conjugate gradient method based on conic model for unconstrained optimization. Comput. Appl. Math. 2019, 38, 16.
- Wang, T.; Liu, Z.; Liu, H. A new subspace minimization conjugate gradient method based on tensor model for unconstrained optimization. Int. J. Comput. Math. 2019, 96, 1924–1942.
- Zhao, T.; Liu, H.; Liu, Z. New subspace minimization conjugate gradient methods based on regularization model for unconstrained optimization. Numer. Algorithms 2020, 87, 1501–1534.
- Andrei, N. An accelerated subspace minimization three-term conjugate gradient algorithm for unconstrained optimization. Numer. Algorithms 2014, 65, 859–874.
- Yang, Y.; Chen, Y.; Lu, Y. A subspace conjugate gradient algorithm for large-scale unconstrained optimization. Numer. Algorithms 2017, 76, 813–828.
- Li, M.; Liu, H.; Liu, Z. A new subspace minimization conjugate gradient method with nonmonotone line search for unconstrained optimization. Numer. Algorithms 2018, 79, 195–219.
- Yao, S.; Wu, Y.; Yang, J.; Xu, J. A three-term gradient descent method with subspace techniques. Math. Probl. Eng. 2021, 2021, 8867309.
- Yuan, Y. A review on subspace methods for nonlinear optimization. Proc. Int. Congr. Math. 2014, 807–827.
- Wei, Z.; Yao, S.; Liu, L. The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006, 183, 1341–1350.
- Yuan, Y. A modified BFGS algorithm for unconstrained optimization. IMA J. Numer. Anal. 1991, 11, 325–332.
- Dai, Y.; Yuan, J.; Yuan, Y.X. Modified two-point stepsize gradient methods for unconstrained optimization. Comput. Optim. Appl. 2002, 22, 103–109.
- Dai, Y.H.; Kou, C.X. A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 2013, 23, 296–320.
- Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
- Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161.
- Hager, W.W.; Zhang, H. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16, 170–192.
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).