Research on Three-Dimensional Extension of Barzilai-Borwein-like Method

Tianji Wang; Qingdao Huang

doi:10.3390/math13020215

Abstract

The Barzilai-Borwein (BB) method uses the BB stepsize to eliminate the line search required by the steepest descent method. In this paper, we modify the BB stepsize and extend it to the minimization of three-dimensional quadratic functions. The discussion is divided into two cases. First, we study the case where the coefficient matrix of the quadratic term is a special third-order diagonal matrix and prove that, with the new modified stepsize, the method is R-superlinearly convergent. We also extend this case to n dimensions and prove that the rate of convergence is R-linear. Second, we analyze the case where the coefficient matrix of the quadratic term is a third-order asymmetric matrix with a double characteristic root, and we prove global convergence in this case. Numerical experiments show that the modified method is effective in both cases.

Keywords:

unconstrained optimization; quadratic functions; Barzilai-Borwein stepsize; R-superlinear convergence; global convergence

MSC:

49K35; 65K10

1. Introduction

In this paper, we consider the unconstrained optimization problem of minimizing the quadratic function,

\min_{x \in R^{n}} f (x) = \frac{1}{2} x^{T} A x - b^{T} x,

(1)

where

A \in R^{n \times n}

is the coefficient matrix of the quadratic term,

b \in R^{n}

. In order to solve (1), common optimization methods usually take the following iterative approach:

x_{k + 1} = x_{k} - α_{k} g_{k},

(2)

where

g_{k} = \nabla f (x_{k})

,

α_{k} > 0

is called the stepsize. Different methods define the stepsize differently, so studies of the stepsize are diverse. The most common method is the classical steepest descent method [1], whose stepsize is called the Cauchy stepsize,

α_{k}^{S D} = \arg \min_{α > 0} f (x_{k} - α g_{k}),

(3)

This way of finding the stepsize is also called exact one-dimensional line search, and Forsythe proved in [2] that the rate of convergence of the classical steepest descent method is linear. Although the stepsize

α_{k}^{S D}

is effective, the classical steepest descent method performs poorly when the condition number of A is large; see [3] for details.
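For the quadratic objective in (1) with a symmetric positive definite A, the exact line search in (3) has the well-known closed form g^T g / (g^T A g). The following Python sketch illustrates it; the helper names (matvec, dot, objective, cauchy_stepsize, steepest_descent) and the test matrix with condition number 100 are illustrative assumptions, not part of the original paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def objective(A: Matrix, b: Vector, x: Vector) -> float:
    """Evaluate f(x) = 0.5 x^T A x - b^T x."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def cauchy_stepsize(A: Matrix, g: Vector) -> float:
    """Exact line-search step for an SPD matrix A: g^T g / g^T A g."""
    return dot(g, g) / dot(g, matvec(A, g))


def steepest_descent(A: Matrix, b: Vector, x: Vector, iters: int) -> Vector:
    """Run `iters` steepest descent steps with the Cauchy stepsize."""
    for _ in range(iters):
        g = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        if dot(g, g) == 0.0:  # already at the minimizer
            break
        alpha = cauchy_stepsize(A, g)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical ill-conditioned example (condition number 100).
A = [[1.0, 0.0], [0.0, 100.0]]
b = [0.0, 0.0]
x0 = [100.0, 1.0]
x10 = steepest_descent(A, b, x0, 10)
print(objective(A, b, x0), objective(A, b, x10))
```

Printing the objective before and after ten steps gives a quick impression of how slowly the method progresses on an ill-conditioned problem.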

In order to ensure the convergence speed and reduce the amount of computation, Barzilai and Borwein [4] proposed a new stepsize. They rewrote the iterative formula as

x_{k + 1} = x_{k} - D_{k} g_{k},

(4)

where

D_{k} = α_{k} I

and I is the identity matrix. As in the quasi-Newton method [5],

D_{k}^{- 1}

can be regarded as an approximate Hessian matrix of f at

x_{k}

. And then, in order for

D_{k}

to have the quasi-Newton property, they chose

α_{k}

so that

D_{k}

meets the following condition:

D_{k} = \arg \min_{D = α I} ∥ D^{- 1} s_{k - 1} - y_{k - 1} ∥,

(5)

which yields

α_{k}^{B B 1} = \frac{s_{k - 1}^{T} s_{k - 1}}{s_{k - 1}^{T} y_{k - 1}},

(6)

where

s_{k - 1} = x_{k} - x_{k - 1}

,

y_{k - 1} = g_{k} - g_{k - 1}

and

∥ \cdot ∥

denotes the Euclidean norm. Alternatively, to choose the stepsize

α_{k}

, they let

D_{k}

satisfy

D_{k} = \arg \min_{D = α I} ∥ s_{k - 1} - D y_{k - 1} ∥,

(7)

which gives another stepsize

α_{k}^{B B 2} = \frac{s_{k - 1}^{T} y_{k - 1}}{y_{k - 1}^{T} y_{k - 1}} .

(8)

As we can see, if

s_{k - 1}^{T} y_{k - 1} > 0

, then

α_{k}^{B B 1} \geq α_{k}^{B B 2}

, so

α_{k}^{B B 1}

is a long stepsize and

α_{k}^{B B 2}

is a short stepsize. So

α_{k}^{B B 1}

will perform better than

α_{k}^{B B 2}

on some optimization problems; see [6,7] for details. Methods that use

α_{k}^{B B 1}

or

α_{k}^{B B 2}

as stepsize are collectively referred to as BB methods.
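As a minimal sketch of (6) and (8), the two BB stepsizes can be computed directly from the difference vectors. The function name bb_stepsizes and the sample data (a step s with y = A s for the assumed matrix A = diag(1, 3)) are illustrative choices, not taken from the paper. For this data the long stepsize is 0.5 and the short one is 0.4, in line with the inequality above.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Return the Euclidean inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def bb_stepsizes(s: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (BB1, BB2) from s = x_k - x_{k-1} and y = g_k - g_{k-1}."""
    sy = dot(s, y)
    if sy <= 0.0:
        # Both formulas assume positive curvature along s.
        raise ValueError("s^T y must be positive")
    bb1 = dot(s, s) / sy   # long stepsize, Equation (6)
    bb2 = sy / dot(y, y)   # short stepsize, Equation (8)
    return bb1, bb2


# Hypothetical data: A = diag(1, 3), so y = A s for a quadratic objective.
s = [1.0, 1.0]
y = [1.0, 3.0]
bb1, bb2 = bb_stepsizes(s, y)
print(bb1, bb2)
```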

In recent years, there has been a lot of research on the convergence and stepsize modification of the BB methods [8,9,10]. Previous studies show that the convergence rate of BB methods is usually R-superlinear or R-linear. For example, in [4], Barzilai and Borwein proved R-superlinear convergence of their method with the stepsize

α_{k}^{B B 2}

for two-dimensional strictly convex quadratics, and Dai and Liao [11] proved R-linear convergence of the BB method for n-dimensional strictly convex quadratics. We now define these two rates of convergence.

Definition 1.

Set

R_{1} = \underset{k \to \infty}{\lim \sup} {∥ x_{k} - x^{*} ∥}^{1 / k}

(9)

According to the above formula, the R-convergence rate can be divided into two cases:

(i) When

R_{1} = 0

, the sequence of iterated points

{x_{k}}

is said to have R-superlinear convergence rate;

(ii) When

0 < R_{1} < 1

, the sequence of iterated points

{x_{k}}

is said to have R-linear convergence rate.

In addition to solving quadratics, the BB method can also solve nonlinear optimization problems. Raydan [12] proposed a global Barzilai-Borwein method for unconstrained optimization problems by combining with the nonmonotone line search proposed by Grippo et al. [13]. Dai and Fletcher [14] developed projected BB methods for solving large-scale box-constrained quadratic programming. Additionally, Huang and Liu [15] extended the projected BB methods by using smoothing techniques, and they modified it to solve non-Lipschitz optimization problems. In [16], Dai considered alternating the Cauchy stepsize and the BB stepsize, and proposed an alternate step gradient method. And in [17], Zhou et al. proposed an Adaptive Barzilai-Borwein (ABB) method which alternated

α_{k}^{B B 1}

and

α_{k}^{B B 2}

.

In addition, the relationship between BB stepsizes and the spectrum of the Hessian matrices of the objective function has also attracted wide attention. Based on the ABB method in [17], Frassoldati et al. [18] tried to use

α_{k}^{B B 1}

close to the reciprocal of the minimum eigenvalue of the Hessian matrix. Their first implementation of this idea was denoted by ABBmin1, and to improve its performance they proposed another method, denoted by ABBmin2. De Asmundis et al. [19] used the spectral property of the stepsize in [20] to propose the SDC method, in which the Cauchy stepsize

α_{k}^{S D}

was alternated with a constant one. In [21], the Broyden class of quasi-Newton methods approximates the inverse of the Hessian matrix by

H_{k}^{τ} = τ H_{k}^{B F G S} + (1 - τ) H_{k}^{D F P},

(10)

where

τ \in [0, 1]

,

H_{k}^{B F G S}

and

H_{k}^{D F P}

are the BFGS and DFP matrices satisfying the formula

H_{k} y_{k - 1} = s_{k - 1}

, respectively. In the quasi-Newton method, these are the two most common corrections for

H_{k}

. Among them, the DFP correction was first proposed by Davidon [22] and later explained and developed by Fletcher and Powell [23], while the BFGS correction arose from the quasi-Newton methods proposed independently by Broyden, Fletcher, Goldfarb and Shanno in 1970 [24,25,26,27]. Similarly, applying this idea to the BB method, Dai et al. [28] solved the following problem

\min_{D = α I} ∥ τ (D^{- 1} s_{k - 1} - y_{k - 1}) + (1 - τ) (s_{k - 1} - D y_{k - 1}) ∥

(11)

to obtain the convex combination of

α_{k}^{B B 1}

and

α_{k}^{B B 2}

α_{k} = γ_{k} α_{k}^{B B 1} + (1 - γ_{k}) α_{k}^{B B 2},

(12)

where

γ_{k} \in [0, 1]

, and they further proved that the family of spectral gradient methods of (12) has R-superlinear convergence for two-dimensional strictly convex quadratics.

In addition to the stepsize definitions mentioned above, there are also some BB-like stepsizes. In [29], Dai et al. set

A = [\begin{matrix} 1 & 0 \\ 0 & λ \end{matrix}]

and

b = 0

, where

λ > 1

, and obtained a positive BB-like stepsize by averaging

α_{k}^{B B 1}

and

α_{k}^{B B 2}

geometrically as follows

α_{k} = \sqrt{α_{k}^{B B 1} \cdot α_{k}^{B B 2}},

(13)

whose simplification is equivalent to

α_{k} = \frac{∥ g_{k - 1} ∥}{∥ A g_{k - 1} ∥},

(14)

and they proved the R-superlinear convergence of the method. In addition, (14) can also be seen as a delayed version of the stepsize proposed by Dai and Yang in [30],

α_{k}^{D Y} = \frac{∥ g_{k} ∥}{∥ A g_{k} ∥} .

(15)

Interestingly, it has been shown in [30] that (15) will eventually approach the minimizer of

∥ I - α A ∥

, precisely

\underset{k \to \infty}{\lim \inf} α_{k}^{D Y} = \frac{2}{λ_{1} + λ_{n}},

(16)

where

λ_{1}

and

λ_{n}

are the minimum and maximum eigenvalues of A, and their corresponding eigenvectors are

\frac{g_{k}}{∥ g_{k} ∥} + \frac{g_{k + 1}}{∥ g_{k + 1} ∥}

and

\frac{g_{k}}{∥ g_{k} ∥} - \frac{g_{k + 1}}{∥ g_{k + 1} ∥}

, respectively. The limit in (16) is the optimal stepsize in [31], i.e.,

α_{k}^{O P T} = \arg \min_{α \in R^{1}} ∥ I - α A ∥, α_{k}^{O P T} = \frac{2}{λ_{1} + λ_{n}} .

(17)
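The optimal constant stepsize in (17) can be checked numerically. For a symmetric matrix, the spectral norm of I - αA equals the largest value of |1 - αλ_i| over the eigenvalues λ_i. The following Python sketch, under the assumption of a hypothetical spectrum {1, 2, 5}, compares the closed form 2/(λ_1 + λ_n) with a coarse grid search; contraction_factor is an illustrative helper name.

```python
from typing import Sequence


def contraction_factor(alpha: float, eigenvalues: Sequence[float]) -> float:
    """Spectral norm of I - alpha*A for a symmetric A with these eigenvalues."""
    return max(abs(1.0 - alpha * lam) for lam in eigenvalues)


# Hypothetical spectrum; only the smallest and largest eigenvalues matter.
eigenvalues = [1.0, 2.0, 5.0]
alpha_opt = 2.0 / (min(eigenvalues) + max(eigenvalues))

# Coarse grid search over (0, 1] as an independent check of the closed form.
grid = [i / 1000.0 for i in range(1, 1001)]
alpha_grid = min(grid, key=lambda a: contraction_factor(a, eigenvalues))

print(alpha_opt, alpha_grid, contraction_factor(alpha_opt, eigenvalues))
```

At the optimal stepsize the contraction factor equals (λ_n - λ_1)/(λ_n + λ_1), which is 2/3 for this spectrum.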

In this paper, we focus on the three-dimensional case, that is, the coefficient matrix A of the quadratic term is a third-order matrix. In [29], the BB-like method was applied only to

A = [\begin{matrix} 1 & 0 \\ 0 & λ \end{matrix}]

,

λ > 1

. Based on it, we modify the stepsize in (14) as follows:

α_{k}^{n e w} = \frac{∥ g_{k - 1} ∥}{∥\frac{A^{T} + A}{2} g_{k - 1}∥},

(18)

and make it applicable to both cases

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & λ \end{matrix}]

and

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & 0 \\ 0 & 1 & λ \end{matrix}]

,

λ > 1

. For the case

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & λ \end{matrix}]

,

λ > 1

, we further generalize it to the form

A = [\begin{matrix} μ & 0 & 0 \\ 0 & μ & 0 \\ 0 & 0 & φ \end{matrix}]

, where

φ > μ \geq 1

. For both cases, we prove convergence and carry out numerical experiments.
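The stepsize (18) only requires the symmetric part of A. The following Python sketch evaluates it for the two matrices considered in this paper with λ = 2; the function names (symmetrize, new_stepsize) and the sample gradient are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def norm(v: Vector) -> float:
    """Return the Euclidean norm of v."""
    return math.sqrt(sum(x * x for x in v))


def symmetrize(A: Matrix) -> Matrix:
    """Return the symmetric part (A^T + A) / 2."""
    n = len(A)
    return [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]


def new_stepsize(A: Matrix, g_prev: Vector) -> float:
    """Stepsize (18): ||g_{k-1}|| / ||(A^T + A)/2 g_{k-1}||."""
    return norm(g_prev) / norm(matvec(symmetrize(A), g_prev))


lam = 2.0
A_diag = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, lam]]
A_asym = [[1.0, 0.0, 0.0], [0.0, lam, 0.0], [0.0, 1.0, lam]]
g = [1.0, 1.0, 1.0]  # hypothetical previous gradient

alpha_diag = new_stepsize(A_diag, g)
alpha_asym = new_stepsize(A_asym, g)
print(alpha_diag, alpha_asym)
```

For the diagonal matrix the symmetric part is A itself, so (18) reduces to (14).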

The paper is organized as follows. In Section 2, we analyze the new BB-like method which uses the stepsize

α_{k}^{n e w}

for the case of

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & λ \end{matrix}]

,

λ > 1

and

A = [\begin{matrix} μ & 0 & 0 \\ 0 & μ & 0 \\ 0 & 0 & φ \end{matrix}]

,

φ > μ \geq 1

, and prove that the rate of convergence of the new method is R-superlinear. Additionally, we extend this case to n dimensions, which means that

A = diag {λ_{1}, λ_{2}, \dots, λ_{n}}

, where

1 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

. We prove that the rate of convergence in the n-dimensional case is R-linear. Section 3 studies the case

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & 0 \\ 0 & 1 & λ \end{matrix}]

,

λ > 1

, and proves global convergence in this case under certain assumptions. In Section 4, we report numerical experiments that show the effectiveness of the new method. Finally, conclusions are given in Section 5.

2. The Case Where A Is a Diagonal Matrix and Its Convergence Analysis

2.1. Three-Dimensional Case

In this section, we start with the basic case that

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & λ \end{matrix}], b = 0,

where

λ > 1

. According to the iteration formula

x_{k + 1} = x_{k} - α_{k} g_{k}

, we assume that

x_{1}, x_{2}

are given initial iteration points and they satisfy

g_{1}^{(i)} \neq 0, g_{2}^{(i)} \neq 0, i = 1, 2, 3 .

(19)

As we know,

α_{k}^{n e w} = \frac{∥ g_{k - 1} ∥}{∥\frac{A^{T} + A}{2} g_{k - 1}∥},

so we have

\begin{matrix} ∥ g_{k - 1} ∥ = \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}, \\ ∥\frac{A^{T} + A}{2} g_{k - 1}∥ = ∥ A g_{k - 1} ∥ = \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(λ g_{k - 1}^{(3)})}^{2}}, \\ α_{k}^{n e w} = \frac{\sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(λ g_{k - 1}^{(3)})}^{2}}} . \end{matrix}

(20)

We set

p_{k} = \frac{{(g_{k}^{(1)})}^{2} + {(g_{k}^{(2)})}^{2}}{{(g_{k}^{(3)})}^{2}},

(21)

then

α_{k}^{n e w}

can be written as

α_{k}^{n e w} = \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} .

(22)

Notice that

g_{k} = A x_{k}

, so the iteration formula for

g_{k}

is

g_{k + 1} = (I - α_{k}^{n e w} A) g_{k} .

(23)

According to (23), we have

\begin{matrix} (\begin{matrix} g_{k + 1}^{(1)} \\ g_{k + 1}^{(2)} \\ g_{k + 1}^{(3)} \end{matrix}) & = ([\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}] - \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} [\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & λ \end{matrix}]) (\begin{matrix} g_{k}^{(1)} \\ g_{k}^{(2)} \\ g_{k}^{(3)} \end{matrix}) \\ & = [\begin{matrix} \frac{\sqrt{λ^{2} + p_{k - 1}} - \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} & 0 & 0 \\ 0 & \frac{\sqrt{λ^{2} + p_{k - 1}} - \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} & 0 \\ 0 & 0 & \frac{\sqrt{λ^{2} + p_{k - 1}} - λ \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} \end{matrix}] (\begin{matrix} g_{k}^{(1)} \\ g_{k}^{(2)} \\ g_{k}^{(3)} \end{matrix}), \end{matrix}

which is equivalent to

\{\begin{matrix} g_{k + 1}^{(1)} = (1 - \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}}) g_{k}^{(1)}; \\ g_{k + 1}^{(2)} = (1 - \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}}) g_{k}^{(2)}; \\ g_{k + 1}^{(3)} = (1 - \frac{λ \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}}) g_{k}^{(3)} . \end{matrix}

(24)

From (21) and (24), we can obtain

\begin{matrix} p_{k + 1} & = \frac{{(g_{k + 1}^{(1)})}^{2} + {(g_{k + 1}^{(2)})}^{2}}{{(g_{k + 1}^{(3)})}^{2}} \\ = \frac{{(1 - \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}})}^{2} [{(g_{k}^{(1)})}^{2} + {(g_{k}^{(2)})}^{2}]}{{(1 - \frac{λ \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}})}^{2} {(g_{k}^{(3)})}^{2}} \\ = \frac{{(1 - \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}})}^{2}}{{(1 - \frac{λ \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}})}^{2}} p_{k} \\ = {(\frac{\sqrt{λ^{2} + p_{k - 1}} - \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}} - λ \sqrt{1 + p_{k - 1}}})}^{2} p_{k} \\ = {(\frac{(\sqrt{λ^{2} + p_{k - 1}} - \sqrt{1 + p_{k - 1}}) (\sqrt{λ^{2} + p_{k - 1}} + λ \sqrt{1 + p_{k - 1}})}{(λ^{2} - 1) p_{k - 1}})}^{2} p_{k} \\ = {(\frac{λ - p_{k - 1} + \sqrt{τ (p_{k - 1})}}{λ + 1})}^{2} \frac{p_{k}}{p_{k - 1}^{2}}, \end{matrix}

(25)

where

τ

is a quadratic function

τ (ν) = (1 + ν) (λ^{2} + ν), ν > 0 .

(26)

Let

h (ν) = \frac{λ - ν + \sqrt{τ (ν)}}{λ + 1}, ν > 0,

(27)

then

p_{k + 1} = {(h (p_{k - 1}))}^{2} \frac{p_{k}}{p_{k - 1}^{2}} .

(28)

We define

W_{k} = \log p_{k}

, by (28) we have

W_{k + 1} = W_{k} - 2 W_{k - 1} + 2 \log h (p_{k - 1}) .

(29)
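The recurrence (28) can be checked against the gradient iteration (23) numerically. The following Python sketch does this for λ = 2; the two starting gradients are hypothetical values chosen so that (19) holds, and the helper names (h, p_ratio, gradient_sequence) are illustrative.

```python
import math
from typing import List


def norm(v: List[float]) -> float:
    """Return the Euclidean norm of v."""
    return math.sqrt(sum(x * x for x in v))


def h(nu: float, lam: float) -> float:
    """h(nu) from (27), with tau(nu) = (1 + nu)(lam^2 + nu) from (26)."""
    tau = (1.0 + nu) * (lam * lam + nu)
    return (lam - nu + math.sqrt(tau)) / (lam + 1.0)


def p_ratio(g: List[float]) -> float:
    """p_k from (21): ratio of the first two squared components to the third."""
    return (g[0] ** 2 + g[1] ** 2) / g[2] ** 2


def gradient_sequence(g1: List[float], g2: List[float],
                      lam: float, steps: int) -> List[List[float]]:
    """Iterate (23) for A = diag(1, 1, lam); the stepsize uses g_{k-1}."""
    diag = (1.0, 1.0, lam)
    gs = [list(g1), list(g2)]
    for _ in range(steps):
        prev, cur = gs[-2], gs[-1]
        alpha = norm(prev) / norm([d * v for d, v in zip(diag, prev)])
        gs.append([(1.0 - alpha * d) * v for d, v in zip(diag, cur)])
    return gs


lam = 2.0
gs = gradient_sequence([1.0, 1.0, 1.0], [0.5, -0.3, 0.8], lam, steps=4)

# Compare p_{k+1} computed from the gradients with the prediction of (28).
for i in range(1, len(gs) - 1):
    p_prev, p_cur, p_next = p_ratio(gs[i - 1]), p_ratio(gs[i]), p_ratio(gs[i + 1])
    predicted = h(p_prev, lam) ** 2 * p_cur / p_prev ** 2
    print(p_next, predicted)
```

The two printed columns should agree up to rounding error.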

In order to prove the R-superlinear convergence in this case, we first give three lemmas.

Lemma 1.

Assume that

λ > 1

. Then the function

h (ν)

is monotonically increasing and

h (ν) \in (\frac{2 λ}{λ + 1}, \frac{λ + 1}{2})

when

ν \in (0, + \infty)

.

Proof.

The proof follows that of Lemma 1.2.1 in [29].

According to (26), we have

τ^{'} = (1 + ν) + (λ^{2} + ν),

and

{(τ^{'})}^{2} - 4 τ = {(λ^{2} - 1)}^{2} .

(30)

By direct calculation, we can obtain

\begin{matrix} h^{'} (ν) & = \frac{- 1 + \frac{1}{2} τ^{- \frac{1}{2}} τ^{'}}{λ + 1} \\ = \frac{{(τ^{'})}^{2} - 4 τ}{2 (λ + 1) τ^{\frac{1}{2}} (τ^{'} + 2 τ^{\frac{1}{2}})} \\ = \frac{{(λ^{2} - 1)}^{2}}{2 (λ + 1) (τ^{\frac{1}{2}} τ^{'} + 2 τ)} . \end{matrix}

(31)

Since

ν \geq 0

, we have

h^{'} (ν) > 0

. So

h (ν)

is monotonically increasing and when

ν = 0

, we have

h (0) = \frac{2 λ}{λ + 1} .

In addition to that, we have

\lim_{ν \to \infty} h (ν) = \frac{λ + 1}{2} .

In summary, when

ν > 0

,

h (ν) \in (\frac{2 λ}{λ + 1}, \frac{λ + 1}{2})

. □
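Lemma 1 can be illustrated numerically. The following Python sketch evaluates h on a logarithmic grid for the assumed value λ = 3 and compares the values with the two bounds of the lemma; the grid and the value of λ are arbitrary illustrative choices.

```python
import math


def h(nu: float, lam: float) -> float:
    """h(nu) from (27), with tau(nu) = (1 + nu)(lam^2 + nu) from (26)."""
    tau = (1.0 + nu) * (lam * lam + nu)
    return (lam - nu + math.sqrt(tau)) / (lam + 1.0)


lam = 3.0                          # hypothetical value with lam > 1
lower = 2.0 * lam / (lam + 1.0)    # value of h at nu = 0
upper = (lam + 1.0) / 2.0          # limit of h as nu tends to infinity

# Logarithmic grid from 1e-6 to 1e6.
nus = [10.0 ** e for e in range(-6, 7)]
values = [h(nu, lam) for nu in nus]

for nu, value in zip(nus, values):
    print(nu, value)
```

The printed values increase with ν and stay strictly between the two bounds.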

The next lemma gives the definition of

ζ_{k}

and its lower bound.

Lemma 2.

We define

ζ_{k} = W_{k} + (γ - 1) W_{k - 1}

, where γ satisfies

γ^{2} - γ + 2 = 0

. If

| ζ_{2} | > 8 \log \frac{λ + 1}{2},

(32)

then there exists

c_{1} > 0

, such that

| ζ_{k} | > (\sqrt{2} - 1) 2^{\frac{k}{2}} c_{1}, k \geq 2 .

(33)

Proof.

According to the definition of

ζ_{k}

and (29),

\begin{matrix} ζ_{k + 1} & = γ W_{k} + W_{k + 1} - W_{k} \\ = γ W_{k} - 2 W_{k - 1} + 2 \log h (p_{k - 1}) \\ = γ ζ_{k} + 2 \log h (p_{k - 1}) . \end{matrix}

Notice that

| γ | = \sqrt{2}

, and from Lemma 1 we have

\log h (p_{k - 1}) < \log \frac{λ + 1}{2}

, so

| ζ_{k + 1} | > \sqrt{2} | ζ_{k} | - c_{1},

where

c_{1} = 2 \log \frac{λ + 1}{2}

. By (32), we can obtain

\begin{matrix} | ζ_{k + 1} | & \geq 2^{\frac{k - 1}{2}} | ζ_{2} | - \frac{2^{\frac{k}{2}} - 1}{\sqrt{2} - 1} c_{1} \\ \geq (2^{\frac{k + 3}{2}} - \frac{2^{\frac{k}{2}} - 1}{\sqrt{2} - 1}) c_{1} \\ = [(\sqrt{2} - 1) (2^{\frac{k}{2}} + 1) + 2] c_{1} \\ > (\sqrt{2} - 1) 2^{\frac{k}{2}} c_{1}, \end{matrix}

so we finish the proof. □

From the definition of γ in Lemma 2 we know

γ = \frac{1 \pm \sqrt{7} i}{2}

and

| γ - 1 | = \sqrt{2}

, so

| ζ_{k} | \leq | W_{k} | + \sqrt{2} | W_{k - 1} | \leq (\sqrt{2} + 1) \max {| W_{k} |, | W_{k - 1} |},

combining it with (33) we can obtain

\max {| W_{k} |, | W_{k - 1} |} \geq \frac{1}{\sqrt{2} + 1} (\sqrt{2} - 1) 2^{\frac{k}{2}} c_{1} = {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} .

(34)

Lemma 3.

Under the conditions of Lemma 2, we have

\max_{- 1 \leq i \leq 3} W_{k + i} \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} - 4 \log \frac{λ + 1}{2}, k \geq 2;

(35)

\min_{- 1 \leq i \leq 3} W_{k + i} \leq - {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} + 4 \log \frac{λ + 1}{2}, k \geq 2 .

(36)

Proof.

By (34), if

W_{k - 1} \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1}

or

W_{k} \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1}

holds, then (35) holds. Now we consider the remaining cases. Assume that neither of the above two inequalities holds, so by (34) we have

W_{k - 1} \leq - {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1}

or

W_{k} \leq - {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1},

and by (29) we know that

\begin{matrix} W_{k + 2} & = W_{k + 1} - 2 W_{k} + 2 \log h (p_{k}) \\ = - W_{k} - 2 W_{k - 1} + 2 \log h (p_{k - 1}) + 2 \log h (p_{k}) . \end{matrix}

(37)

Next, we prove (35) in two cases.

Case (i). When

W_{k - 1} \leq - {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1}

, if

W_{k} < 0

, we can obtain from (37)

W_{k + 2} \geq - 2 W_{k - 1} + 2 \log h (p_{k - 1}) \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} - 2 c_{1};

and if

W_{k} > 0

, by (29) we have

W_{k + 1} \geq - 2 W_{k - 1} + 2 \log h (p_{k - 1}) \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} - 2 c_{1} .

Case (ii). When

W_{k} \leq - {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1}

, if

W_{k + 1} < 0

, we can obtain from (37)

W_{k + 3} \geq - 2 W_{k} + 2 \log h (p_{k}) \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} - 2 c_{1};

and if

W_{k + 1} > 0

, by (29) we have

W_{k + 2} \geq - 2 W_{k} + 2 \log h (p_{k}) \geq {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} - 2 c_{1} .

From the proof of Lemma 2, we know that

c_{1} = 2 \log \frac{λ + 1}{2}

. Therefore, according to the above analysis, we can prove that (35) is true and the proof of (36) is similar to it. □

In the following theorem, we will prove the rate of convergence of this case is R-superlinear.

Theorem 1.

Assume that (19) and (21) hold. Then the sequence of gradient norms

{∥ g_{k} ∥}

converges to zero R-superlinearly.

Proof.

Notice that

α_{k}^{n e w} \in (\frac{1}{λ}, 1)

and

g_{k + 1} = (I - α_{k}^{n e w} A) g_{k}

, so we can obtain

| g_{k + 1}^{(i)} | \leq (λ - 1) | g_{k}^{(i)} |,

(38)

where

i = 1, 2, 3

and

k \geq 0

.

Firstly, we consider the third component of the gradient. From (24) we know

\begin{matrix} | g_{k + 1}^{(3)} | & \leq | \frac{\sqrt{λ^{2} + p_{k - 1}} - λ \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} | | g_{k}^{(3)} | \\ \leq \frac{(λ^{2} - 1) p_{k - 1}}{\sqrt{λ^{2} + p_{k - 1}} (\sqrt{λ^{2} + p_{k - 1}} + λ \sqrt{1 + p_{k - 1}})} | g_{k}^{(3)} | \\ \leq \frac{(λ^{2} - 1) p_{k - 1}}{2 λ^{2}} | g_{k}^{(3)} | \\ < (λ - 1) p_{k - 1} | g_{k}^{(3)} | . \end{matrix}

(39)

Combining (38) and (39), we obtain

| g_{k + 5}^{(3)} | \leq {(λ - 1)}^{5} (\min_{- 1 \leq j \leq 3} p_{k + j}) | g_{k}^{(3)} | .

(40)

Since

W_{k} = \log p_{k}

, so by using (36) we have

| g_{k + 5}^{(3)} | \leq {(λ - 1)}^{5} \exp (- {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} + 4 \log \frac{λ + 1}{2}) | g_{k}^{(3)} | .

(41)

Similarly, as for the first component of the gradient, we calculate directly,

\begin{matrix} | g_{k + 1}^{(1)} | & \leq | \frac{\sqrt{λ^{2} + p_{k - 1}} - \sqrt{1 + p_{k - 1}}}{\sqrt{λ^{2} + p_{k - 1}}} | | g_{k}^{(1)} | \\ \leq \frac{λ^{2} - 1}{\sqrt{λ^{2} + p_{k - 1}} (\sqrt{λ^{2} + p_{k - 1}} + \sqrt{1 + p_{k - 1}})} | g_{k}^{(1)} | \\ \leq \frac{λ^{2} - 1}{\sqrt{p_{k - 1}} [\sqrt{p_{k - 1}} (\sqrt{\frac{λ^{2}}{p_{k - 1}} + 1} + \sqrt{\frac{1}{p_{k - 1}} + 1})]} | g_{k}^{(1)} | \\ \leq \frac{λ^{2} - 1}{2 p_{k - 1}} | g_{k}^{(1)} | . \end{matrix}

By (38) and

W_{k} = \log p_{k}

, we can obtain

\begin{matrix} | g_{k + 5}^{(1)} | & \leq \frac{1}{2} (λ + 1) {(λ - 1)}^{5} (\min_{- 1 \leq j \leq 3} \frac{1}{p_{k + j}}) | g_{k}^{(1)} | \\ & \leq \frac{1}{2} (λ + 1) {(λ - 1)}^{5} \exp (- {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} + 4 \log \frac{λ + 1}{2}) | g_{k}^{(1)} | . \end{matrix}

(42)

And from (24) we know that the condition of the second component of the gradient is the same as the first component, so

| g_{k + 5}^{(2)} | \leq \frac{1}{2} (λ + 1) {(λ - 1)}^{5} \exp (- {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} + 4 \log \frac{λ + 1}{2}) | g_{k}^{(2)} |

(43)

can be obtained.

Finally, by (41)–(43), for any k,

∥ g_{k + 5} ∥ \leq \frac{1}{2} (λ + 1) {(λ - 1)}^{5} \exp (- {(\sqrt{2} - 1)}^{2} 2^{\frac{k}{2}} c_{1} + 4 \log \frac{λ + 1}{2}) ∥ g_{k} ∥,

(44)

so the sequence

{∥ g_{k} ∥}

converges to zero R-superlinearly. □
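The behaviour described in Theorem 1 can be observed with a short experiment. The following Python sketch runs the gradient iteration (23) with the stepsize (20) for A = diag(1, 1, λ) and records the gradient norms. The value λ = 2, the starting gradients, the tolerance, and the function name bb_like_diagonal are assumptions made for illustration; the sketch does not reproduce the numerical experiments of the paper.

```python
import math
from typing import List, Sequence


def norm(v: Sequence[float]) -> float:
    """Return the Euclidean norm of v."""
    return math.sqrt(sum(x * x for x in v))


def bb_like_diagonal(diag: Sequence[float], g_first: Sequence[float],
                     g_second: Sequence[float], max_iter: int = 200,
                     tol: float = 1e-12) -> List[float]:
    """Return the gradient norms of g_{k+1} = (I - alpha_k A) g_k, A = diag(diag).

    The stepsize alpha_k = ||g_{k-1}|| / ||A g_{k-1}|| uses the previous gradient.
    """
    prev, cur = list(g_first), list(g_second)
    norms = [norm(prev), norm(cur)]
    for _ in range(max_iter):
        if norms[-1] < tol:  # stop once the gradient is numerically zero
            break
        alpha = norm(prev) / norm([d * v for d, v in zip(diag, prev)])
        prev, cur = cur, [(1.0 - alpha * d) * v for d, v in zip(diag, cur)]
        norms.append(norm(cur))
    return norms


# Hypothetical data: lam = 2 and starting gradients with nonzero components.
norms = bb_like_diagonal([1.0, 1.0, 2.0], [1.0, 1.0, 1.0], [0.5, -0.3, 0.8])
for k, value in enumerate(norms, start=1):
    print(k, value)
```

The loop stops as soon as the gradient norm drops below the tolerance, so the length of the returned list indicates how many iterations were needed.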

Building on the above conclusion, we generalize this case by setting

A = [\begin{matrix} μ & 0 & 0 \\ 0 & μ & 0 \\ 0 & 0 & φ \end{matrix}]

,

φ > μ \geq 1

. According to the assumptions and conditions mentioned above, we have

α_{k}^{n e w} = \frac{\sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(μ g_{k - 1}^{(1)})}^{2} + {(μ g_{k - 1}^{(2)})}^{2} + {(φ g_{k - 1}^{(3)})}^{2}}} .

Substituting

p_{k}

of (21) into the above equation, we obtain

α_{k}^{n e w} = \frac{\sqrt{1 + p_{k - 1}}}{\sqrt{φ^{2} + μ^{2} p_{k - 1}}} .

According to (23), we have

\{\begin{matrix} g_{k + 1}^{(1)} = (1 - \frac{μ \sqrt{1 + p_{k - 1}}}{\sqrt{φ^{2} + μ^{2} p_{k - 1}}}) g_{k}^{(1)}; \\ g_{k + 1}^{(2)} = (1 - \frac{μ \sqrt{1 + p_{k - 1}}}{\sqrt{φ^{2} + μ^{2} p_{k - 1}}}) g_{k}^{(2)}; \\ g_{k + 1}^{(3)} = (1 - \frac{φ \sqrt{1 + p_{k - 1}}}{\sqrt{φ^{2} + μ^{2} p_{k - 1}}}) g_{k}^{(3)} . \end{matrix}

Using the calculation method of (25), we can obtain

p_{k + 1} = {(\frac{φ - μ p_{k - 1} + \sqrt{η (p_{k - 1})}}{φ + μ})}^{2} \frac{p_{k}}{p_{k - 1}^{2}},

where

η

is a quadratic function

η (ω) = (1 + ω) (φ^{2} + μ^{2} ω), ω > 0 .

Let

θ (ω) = \frac{φ - μ ω + \sqrt{η (ω)}}{φ + μ}, ω > 0,

then

p_{k + 1} = {(θ (p_{k - 1}))}^{2} \frac{p_{k}}{p_{k - 1}^{2}} .

This recurrence has the same form as (28), so by the same arguments as in Lemmas 1–3 and Theorem 1, we can also prove that when

A = [\begin{matrix} μ & 0 & 0 \\ 0 & μ & 0 \\ 0 & 0 & φ \end{matrix}]

,

φ > μ \geq 1

, the BB-like method using the new stepsize is convergent and the rate of convergence is R-superlinear.
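The generalized recurrence can be checked in the same way as (28). The following Python sketch, assuming the hypothetical values μ = 2 and φ = 5, compares the ratio p_{k+1} obtained from the gradient iteration with the prediction based on θ; the helper names are illustrative.

```python
import math
from typing import List


def norm(v: List[float]) -> float:
    """Return the Euclidean norm of v."""
    return math.sqrt(sum(x * x for x in v))


def theta(omega: float, mu: float, phi: float) -> float:
    """theta(omega), with eta(omega) = (1 + omega)(phi^2 + mu^2 omega)."""
    eta = (1.0 + omega) * (phi * phi + mu * mu * omega)
    return (phi - mu * omega + math.sqrt(eta)) / (phi + mu)


def p_ratio(g: List[float]) -> float:
    """p_k from (21)."""
    return (g[0] ** 2 + g[1] ** 2) / g[2] ** 2


def next_gradient(prev: List[float], cur: List[float],
                  mu: float, phi: float) -> List[float]:
    """One step of (23) for A = diag(mu, mu, phi), stepsize from `prev`."""
    diag = (mu, mu, phi)
    alpha = norm(prev) / norm([d * v for d, v in zip(diag, prev)])
    return [(1.0 - alpha * d) * v for d, v in zip(diag, cur)]


mu, phi = 2.0, 5.0                              # hypothetical, phi > mu >= 1
gs = [[1.0, 1.0, 1.0], [0.5, -0.3, 0.8]]        # hypothetical starting gradients
for _ in range(3):
    gs.append(next_gradient(gs[-2], gs[-1], mu, phi))

for i in range(1, len(gs) - 1):
    p_prev, p_cur, p_next = p_ratio(gs[i - 1]), p_ratio(gs[i]), p_ratio(gs[i + 1])
    predicted = theta(p_prev, mu, phi) ** 2 * p_cur / p_prev ** 2
    print(p_next, predicted)
```

With μ = 1 and φ = λ, the function θ coincides with h in (27), so this check contains the earlier case.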

2.2. n-Dimensional Case

In this case, we consider that

A = diag {λ_{1}, λ_{2}, \dots, λ_{n}}, b = 0,

where

1 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

. We will prove R-linear convergence of the new method in the n-dimensional case.

In [16], Dai proved that if

A = diag {λ_{1}, λ_{2}, \dots, λ_{n}}

, where

1 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

, and the stepsize

α_{k}

has the following Property 1, then either

g_{k} = 0

for some finite k or the sequence of gradient norms

{∥ g_{k} ∥}

converges to zero R-linearly.

First, we state Property 1 and the theorem from [16]. In Property 1,

g_{k}^{(i)}

denotes the ith component of

g_{k}

and

G (k, l) = \sum_{i = 1}^{l} {(g_{k}^{(i)})}^{2} .

Property 1

([16]). Suppose that there exist an integer m and positive constants

M_{1} > λ_{1}

and

M_{2}

such that

(i)

λ_{1} \leq α_{k}^{- 1} \leq M_{1}

;

(ii) for any integer

l \in [1, n - 1]

and

ϵ > 0

, if

G (k - j, l) \leq ϵ

and

{(g_{k - j}^{(l + 1)})}^{2} \geq M_{2} ϵ

hold for

j \in [0, \min {k, m} - 1]

, then

α_{k}^{- 1} \geq \frac{2}{3} λ_{l + 1}

.

Theorem 2

([16]). Consider the linear system

A x = b, A \in R^{n \times n}, b \in R^{n},

where

A = diag (λ_{1}, λ_{2}, \dots, λ_{n})

, and

1 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

. Consider the gradient method where the stepsize

α_{k}

has Property 1. Then either

g_{k} = 0

for some finite k, or the sequence

{∥ g_{k} ∥}

converges to zero R-linearly.

Proof.

The proof follows that of Theorem 4.1 in [16].

By (23), we have

g_{k + 1}^{(i)} = (1 - α_{k} λ_{i}) g_{k}^{(i)} .

(45)

Denote

δ_{1} = \max {{(1 - (λ_{1} / M_{1}))}^{2}, 1 / 4} \in (0, 1)

and

δ_{2} = \max {{(1 - (M_{1} / λ_{1}))}^{2}, 2}

. Then by (45) and the definition of

G (k, l)

, we can obtain that for all

k \geq 1

,

G (k + 1, 1) \leq δ_{1} G (k, 1),

(46)

{(g_{k + 1}^{(i)})}^{2} \leq δ_{2} {(g_{k}^{(i)})}^{2}, for i = 1, 2, \dots, n,

(47)

∥ g_{k + 1} ∥^{2} \leq δ_{2} {∥ g_{k} ∥}^{2} .

(48)

The rest of the proof will be divided into three parts as follows:

(I). We prove that, for any integer

1 \leq l < n

and

k \geq 1

, if there exist some

ϵ_{l} \in (0, M_{2}^{- 1})

and integer

m_{l}

such that

G (k + j, l) \leq ϵ_{l} {∥ g_{k} ∥}^{2}, for all j \geq m_{l},

(49)

then we must have

{(g_{k + j_{0}}^{(l + 1)})}^{2} \leq M_{2} ϵ_{l} {∥ g_{k} ∥}^{2}, for some j_{0} \in [m_{l}, m_{l} + m + Δ_{l} + 1],

(50)

where

Δ_{l} = ⌈\frac{\log (M_{2} ϵ_{l} δ_{2}^{- (m_{l} + m)})}{\log δ_{1}}⌉ .

In fact, suppose that

{(g_{k + j}^{(l + 1)})}^{2} > M_{2} ϵ_{l} {∥ g_{k} ∥}^{2}, for j \in [m_{l}, m_{l} + m + Δ_{l}] .

(51)

Then we have from (49), (51) and Property 1 (ii) that

α_{k + j}^{- 1} \geq \frac{2}{3} λ_{l + 1}, for j \in [m_{l} + m, m_{l} + m + Δ_{l}] .

(52)

By (45), (52) and Property 1 (i), we can obtain

{(g_{k + j + 1}^{(l + 1)})}^{2} \leq δ_{1} {(g_{k + j}^{(l + 1)})}^{2}, for j \in [m_{l} + m, m_{l} + m + Δ_{l}] .

(53)

And from (47), (53) and the definition of

Δ_{l}

, we obtain that

{(g_{k + m_{l} + m + Δ_{l} + 1}^{(l + 1)})}^{2} \leq δ_{1}^{Δ_{l} + 1} {(g_{k + m_{l} + m}^{(l + 1)})}^{2} \leq δ_{1}^{Δ_{l} + 1} δ_{2}^{m_{l} + m} {(g_{k}^{(l + 1)})}^{2} \leq M_{2} ϵ_{l} {∥ g_{k} ∥}^{2} .

So (50) must hold.

(II). Denoting

m_{l + 1} = m_{l} + m + Δ_{l} + 1

and

ϵ_{l + 1} = (1 + M_{2} δ_{2}^{m}) ϵ_{l}

, we prove that if (49) holds, we can further have

G (k + j, l + 1) \leq ϵ_{l + 1} {∥ g_{k} ∥}^{2}, for all j \geq m_{l + 1} .

(54)

In fact, by (I), we know that there are infinitely many integers

j_{1}

and

j_{2}

with

j_{2} > j_{1} \geq j_{0}

such that

{(g_{k + j}^{(l + 1)})}^{2} \leq M_{2} ϵ_{l} {∥ g_{k} ∥}^{2}, for j = j_{1}, j_{2},

(55)

and

{(g_{k + j}^{(l + 1)})}^{2} > M_{2} ϵ_{l} {∥ g_{k} ∥}^{2}, for j \in [j_{1} + 1, j_{2} - 1] .

(56)

Then we have from (47) and (55) that

{(g_{k + j}^{(l + 1)})}^{2} \leq δ_{2}^{m} {(g_{k + j_{1}}^{(l + 1)})}^{2} \leq M_{2} δ_{2}^{m} ϵ_{l} {∥ g_{k} ∥}^{2}, for j \in [j_{1} + 1, j_{1} + m] .

(57)

If

j_{2} > j_{1} + m

, by Property 1, (45), (49) and (56), we have

α_{k + j}^{- 1} \geq \frac{2}{3} λ_{l + 1} and {(g_{k + j + 1}^{(l + 1)})}^{2} \leq δ_{1} {(g_{k + j}^{(l + 1)})}^{2}, for j \in [j_{1} + m, j_{2} - 1] .

(58)

It follows from (48), (55) and (58) that

{(g_{k + j}^{(l + 1)})}^{2} < {(g_{k + j_{1} + m}^{(l + 1)})}^{2} \leq M_{2} δ_{2}^{m} ϵ_{l} {∥ g_{k} ∥}^{2}, for j \in [j_{1} + m + 1, j_{2}] .

(59)

Due to the arbitrariness of

j_{1}

and

j_{2}

, (57) and (59), we know that the following inequality holds for any

j \geq j_{0}

:

{(g_{k + j}^{(l + 1)})}^{2} \leq M_{2} δ_{2}^{m} ϵ_{l} {∥ g_{k} ∥}^{2} .

(60)

Since

j_{0} \leq m_{l + 1}

, we obtain from (49), (60) and the definition of

G (k, l)

that (54) holds.

(III). Denoting for any

1 \leq l \leq n

,

ϵ_{l} = \frac{1}{4} {(1 + M_{2} δ_{2}^{m})}^{(l - n)},

(61)

and setting

m_{1} = ⌈ \log ϵ_{1} / \log δ_{1} ⌉

,

m_{l + 1} = m_{l} + m + Δ_{l} + 1

for

l = 1, \dots, n - 1

and

M = m_{n}

, we prove by induction that for all

1 \leq l \leq n

,

G (k + j, l) \leq ϵ_{l} {∥ g_{k} ∥}^{2}, for all j \geq m_{l} .

(62)

In fact, by (46) and the definition of

m_{1}

, (62) clearly holds for

l = 1

. Suppose that (62) is true for some

1 \leq l \leq n - 1

. Then by (II), we know that (62) holds for

l + 1

. Thus by induction, we know that (62) holds for all

1 \leq l \leq n

. Notice that

ϵ_{n} = 1 / 4

and

G (k, n) = ∥ g_{k} ∥^{2}

. It follows from (62) that

∥ g_{k + M} ∥^{2} \leq \frac{1}{4} {∥ g_{k} ∥}^{2} .

(63)

Since

M = m_{n}

depends only on

λ_{1}

,

M_{1}

and

M_{2}

, then we can obtain by (48) and (63) that the sequence

{∥ g_{k} ∥}

converges to zero R-linearly. □

Using Property 1 and Theorem 2, we now prove that the stepsize

α_{k}^{n e w}

satisfies Property 1, so that the method has an R-linear convergence rate in the n-dimensional case.

Theorem 3.

If

A = diag {λ_{1}, λ_{2}, \dots, λ_{n}}

, where

1 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

, then either

g_{k} = 0

for some finite k or the sequence of gradient norms

{∥ g_{k} ∥}

converges to zero R-linearly.

Proof.

Firstly, we let

M_{1} = λ_{n}

and

M_{2} = 2

. When A is a symmetric positive definite matrix, we can obtain from (6) and (8) that

\begin{matrix} α_{k}^{B B 1} = \frac{g_{k - 1}^{T} g_{k - 1}}{g_{k - 1}^{T} A g_{k - 1}}, \\ α_{k}^{B B 2} = \frac{g_{k - 1}^{T} A g_{k - 1}}{g_{k - 1}^{T} A^{2} g_{k - 1}} . \end{matrix}

And from (18), we have

α_{k}^{n e w} = \frac{∥ g_{k - 1} ∥}{∥\frac{A^{T} + A}{2} g_{k - 1}∥} = \frac{∥ g_{k - 1} ∥}{∥ A g_{k - 1} ∥} = \sqrt{α_{k}^{B B 1} \cdot α_{k}^{B B 2}} .

So, the following formula holds

α_{k}^{B B 2} \leq α_{k}^{n e w} \leq α_{k}^{B B 1} .

Then,

{(α_{k}^{n e w})}^{- 1} \geq \frac{1}{α_{k}^{B B 1}} = \frac{\sum_{i = 1}^{n} λ_{i} {(g_{k - 1}^{(i)})}^{2}}{\sum_{i = 1}^{n} {(g_{k - 1}^{(i)})}^{2}} \geq λ_{1} .

Similarly,

{(α_{k}^{n e w})}^{- 1} \leq \frac{1}{α_{k}^{B B 2}} = \frac{\sum_{i = 1}^{n} λ_{i}^{2} {(g_{k - 1}^{(i)})}^{2}}{\sum_{i = 1}^{n} λ_{i} {(g_{k - 1}^{(i)})}^{2}} \leq λ_{n} .

So, (i) of Property 1 holds.

If

G (k - j, l) \leq ϵ

and

{(g_{k - j}^{(l + 1)})}^{2} \geq M_{2} ϵ

hold for any integer

l \in [1, n - 1]

,

ϵ > 0

and

j \in [0, \min {k, m} - 1]

, we have

\begin{matrix} {(α_{k}^{n e w})}^{- 1} & \geq \frac{1}{α_{k}^{B B 1}} = \frac{\sum_{i = 1}^{n} λ_{i} {(g_{k - 1}^{(i)})}^{2}}{\sum_{i = 1}^{n} {(g_{k - 1}^{(i)})}^{2}} \\ \geq \frac{λ_{l + 1} \sum_{i = l + 1}^{n} {(g_{k - 1}^{(i)})}^{2}}{ϵ + \sum_{i = l + 1}^{n} {(g_{k - 1}^{(i)})}^{2}} \\ \geq \frac{M_{2}}{M_{2} + 1} λ_{l + 1} \\ = \frac{2}{3} λ_{l + 1} . \end{matrix}

For the second inequality, we define

X = \sum_{i = l + 1}^{n} {(g_{k - 1}^{(i)})}^{2}

and

F (X) = \frac{X}{ϵ + X}

. Obviously, the function

F (X)

is monotonically increasing when

X > 0

. According to the assumption, we have

\sum_{i = l + 1}^{n} {(g_{k - 1}^{(i)})}^{2} \geq M_{2} ϵ

which means that

X \geq M_{2} ϵ > 0

. So we can obtain

F (X) \geq \frac{M_{2} ϵ}{ϵ + M_{2} ϵ} = \frac{M_{2}}{M_{2} + 1},

so the third inequality holds.

Thus, (ii) of Property 1 also holds.

In summary, the new stepsize satisfies Property 1, so the conclusion of the theorem follows from Theorem 2. □
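The two parts of Property 1 verified in the proof above can be illustrated with concrete numbers. The following Python sketch assumes a hypothetical diagonal matrix with eigenvalues 1, 2, 4, 8 and two hypothetical gradients; the function name stepsizes is illustrative. The second gradient is chosen so that the hypotheses of part (ii) hold with l = 1, ϵ = 0.01 and M_2 = 2.

```python
import math
from typing import Sequence, Tuple


def stepsizes(eigs: Sequence[float],
              g: Sequence[float]) -> Tuple[float, float, float]:
    """Return (BB1, BB2, new) for A = diag(eigs), evaluated at g = g_{k-1}."""
    gg = sum(v * v for v in g)
    gAg = sum(lam * v * v for lam, v in zip(eigs, g))
    gAAg = sum(lam * lam * v * v for lam, v in zip(eigs, g))
    return gg / gAg, gAg / gAAg, math.sqrt(gg / gAAg)


eigs = [1.0, 2.0, 4.0, 8.0]  # hypothetical spectrum with lambda_1 = 1

# Part (i): the new stepsize is the geometric mean of BB1 and BB2, so its
# reciprocal lies between the smallest and the largest eigenvalue.
bb1, bb2, new = stepsizes(eigs, [1.0, -2.0, 0.5, 3.0])
print(bb2, new, bb1, 1.0 / new)

# Part (ii) with l = 1: the first component is small (its square is at most
# eps), while the squared second component is at least M_2 * eps.
eps = 0.01
g_ii = [0.05, 1.0, 0.0, 0.0]
_, _, new_ii = stepsizes(eigs, g_ii)
print(1.0 / new_ii, 2.0 / 3.0 * eigs[1])
```

In the second example the reciprocal stepsize is close to λ_2 = 2, well above the bound 2λ_2/3 required by Property 1 (ii).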

3. The Case Where A Is an Asymmetric Matrix and Its Convergence Analysis

In this case, we consider that

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & 0 \\ 0 & 1 & λ \end{matrix}], b = 0,

where

λ > 1

. Clearly, in this case, A has a double characteristic root and is not symmetric, so the analysis differs from that in Section 2.

First, we take two initial iteration points

x_{1}, x_{2}

, which satisfy

g_{1}^{(i)} \neq 0, g_{2}^{(i)} \neq 0, i = 1, 2, 3 .

In this case,

∥ g_{k - 1} ∥ = \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}},

\begin{matrix} ∥\frac{A + A^{T}}{2} g_{k - 1}∥ & = \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(λ g_{k - 1}^{(2)} + \frac{1}{2} g_{k - 1}^{(3)})}^{2} + {(\frac{1}{2} g_{k - 1}^{(2)} + λ g_{k - 1}^{(3)})}^{2}} \\ = \sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}, \end{matrix}

so the stepsize will be

α_{k}^{n e w} = \frac{\sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}} .

(64)

And by

g_{k} = \frac{A + A^{T}}{2} x_{k}

, we have

g_{k + 1} = (I - α_{k}^{n e w} \frac{A + A^{T}}{2}) g_{k} .

(65)

That is to say

(\begin{matrix} g_{k + 1}^{(1)} \\ g_{k + 1}^{(2)} \\ g_{k + 1}^{(3)} \end{matrix}) = ([\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}] - α_{k}^{n e w} [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & \frac{1}{2} \\ 0 & \frac{1}{2} & λ \end{matrix}]) (\begin{matrix} g_{k}^{(1)} \\ g_{k}^{(2)} \\ g_{k}^{(3)} \end{matrix}),

we set

M = 1 + \frac{- λ \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}};

N = \frac{- \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}},

so we can obtain

\{\begin{matrix} g_{k + 1}^{(1)} = (1 + N) g_{k}^{(1)}; \\ g_{k + 1}^{(2)} = M g_{k}^{(2)} + \frac{1}{2} N g_{k}^{(3)}; \\ g_{k + 1}^{(3)} = \frac{1}{2} N g_{k}^{(2)} + M g_{k}^{(3)} . \end{matrix}

(66)

From (66), we have

\begin{matrix} ∥ g_{k + 1} ∥^{2} & = {(g_{k + 1}^{(1)})}^{2} + {(g_{k + 1}^{(2)})}^{2} + {(g_{k + 1}^{(3)})}^{2} \\ = {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + {(M g_{k}^{(2)} + \frac{1}{2} N g_{k}^{(3)})}^{2} + {(\frac{1}{2} N g_{k}^{(2)} + M g_{k}^{(3)})}^{2} \\ = {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + (M^{2} + \frac{1}{4} N^{2}) [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] + 2 M N g_{k}^{(2)} g_{k}^{(3)} . \end{matrix}

(67)

By definition,

M = 1 + λ N

and

N < 0

, but the sign of M is not determined. To prove global convergence, we therefore consider the two cases

M > 0

and

M < 0

.
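The componentwise recurrence (66) and the norm identity (67) can be checked against the matrix form (65) for a single step. The sketch below is illustrative only: the value λ = 3 and the two gradient vectors are arbitrary choices, not data from the paper.

```python
import numpy as np

lam = 3.0  # any value with lam > 1; chosen here only for illustration
A = np.array([[1.0, 0.0, 0.0],
              [0.0, lam, 0.0],
              [0.0, 1.0, lam]])
S = 0.5 * (A + A.T)  # symmetric part; the gradient is g = S x when b = 0

g_prev = np.array([0.7, -1.2, 0.4])  # g_{k-1}, arbitrary nonzero entries
g_curr = np.array([-0.3, 0.9, 1.1])  # g_k, arbitrary nonzero entries

# Stepsize (64) and its closed-form denominator
alpha = np.linalg.norm(g_prev) / np.linalg.norm(S @ g_prev)
denom_sq_64 = (g_prev[0] ** 2
               + (lam ** 2 + 0.25) * (g_prev[1] ** 2 + g_prev[2] ** 2)
               + 2.0 * lam * g_prev[1] * g_prev[2])

N = -alpha
M = 1.0 + lam * N

# Matrix form (65) versus componentwise form (66)
g_next_matrix = (np.eye(3) - alpha * S) @ g_curr
g_next_66 = np.array([
    (1.0 + N) * g_curr[0],
    M * g_curr[1] + 0.5 * N * g_curr[2],
    0.5 * N * g_curr[1] + M * g_curr[2],
])

# Squared norm from (67)
norm_sq_67 = ((1.0 + N) ** 2 * g_curr[0] ** 2
              + (M ** 2 + 0.25 * N ** 2) * (g_curr[1] ** 2 + g_curr[2] ** 2)
              + 2.0 * M * N * g_curr[1] * g_curr[2])

print(np.allclose(g_next_matrix, g_next_66))
print(np.isclose(g_next_matrix @ g_next_matrix, norm_sq_67))
```

The sign of M in this check depends on the sampled g_{k-1}, which is why the analysis splits into the two cases.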

Theorem 4.

When

M > 0

, the sequence of gradient norms

{∥ g_{k} ∥}

converges to zero.

Proof.

First, we assume that

g_{k}^{(1)} \neq 0, g_{k}^{(2)} \neq 0, g_{k}^{(3)} \neq 0

for all

k \geq 1

. Since

M = 1 + λ N > 0

, we have

1 < λ < - \frac{1}{N}

and

- 1 < N < 0

. Next, we treat the product term in (67) in two cases,

g_{k}^{(2)} g_{k}^{(3)} > 0

and

g_{k}^{(2)} g_{k}^{(3)} < 0

.

Case (i). When

g_{k}^{(2)} g_{k}^{(3)} > 0

, we have

2 M N g_{k}^{(2)} g_{k}^{(3)} < 0

. By (67),

\begin{matrix} ∥ g_{k + 1} ∥^{2} & < {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + (M^{2} + \frac{1}{4} N^{2}) [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] - M N [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] \\ = {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + {(M - \frac{1}{2} N)}^{2} [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] . \end{matrix}

(68)

If

- \frac{2}{3} < N < 0

i.e.,

\frac{3}{2} < λ < - \frac{1}{N}

,

∥ g_{k + 1} ∥^{2} < {(1 + N)}^{2} {∥ g_{k} ∥}^{2},

where

{(1 + N)}^{2} < 1

, so

{∥ g_{k} ∥}

converges to zero.

And if

- 1 < N < - \frac{2}{3}

i.e.,

1 < λ < \frac{3}{2}

,

∥ g_{k + 1} ∥^{2} < {(M - \frac{1}{2} N)}^{2} ∥ g_{k} ∥^{2} = {[1 + (λ - \frac{1}{2}) N]}^{2} {∥ g_{k} ∥}^{2},

where

\begin{matrix} 1 + (λ - \frac{1}{2}) N & = 1 - \frac{(λ - \frac{1}{2}) \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}} \\ < 1 - \frac{\sqrt{{(λ - \frac{1}{2})}^{2} [{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + λ [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}} \\ = 1 - \frac{\sqrt{{(λ - \frac{1}{2})}^{2} [{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + {(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}} \\ < 1 - \frac{\sqrt{{(λ - \frac{1}{2})}^{2} [{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}}{\sqrt{{(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}} \\ = 1 - \frac{λ - \frac{1}{2}}{λ + \frac{1}{2}} \\ = \frac{1}{λ + \frac{1}{2}}, \end{matrix}

so

{[1 + (λ - \frac{1}{2}) N]}^{2} < 1

, and hence

{∥ g_{k} ∥}

converges to zero.

Case (ii). When

g_{k}^{(2)} g_{k}^{(3)} < 0

, we have

2 M N g_{k}^{(2)} g_{k}^{(3)} > 0

, and

- 2 g_{k}^{(2)} g_{k}^{(3)} < {(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}

, so that

2 M N g_{k}^{(2)} g_{k}^{(3)} < - M N [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}]

holds. By (67),

\begin{matrix} ∥ g_{k + 1} ∥^{2} & < {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + (M^{2} + \frac{1}{4} N^{2}) [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] - M N [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] \\ = {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + {(M - \frac{1}{2} N)}^{2} [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] . \end{matrix}

(69)

Since (69) coincides with (68), the argument of Case (i) carries over to Case (ii), and

{∥ g_{k} ∥}

converges to zero as well.

This completes the proof. □
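The bound 1 + (λ - 1/2) N < 1/(λ + 1/2) used in Case (i) can be sampled numerically. The sketch below is a sanity check under stated assumptions (random λ in [1, 20] and standard normal g_{k-1}, both arbitrary choices); the helper name `N_value` is hypothetical.

```python
import numpy as np

def N_value(lam, g_prev):
    """N = -alpha_new from (64) for the matrix of Section 3."""
    S = np.array([[1.0, 0.0, 0.0],
                  [0.0, lam, 0.5],
                  [0.0, 0.5, lam]])  # symmetric part (A + A^T) / 2
    return -np.linalg.norm(g_prev) / np.linalg.norm(S @ g_prev)

rng = np.random.default_rng(1)
worst_gap = np.inf
for _ in range(5000):
    lam = rng.uniform(1.0, 20.0)
    g_prev = rng.standard_normal(3)
    N = N_value(lam, g_prev)
    lhs = 1.0 + (lam - 0.5) * N      # left-hand side of the chain
    rhs = 1.0 / (lam + 0.5)          # final bound of the chain
    worst_gap = min(worst_gap, rhs - lhs)

# A non-negative gap means the bound held on every sample
print(worst_gap >= -1e-12)
```

The check only confirms the upper bound on sampled data; it is not a substitute for the proof.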

Before we prove the case of

M < 0

, we set

Q = \frac{{(g_{k - 1}^{(1)})}^{2}}{{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}, k \geq 2

and then state the following theorem.

Theorem 5.

When

M < 0

, if

- 1 < N < - \frac{2}{λ + \frac{3}{2}}

and

Q < 3

, then the sequence of gradient norms

{∥ g_{k} ∥}

converges to zero; if

- \frac{2}{λ + \frac{3}{2}} < N < 0

, then the sequence

{∥ g_{k} ∥}

also converges to zero.

Proof.

First, we assume that

g_{k}^{(i)} \neq 0, i = 1, 2, 3

for all

k \geq 1

. By (67), regardless of the sign of

g_{k}^{(2)} g_{k}^{(3)}

, we have

\begin{matrix} ∥ g_{k + 1} ∥^{2} & < {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + (M^{2} + \frac{1}{4} N^{2}) [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] + M N [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] \\ = {(1 + N)}^{2} {(g_{k}^{(1)})}^{2} + {(M + \frac{1}{2} N)}^{2} [{(g_{k}^{(2)})}^{2} + {(g_{k}^{(3)})}^{2}] . \end{matrix}

(70)

If

- 1 < N < - \frac{2}{λ + \frac{3}{2}}

, we can see that

{(M + \frac{1}{2} N)}^{2} = {[1 + (λ + \frac{1}{2}) N]}^{2} > {(1 + N)}^{2}

, so

∥ g_{k + 1} ∥^{2} < {[1 + (λ + \frac{1}{2}) N]}^{2} {∥ g_{k} ∥}^{2}

. By the definition of N,

\begin{matrix} 1 + (λ + \frac{1}{2}) N & = 1 - \frac{(λ + \frac{1}{2}) \sqrt{{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}}}{\sqrt{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}} \\ = 1 - \sqrt{\frac{{(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + 2 λ g_{k - 1}^{(2)} g_{k - 1}^{(3)}}} \\ < 1 - \sqrt{\frac{{(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(1)})}^{2} + {(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}{{(g_{k - 1}^{(1)})}^{2} + (λ^{2} + \frac{1}{4}) [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}] + λ [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}} \\ = 1 - \sqrt{1 + \frac{[{(λ + \frac{1}{2})}^{2} - 1] {(g_{k - 1}^{(1)})}^{2}}{{(g_{k - 1}^{(1)})}^{2} + {(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}} . \end{matrix}

Let

P = \frac{[{(λ + \frac{1}{2})}^{2} - 1] {(g_{k - 1}^{(1)})}^{2}}{{(g_{k - 1}^{(1)})}^{2} + {(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]}

, then we have

\begin{matrix} P & < \frac{{(λ + \frac{1}{2})}^{2} {(g_{k - 1}^{(1)})}^{2}}{{(g_{k - 1}^{(1)})}^{2} + {(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]} \\ < \frac{{(λ + \frac{1}{2})}^{2} {(g_{k - 1}^{(1)})}^{2}}{{(λ + \frac{1}{2})}^{2} [{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}]} \\ = \frac{{(g_{k - 1}^{(1)})}^{2}}{{(g_{k - 1}^{(2)})}^{2} + {(g_{k - 1}^{(3)})}^{2}} . \end{matrix}

From the assumption

Q < 3

, we can obtain

P < 3

and

{[1 + (λ + \frac{1}{2}) N]}^{2} < 1

, so the sequence

{∥ g_{k} ∥}

converges to zero.

And if

- \frac{2}{λ + \frac{3}{2}} < N < 0

, it follows that

{(1 + N)}^{2} > {[1 + (λ + \frac{1}{2}) N]}^{2} = {(M + \frac{1}{2} N)}^{2}

and then

∥ g_{k + 1} ∥^{2} < {(1 + N)}^{2} {∥ g_{k} ∥}^{2}

. Since

{(1 + N)}^{2} < 1

, it follows that

{∥ g_{k} ∥}

converges to zero.

This completes the proof. □
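Two algebraic steps in this proof, the rewriting of the ratio under the square root as 1 + P and the bound P < Q, can be checked on random data. This is only an illustrative sketch; the sampling ranges are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
ok_identity, ok_bound = True, True
for _ in range(2000):
    lam = rng.uniform(1.0, 50.0)
    g1, g2, g3 = rng.standard_normal(3)   # components of g_{k-1}
    c = (lam + 0.5) ** 2
    tail = g2 ** 2 + g3 ** 2
    denom = g1 ** 2 + c * tail
    ratio = c * (g1 ** 2 + tail) / denom  # quantity under the square root
    P = (c - 1.0) * g1 ** 2 / denom
    Q = g1 ** 2 / tail
    ok_identity = ok_identity and bool(np.isclose(ratio, 1.0 + P))
    ok_bound = ok_bound and bool(P < Q)

print(ok_identity, ok_bound)
```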

From the above two theorems, we can see that when

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & 0 \\ 0 & 1 & λ \end{matrix}]

,

λ > 1

, the new method is globally convergent. To prove convergence in Theorem 5, we added the assumption

Q < 3

, but Q need not be computed or checked in practice, so the assumption does not affect the computational cost of the new method.

4. Numerical Results

In this section, we present the results of some numerical experiments on how the new BB-like method using the new stepsize

α_{k}^{n e w}

compares with other BB methods in solving optimization problems. The compared methods differ mainly in the choice of stepsize. We chose the following stepsizes for comparison:

α_{k}^{B B 1}

[4],

α_{k}^{B B 2}

[4],

α_{k}^{D Y}

[30],

α_{k}^{M G}

[32] and the stepsize

α_{k}

in [28]. From (12) we can see the stepsize in [28] is a convex combination, so in our experiments we set

γ_{k} = 0.5

and use it to represent this method. In addition, when A is an n-dimensional symmetric matrix, we compare our BB-like method with the ABBmin1 and ABBmin2 methods in [18] and the ABB method in [17]. All methods were implemented in Python (v3.9.13), and all runs were carried out on a PC with a 2.3 GHz Intel Core i5 processor and 8 GB of RAM. For all examples in the numerical experiments, we used the following termination condition:

| f (x_{k + 1}) - f (x_{k}) | \leq ϵ,

for some given

ϵ > 0

, so that we can obtain the expected results.
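To make the setup concrete, the sketch below shows one way the iteration (2) with this termination condition could be organized for a symmetric A. It is a hedged illustration, not the authors' implementation: the function name `bb_like_solve` and the rule labels are hypothetical, the new stepsize is assumed to be ‖g_{k-1}‖ / ‖A g_{k-1}‖ (the symmetric analogue of (64)), and the first step reuses g_0 because no earlier gradient exists. The test matrix diag(1, 1, 3) and the initial point are illustrative choices.

```python
import numpy as np

def bb_like_solve(A, x0, rule="new", eps=1e-6, max_iter=10_000):
    """Gradient iteration x_{k+1} = x_k - alpha_k g_k for f(x) = 0.5 x^T A x.

    A is assumed symmetric. Stops when |f(x_{k+1}) - f(x_k)| <= eps.
    Returns (x, number of iterations performed).
    """
    f = lambda x: 0.5 * x @ (A @ x)
    x = np.asarray(x0, dtype=float)
    g = A @ x
    g_prev = g  # assumption: the first step reuses g_0
    for k in range(1, max_iter + 1):
        Ag = A @ g_prev
        if rule == "bb1":
            alpha = (g_prev @ g_prev) / (g_prev @ Ag)
        elif rule == "bb2":
            alpha = (g_prev @ Ag) / (Ag @ Ag)
        elif rule == "new":
            alpha = np.linalg.norm(g_prev) / np.linalg.norm(Ag)
        else:
            raise ValueError(f"unknown rule: {rule}")
        x_next = x - alpha * g
        if abs(f(x_next) - f(x)) <= eps:
            return x_next, k
        g_prev, x = g, x_next
        g = A @ x
    return x, max_iter

A = np.diag([1.0, 1.0, 3.0])  # small diagonal test problem
x0 = [10.0, 7.0, 1.0]
results = {rule: bb_like_solve(A, x0, rule=rule)
           for rule in ("new", "bb1", "bb2")}
for rule, (x, iters) in results.items():
    print(rule, iters, x)
```

Because the criterion monitors the change in f rather than the gradient norm, iteration counts from such a sketch depend on how the first step is initialized and need not match the reported tables.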

In our numerical experiments, we considered five types of optimization problems, which are stated in the following examples.

Example 1.

Consider the following optimization problem,

\min_{x \in R^{3}} f (x) = \frac{1}{2} x^{T} [\begin{matrix} 1 & & \\ & 1 & \\ & & λ \end{matrix}] x,

(71)

where

λ > 1

, initial point

x_{0} = {(10, 7, 1)}^{T}

,

ϵ = 10^{- 6}

.

For Example 1, we compared the number of iterations and the minimum points of the new method with those of the other five methods as

λ

varies. The results are shown in Table 1, and Figure 1 compares the CPU time of the different methods on Example 1.

Table 1. Number of iterations and minimum points of compared methods for Example 1.

Figure 1. Comparison of six methods on CPU time for Example 1.

Example 2.

Consider the following optimization problem,

\min_{x \in R^{3}} f (x) = \frac{1}{2} x^{T} [\begin{matrix} a & & \\ & a & \\ & & b \end{matrix}] x,

(72)

where

0 < a < b

, initial point

x_{0} = {(9, 6, 2)}^{T}

,

ϵ = 10^{- 6}

.

For Example 2, Table 2 compares the number of iterations and the minimum points of each method for different values of a and b, and Figure 2 compares the CPU time of the different methods.

Table 2. Number of iterations and minimum points of compared methods for Example 2.

Figure 2. Comparison of six methods on CPU time for Example 2.

Example 3.

Consider the following optimization problem,

\min_{x \in R^{3}} f (x) = \frac{1}{2} x^{T} [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & 0 \\ 0 & 1 & λ \end{matrix}] x,

(73)

where

λ > 1

,

ϵ = 10^{- 8}

and the initial point

x_{0} \in R^{3}

can be chosen at random.

In Example 3, A is not symmetric, whereas the other BB methods require A to be symmetric positive definite. Therefore, Table 3 reports only the number of iterations and the minimum points obtained by the new method for different initial points.

Table 3. Number of iterations and minimum points of our method for Example 3.
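A run of the new method on Example 3 can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Table 3: the gradient is taken as g_k = (A + A^T)/2 x_k, the stepsize follows (64), the first step reuses g_0 because no earlier gradient exists, and the function name `solve_example3` is hypothetical. The values λ = 3 and x_0 = (-2, 2, 1)^T are one of the configurations listed in Table 3.

```python
import numpy as np

def solve_example3(lam, x0, eps=1e-8, max_iter=10_000):
    """Run the BB-like iteration on f(x) = 0.5 x^T A x with the asymmetric A."""
    A = np.array([[1.0, 0.0, 0.0],
                  [0.0, lam, 0.0],
                  [0.0, 1.0, lam]])
    S = 0.5 * (A + A.T)              # g_k = S x_k
    f = lambda x: 0.5 * x @ (A @ x)
    x = np.asarray(x0, dtype=float)
    g = S @ x
    g_prev = g                       # assumption: first step reuses g_0
    for k in range(1, max_iter + 1):
        alpha = np.linalg.norm(g_prev) / np.linalg.norm(S @ g_prev)  # (64)
        x_next = x - alpha * g
        if abs(f(x_next) - f(x)) <= eps:
            return x_next, k
        g_prev, x = g, x_next
        g = S @ x
    return x, max_iter

x_star, iters = solve_example3(lam=3.0, x0=[-2.0, 2.0, 1.0])
print(iters, x_star)
```

Since the initialization of the first step is an assumption, the iteration count printed here need not coincide with the entry in Table 3.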

Example 4.

Consider the following optimization problem,

\min_{x \in R^{n}} f (x) = \frac{1}{2} x^{T} [\begin{matrix} λ_{1} & & & \\ & λ_{2} & & \\ & & ⋱ & \\ & & & λ_{n} \end{matrix}] x,

(74)

where

n = 100

,

1 \leq λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

,

ϵ = 10^{- 8}

and the initial point

x_{0} \in R^{n}

can be chosen at random.

For Example 4, we compared our method with the ABBmin1 and ABBmin2 methods. Their parameters were selected as in [18], namely

τ = 0.8

,

m = 9

and

τ = 0.9

, respectively. The initial points we chose were

(- 10, 5, 2)

,

(- 9, - 7, - 1)

and

(7, 3, 5)

. For each initial point, we randomly chose ten different sets of values of

λ_{i}

,

i = 1, \dots, 100

, which satisfied

λ_{1} = 1

,

λ_{100} = 10{,}000

, and

λ_{j}

was evenly distributed between 1 and 10,000 for

j = 2, \dots, 99

. Figure 3 and Figure 4 compare the number of iterations and the CPU time, respectively, of the three methods on Example 4.

Figure 3. Comparison of three methods on number of iterations for Example 4.

Figure 4. Comparison of three methods on CPU time for Example 4.

Example 5

(Random problems in [33]). Consider

A = Q D Q^{T}

, where

Q = (I - 2 ω_{3} ω_{3}^{T}) (I - 2 ω_{2} ω_{2}^{T}) (I - 2 ω_{1} ω_{1}^{T}),

and

ω_{1}

,

ω_{2}

,

ω_{3}

are random unit vectors,

D = d i a g (σ_{1}, \dots, σ_{n})

is a diagonal matrix where

σ_{1} = 1

,

σ_{n} = c o n d

, and

σ_{j}

is randomly generated between 1 and the condition number

c o n d

for

j = 2, \dots, n - 1

. We set

b = 0

,

ϵ = 10^{- 8}

, and the initial point

x_{0} = {(1, \dots, 1)}^{T}

.
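The construction of A in Example 5 can be sketched as follows. This is a minimal illustration, assuming that each ω_i is obtained by normalizing a standard normal vector and that the interior σ_j are drawn uniformly from [1, cond]; the function name `random_spd` and the small size n = 50 are hypothetical choices made to keep the example fast.

```python
import numpy as np

def random_spd(n, cond, rng):
    """Build A = Q D Q^T with Q a product of three Householder reflections."""
    Q = np.eye(n)
    for _ in range(3):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)                      # unit vector omega_i
        Q = (np.eye(n) - 2.0 * np.outer(w, w)) @ Q  # left-multiply by I - 2 w w^T
    sigma = np.empty(n)
    sigma[0], sigma[-1] = 1.0, cond                 # sigma_1 = 1, sigma_n = cond
    sigma[1:-1] = rng.uniform(1.0, cond, n - 2)     # sigma_j, j = 2, ..., n - 1
    A = Q @ np.diag(sigma) @ Q.T
    return 0.5 * (A + A.T), Q, sigma                # symmetrize against round-off

rng = np.random.default_rng(0)
A, Q, sigma = random_spd(n=50, cond=1e4, rng=rng)
eigs = np.linalg.eigvalsh(A)                        # ascending eigenvalues
print(eigs[0], eigs[-1])
```

Each factor I - 2 ω ω^T is orthogonal, so Q is orthogonal and the eigenvalues of A are exactly the σ_j, which fixes the condition number at cond.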

For Example 5, we set

n = 2000

and allowed a maximum of 10,000 iterations. For a broader comparison, we chose three other methods: the ABBmin1, ABBmin2, and ABB methods. The parameters of the ABBmin1 and ABBmin2 methods were the same as in Example 4. For the ABB method, we set

κ = 0.15

, which differs from the value used in [17]. In the experiments, three values of the condition number cond,

10^{4}, 10^{5}, 10^{6}

were chosen. For each value of

c o n d

, ten instances with

σ_{j}

evenly distributed in

[1, c o n d]

were generated,

j = 2, \dots, n - 1

. Figure 5 and Figure 6 compare the number of iterations and the CPU time, respectively, of the four methods on Example 5.

Figure 5. Comparison of four methods on number of iterations for Example 5.

Figure 6. Comparison of four methods on CPU time for Example 5.

In all tables, ‘iter’ represents the number of iterations and ‘

x^{*}

’ represents the computed optimal solution. The vertical axis of each figure shows the percentage of problems that a method solves within a factor

ρ

of the minimum value of the metric.
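The figures can be read as performance profiles. The sketch below assumes the standard construction, in which the profile of a method at ρ is the fraction of problems whose cost is at most ρ times the best cost over all methods; this interpretation, the function name `performance_profile`, and the iteration counts in `metric` are assumptions made purely for illustration.

```python
import numpy as np

def performance_profile(metric, rhos):
    """metric[p, s]: cost (iterations or CPU time) of solver s on problem p.

    Returns profile[s, j] = fraction of problems p with
    metric[p, s] <= rhos[j] * min over solvers of metric[p, :].
    """
    metric = np.asarray(metric, dtype=float)
    best = metric.min(axis=1, keepdims=True)  # best cost per problem
    ratios = metric / best                    # performance ratios, all >= 1
    return np.array([[np.mean(ratios[:, s] <= rho) for rho in rhos]
                     for s in range(metric.shape[1])])

# Hypothetical iteration counts: 4 problems (rows) x 3 solvers (columns)
metric = [[10, 12, 20],
          [8, 8, 16],
          [30, 15, 15],
          [5, 10, 6]]
profile = performance_profile(metric, rhos=[1.0, 1.5, 2.0])
print(profile)
```

A curve that lies higher at small ρ indicates a method that is more often the best or close to the best on the chosen metric.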

From Table 1 and Table 2, we can see that for problems like Examples 1 and 2 the new method has no obvious advantage in the number of iterations, while its solution accuracy is comparable to that of the other methods. In terms of CPU time, however, Figure 1 and Figure 2 show that the new method has a clear advantage over the other compared methods. For problems like Example 3, Table 3 shows that the new method performs well in terms of both the number of iterations and the accuracy of the minimum points. Figure 3 and Figure 4 show that, for problems such as Example 4, there is no significant difference in the number of iterations among the three methods, but the new method has a slight advantage in CPU time. For random problems like Example 5, Figure 5 shows that the new method and the ABBmin2 method perform better in terms of the number of iterations, while Figure 6 shows that the new method still has an obvious advantage in CPU time.

5. Conclusions

In this paper, we proposed a modified BB-like method that uses the stepsize

α_{k}^{n e w}

and analyzed two cases in which the coefficient matrix A of the quadratic term of the quadratic function is a third-order matrix. For the case

A = [\begin{matrix} 1 & & \\ & 1 & \\ & & λ \end{matrix}], λ > 1

, we proved R-superlinear convergence and generalized this case to

A = [\begin{matrix} μ & & \\ & μ & \\ & & φ \end{matrix}]

,

φ > μ \geq 1

. In addition, we further generalized this case to the n-dimensional form, that is,

A = diag {λ_{1}, λ_{2}, \dots, λ_{n}}

, where

1 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n}

, and proved R-linear convergence in the n-dimensional case. The numerical results show that this method has a significant advantage in running time over several other methods. For the other case,

A = [\begin{matrix} 1 & 0 & 0 \\ 0 & λ & 0 \\ 0 & 1 & λ \end{matrix}]

,

λ > 1

, we proved global convergence under an additional assumption, and the numerical results indicate that the modified method is fast and effective on such problems. In summary, the modified stepsize

α_{k}^{n e w}

is well-behaved for solving three-dimensional quadratic optimization problems.

Author Contributions

Methodology, Q.H.; software, T.W.; supervision, Q.H.; writing—original draft, T.W.; writing—review and editing, T.W. and Q.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China, grant 12171196.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cauchy, A. Méthode générale pour la résolution des systèmes d’équations simultanées. Comp. Rend. Sci. Paris 1847, 25, 536–538.
2. Forsythe, G.E. On the asymptotic directions of the s-dimensional optimum gradient method. Numer. Math. 1968, 11, 57–76.
3. Nocedal, J.; Sartenaer, A.; Zhu, C. On the behavior of the gradient norm in the steepest descent method. Comput. Optim. Appl. 2002, 22, 5–35.
4. Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148.
5. Dennis, J.E., Jr.; Moré, J.J. Quasi-Newton methods, motivation and theory. SIAM Rev. 1977, 19, 46–89.
6. Birgin, E.G.; Martínez, J.M.; Raydan, M. Spectral projected gradient methods: Review and perspectives. J. Stat. Softw. 2014, 60, 1–21.
7. Fletcher, R. On the Barzilai-Borwein method. In Optimization and Control with Applications; Qi, L.Q., Teo, K., Yang, X.Q., Eds.; Springer: New York, NY, USA, 2005; pp. 235–256.
8. Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Spectral properties of Barzilai-Borwein rules in solving singly linearly constrained optimization problems subject to lower and upper bounds. SIAM J. Optim. 2020, 30, 1300–1326.
9. Huang, Y.K.; Dai, Y.H.; Liu, X.W. Equipping the Barzilai-Borwein method with the two dimensional quadratic termination property. SIAM J. Optim. 2021, 31, 3068–3096.
10. Huang, Y.; Liu, H. On the rate of convergence of projected Barzilai-Borwein methods. Optim. Methods Softw. 2015, 30, 880–892.
11. Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai-Borwein gradient method. IMA J. Numer. Anal. 2002, 22, 1–10.
12. Raydan, M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 1997, 7, 26–33.
13. Grippo, L.; Lampariello, F.; Lucidi, S. A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 1986, 23, 707–716.
14. Dai, Y.H.; Fletcher, R. Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 2005, 100, 21–47.
15. Huang, Y.; Liu, H. Smoothing projected Barzilai-Borwein method for constrained non-Lipschitz optimization. Comput. Optim. Appl. 2016, 65, 671–698.
16. Dai, Y.H. Alternate step gradient method. Optimization 2003, 52, 395–415.
17. Zhou, B.; Gao, L.; Dai, Y.H. Gradient methods with adaptive step-sizes. Comput. Optim. Appl. 2006, 35, 69–86.
18. Frassoldati, G.; Zanni, L.; Zanghirati, G. New adaptive stepsize selections in gradient methods. J. Ind. Manag. Optim. 2008, 4, 299–312.
19. De Asmundis, R.; Di Serafino, D.; Hager, W.W.; Toraldo, G.; Zhang, H. An efficient gradient method using the Yuan steplength. Comput. Optim. Appl. 2014, 59, 541–563.
20. Dai, Y.H.; Yuan, Y.X. Analysis of monotone gradient methods. J. Ind. Manag. Optim. 2005, 1, 181–192.
21. Broyden, C.G. A class of methods for solving nonlinear simultaneous equations. Math. Comput. 1965, 19, 577–593.
22. Davidon, W.C. Variable metric method for minimization. SIAM J. Optim. 1991, 1, 1–17.
23. Fletcher, R.; Powell, M.J.D. A rapidly convergent descent method for minimization. Comput. J. 1963, 6, 163–168.
24. Broyden, C.G. The convergence of single-rank quasi-Newton methods. Math. Comput. 1970, 24, 365–382.
25. Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322.
26. Goldfarb, D. A family of variable-metric methods derived by variational means. Math. Comput. 1970, 24, 23–26.
27. Shanno, D.F. Conditioning of quasi-Newton methods for function minimization. Math. Comput. 1970, 24, 647–656.
28. Dai, Y.H.; Huang, Y.; Liu, X.W. A family of spectral gradient methods for optimization. Comput. Optim. Appl. 2019, 74, 43–65.
29. Dai, Y.H.; Al-Baali, M.; Yang, X. A positive Barzilai-Borwein-like stepsize and an extension for symmetric linear systems. In Numerical Analysis and Optimization; Al-Baali, M., Grandinetti, L., Purnama, A., Eds.; Springer: Cham, Switzerland, 2015; pp. 59–75.
30. Dai, Y.H.; Yang, X.Q. A new gradient method with an optimal stepsize property. Comput. Optim. Appl. 2006, 33, 73–88.
31. Elman, H.C.; Golub, G.H. Inexact and preconditioned Uzawa algorithm for saddle point problems. SIAM J. Numer. Anal. 1994, 31, 1645–1661.
32. Dai, Y.H.; Yuan, Y.X. Alternate minimization gradient methods. IMA J. Numer. Anal. 2003, 23, 377–393.
33. Friedlander, A.; Martínez, J.M.; Molina, B.; Raydan, M. Gradient method with retards and generalizations. SIAM J. Numer. Anal. 1999, 36, 275–289.

Table 1. Number of iterations and minimum points of compared methods for Example 1.

λ	$α_{k}^{n e w}$	$α_{k}^{B B 1}$	$α_{k}^{B B 2}$
λ	iter ( $x^{*}$ )	iter ( $x^{*}$ )	iter ( $x^{*}$ )
3	7 ${(1.86 e - 8, 1.30 e - 8, 8.66 e - 7)}^{T}$	7 ${(4.79 e - 8, 3.52 e - 8, 1.31 e - 6)}^{T}$	7 ${(1.73 e - 9, 1.21 e - 9, 4.27 e - 7)}^{T}$
5	9 ${(1.48 e - 9, 1.04 e - 9, 2.57 e - 6)}^{T}$	9 ${(4.06 e - 9, 2.84 e - 9, 1.82 e - 5)}^{T}$	9 ${(5.91 e - 12, 4.14 e - 12, 1.06 e - 9)}^{T}$
10	12 ${(1.21 e - 5, 8.47 e - 6, - 8.80 e - 6)}^{T}$	14 ${(5.46 e - 11, 3.82 e - 11, - 0.0003)}^{T}$	9 ${(0.0002, 0.0001, 1.66 e - 6)}^{T}$
50	10 ${(8.43 e - 8, 5.90 e - 8, - 2.54 e - 6)}^{T}$	9 ${(0.0005, 0.0004, 2.34 e - 12)}^{T}$	10 ${(9.47 e - 9, 6.63 e - 9, - 1.44 e - 11)}^{T}$
100	10 ${(2.70 e - 15, 1.89 e - 15, - 1.41 e - 10)}^{T}$	7 ${(0.0011, 0.0008, 2.47 e - 8)}^{T}$	8 ${(0.0044, 0.0031, - 1.57 e - 14)}^{T}$
500	7 ${(1.12 e - 5, 7.84 e - 6, 1.18 e - 8)}^{T}$	7 ${(5.51 e - 8, 3.86 e - 8, - 3.55 e - 15)}^{T}$	7 ${(3.09 e - 7, 2.17 e - 7, 2.75 e - 9)}^{T}$
1000	7 ${(1.90 e - 7, 1.33 e - 7, 9.46 e - 11)}^{T}$	7 ${(5.53 e - 10, 3.87 e - 10, 3.55 e - 15)}^{T}$	7 ${(4.87 e - 9, 3.41 e - 9, 1.08 e - 11)}^{T}$
10,000	7 ${(1.98 e - 13, 1.39 e - 13, 2.78 e - 17)}^{T}$	7 ${(7.09 e - 17, 4.97 e - 17, 0)}^{T}$	7 ${(4.89 e - 15, 3.43 e - 15, - 1.73 e - 18)}^{T}$
λ	$α_{k}^{D Y}$	$γ_{k} = 0.5$	$α_{k}^{M G}$
λ	iter ( $x^{*}$ )	iter ( $x^{*}$ )	iter ( $x^{*}$ )
3	7 ${(1.90 e - 8, 1.30 e - 8, 8.67 e - 7)}^{T}$	7 ${(1.87 e - 8, 1.31 e - 8, 8.69 e - 7)}^{T}$	11 ${(0.0007, 0.0005, 7.01 e - 5)}^{T}$
5	9 ${(1.47 e - 9, 1.00 e - 9, 2.50 e - 6)}^{T}$	9 ${(1.61 e - 9, 1.13 e - 9, 2.68 e - 6)}^{T}$	23 ${(0.0013, 0.0009, 0.0001)}^{T}$
10	12 ${(1.22 e - 5, 8.87 e - 6, - 8.80 e - 6)}^{T}$	12 ${(1.16 e - 5, 8.15 e - 6, - 1.38 e - 5)}^{T}$	25 ${(0.0014, 0.0010, 0.0001)}^{T}$
50	10 ${(8.42 e - 8, 6.00 e - 8, - 2.50 e - 6)}^{T}$	10 ${(4.91 e - 10, 3.44 e - 10, - 1.29 e - 6)}^{T}$	7 ${(0.0016, 0.0011, 0.0002)}^{T}$
100	10 ${(2.59 e - 15, 1.91 e - 15, - 1.43 e - 10)}^{T}$	10 ${(6.94 e - 18, 6.90 e - 18, - 2.13 e - 10)}^{T}$	7 ${(2.98 e - 5, 2.09 e - 5, 2.98 e - 6)}^{T}$
500	7 ${(1.20 e - 5, 7.88 e - 6, 1.20 e - 8)}^{T}$	7 ${(6.83 e - 5, 4.78 e - 5, 6.45 e - 10)}^{T}$	5 ${(3.52 e - 6, 2.46 e - 6, 3.52 e - 7)}^{T}$
1000	7 ${(1.89 e - 7, 1.50 e - 7, 9.50 e - 11)}^{T}$	7 ${(3.10 e - 6, 2.17 e - 6, 4.37 e - 12)}^{T}$	5 ${(2.21 e - 7, 1.55 e - 7, 2.21 e - 8)}^{T}$
10,000	7 ${(2.01 e - 13, 1.40 e - 13, 2.77 e - 17)}^{T}$	7 ${(1.29 e - 11, 9.05 e - 12, 0)}^{T}$	5 ${(1.49 e - 5, 1.04 e - 5, 1.49 e - 6)}^{T}$

Table 2. Number of iterations and minimum points of compared methods for Example 2.

$a, b$	$α_{k}^{n e w}$	$α_{k}^{B B 1}$	$α_{k}^{B B 2}$
$a, b$	iter ( $x^{*}$ )	iter ( $x^{*}$ )	iter ( $x^{*}$ )
$a = 2, b = 5$	7 ${(1.93 e - 5, 1.28 e - 5, 0.0002)}^{T}$	7 ${(4.79 e - 5, 3.19 e - 5, 0.0002)}^{T}$	7 ${(2.89 e - 6, 1.92 e - 6, 8.78 e - 5)}^{T}$
$a = 10, b = 16$	7 ${(1.57 e - 9, 1.04 e - 9, 6.55 e - 8)}^{T}$	7 ${(2.79 e - 9, 1.86 e - 9, 8.09 e - 8)}^{T}$	7 ${(6.73 e - 10, 4.49 e - 10, 5.00 e - 8)}^{T}$
$a = 25, b = 30$	5 ${(2.08 e - 5, 1.39 e - 5, 9.42 e - 9)}^{T}$	5 ${(2.08 e - 5, 1.38 e - 5, 1.12 e - 8)}^{T}$	5 ${(2.09 e - 5, 1.39 e - 5, 7.81 e - 9)}^{T}$
$a = 50, b = 120$	7 ${(9.10 e - 6, 6.07 e - 6, 8.78 e - 5)}^{T}$	7 ${(2.21 e - 5, 1.47 e - 5, 0.0001)}^{T}$	7 ${(1.51 e - 6, 1.01 e - 6, 5.00 e - 5)}^{T}$
$a = 100, b = 350$	10 ${(1.18 e - 5, 7.84 e - 6, - 1.13 e - 11)}^{T}$	10 ${(3.99 e - 5, 2.67 e - 5, - 6.16 e - 10)}^{T}$	9 ${(3.46 e - 7, 2.31 e - 7, 1.16 e - 6)}^{T}$
$a = 1000, b = 5000$	14 ${(3.98 e - 10, 2.65 e - 10, - 1.90 e - 7)}^{T}$	15 ${(2.68 e - 5, 1.79 e - 5, 4.35 e - 9)}^{T}$	12 ${(3.76 e - 17, 2.51 e - 17, - 4.77 e - 10)}^{T}$
a = 10,000, b = 15,000	7 ${(2.89 e - 10, 1.93 e - 10, 1.51 e - 8)}^{T}$	7 ${(4.83 e - 10, 3.22 e - 10, 1.81 e - 8)}^{T}$	7 ${(1.42 e - 10, 9.46 e - 11, 1.20 e - 8)}^{T}$
$a, b$	$α_{k}^{D Y}$	$γ_{k} = 0.5$	$α_{k}^{M G}$
$a, b$	iter ( $x^{*}$ )	iter ( $x^{*}$ )	iter ( $x^{*}$ )
$a = 2, b = 5$	7 ${(1.92 e - 5, 1.20 e - 5, 0.0001)}^{T}$	7 ${(1.95 e - 5, 1.30 e - 5, 0.0002)}^{T}$	12 ${(0.0004, 0.0003, - 0.0002)}^{T}$
$a = 10, b = 16$	7 ${(1.70 e - 9, 1.11 e - 9, 6.55 e - 8)}^{T}$	7 ${(1.57 e - 9, 1.05 e - 9, 6.55 e - 8)}^{T}$	7 ${(0.0001, 7.96 e - 5, 2.65 e - 5)}^{T}$
$a = 25, b = 30$	5 ${(2.08 e - 5, 1.30 e - 5, 9.50 e - 5)}^{T}$	5 ${(2.08 e - 5, 1.38 e - 5, 9.42 e - 9)}^{T}$	5 ${(2.76 e - 5, 1.84 e - 5, 6.14 e - 6)}^{T}$
$a = 50, b = 120$	7 ${(9.09 e - 6, 6.08 e - 6, 8.80 e - 5)}^{T}$	7 ${(9.21 e - 6, 6.14 e - 6, 8.82 e - 5)}^{T}$	13 ${(0.0001, 7.14 e - 5, 2.38 e - 5)}^{T}$
$a = 100, b = 350$	10 ${(1.11 e - 5, 7.88 e - 6, - 1.15 e - 11)}^{T}$	10 ${(1.30 e - 5, 8.68 e - 6, - 1.43 e - 11)}^{T}$	20 ${(0.0001, 7.26 e - 5, - 1.65 e - 5)}^{T}$
$a = 1000, b = 5000$	14 ${(4.00 e - 10, 2.65 e - 10, - 1.90 e - 7)}^{T}$	14 ${(1.04 e - 9, 6.93 e - 10, - 2.70 e - 7)}^{T}$	23 ${(4.49 e - 5, 3.00 e - 5, 9.99 e - 6)}^{T}$
a = 10,000, b = 15,000	7 ${(2.80 e - 10, 1.93 e - 10, 1.50 e - 8)}^{T}$	7 ${(2.90 e - 10, 1.93 e - 10, 1.51 e - 8)}^{T}$	8 ${(1.56 e - 6, 1.04 e - 6, - 3.01 e - 6)}^{T}$

Table 3. Number of iterations and minimum points of our method for Example 3.

$λ$	${(- 2, 2, 1)}^{T}$	${(1, 2, 0)}^{T}$	${(- 1, - 2, 1)}^{T}$
$λ$	iter ( $x^{*}$ )	iter ( $x^{*}$ )	iter ( $x^{*}$ )
3	11 ${(- 0.0001, 5.56 e - 6, 4.09 e - 6)}^{T}$	10 ${(2.69 e - 6, - 6.88 e - 7, 6.57 e - 7)}^{T}$	11 ${(- 8.69 e - 7, - 3.00 e - 6, - 1.29 e - 6)}^{T}$
5	14 ${(- 2.46 e - 7, - 1.64 e - 6, - 2.08 e - 6)}^{T}$	14 ${(5.38 e - 7, - 1.47 e - 6, - 1.62 e - 6)}^{T}$	10 ${(- 0.0002, 1.73 e - 5, - 1.54 e - 5)}^{T}$
10	11 ${(- 5.39 e - 6, 2.85 e - 6, 2.91 e - 6)}^{T}$	14 ${(3.01 e - 7, - 9.89 e - 6, - 4.34 e - 6)}^{T}$	12 ${(- 7.13 e - 9, 2.04 e - 5, - 2.02 e - 5)}^{T}$
50	13 ${(- 1.46 e - 5, 4.94 e - 8, 5.23 e - 8)}^{T}$	14 ${(2.88 e - 7, 1.56 e - 6, - 2.01 e - 6)}^{T}$	13 ${(- 2.20 e - 6, - 1.72 e - 8, - 1.64 e - 8)}^{T}$
100	13 ${(- 4.42 e - 6, 2.84 e - 8, - 2.33 e - 8)}^{T}$	14 ${(5.92 e - 7, 1.57 e - 6, - 4.91 e - 6)}^{T}$	13 ${(- 3.87 e - 6, - 8.53 e - 9, - 3.62 e - 9)}^{T}$
500	13 ${(- 3.44 e - 7, 4.65 e - 9, - 4.60 e - 9)}^{T}$	9 ${(0.0022, 2.75 e - 8, - 6.26 e - 10)}^{T}$	13 ${(- 5.49 e - 6, - 9.12 e - 9, 7.72 e - 9)}^{T}$
1000	13 ${(- 1.86 e - 7, 1.53 e - 9, - 1.52 e - 9)}^{T}$	9 ${(0.0022, 6.86 e - 9, - 7.83 e - 11)}^{T}$	13 ${(- 5.72 e - 6, - 1.00 e - 8, 9.38 e - 9)}^{T}$
10,000	13 ${(- 8.86 e - 8, 9.85 e - 14, 3.86 e - 12)}^{T}$	8 ${(0.0022, - 1.17 e - 10, - 1.37 e - 6)}^{T}$	13 ${(- 5.93 e - 6, - 1.10 e - 8, 1.10 e - 8)}^{T}$
$λ$	${(3, 2, 1)}^{T}$	${(9, 7, 0)}^{T}$
$λ$	iter ( $x^{*}$ )	iter ( $x^{*}$ )
3	11 ${(2.57 e - 5, - 5.13 e - 6, 7.03 e - 6)}^{T}$	12 ${(3.33 e - 9, - 3.17 e - 5, - 1.54 e - 5)}^{T}$
5	14 ${(2.44 e - 6, - 5.61 e - 6, - 6.03 e - 6)}^{T}$	13 ${(1.06 e - 5, 2.10 e - 5, 2.19 e - 5)}^{T}$
10	14 ${(2.98 e - 7, - 2.91 e - 6, - 3.93 e - 6)}^{T}$	14 ${(1.76 e - 6, - 2.42 e - 5, - 1.14 e - 5)}^{T}$
50	13 ${(2.40 e - 8, - 2.32 e - 8, 4.86 e - 8)}^{T}$	14 ${(2.59 e - 5, - 7.07 e - 6, - 1.86 e - 6)}^{T}$
100	13 ${(1.77 e - 6, - 1.35 e - 8, 4.00 e - 8)}^{T}$	14 ${(1.72 e - 5, - 2.84 e - 7, - 1.33 e - 7)}^{T}$
500	13 ${(6.29 e - 7, 7.88 e - 9, - 7.76 e - 9)}^{T}$	13 ${(1.33 e - 7, 3.19 e - 7, 3.65 e - 7)}^{T}$
1000	13 ${(1.59 e - 7, 4.25 e - 9, - 4.23 e - 9)}^{T}$	13 ${(2.08 e - 6, 6.67 e - 7, 8.73 e - 7)}^{T}$
10,000	8 ${(0.0093, - 1.90 e - 9, - 2.01 e - 9)}^{T}$	8 ${(0.0089, - 6.99 e - 9, - 2.14 e - 7)}^{T}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
