Article

Research on Three-Dimensional Extension of Barzilai-Borwein-like Method

by
Tianji Wang
and
Qingdao Huang
*,†
School of Mathematics, Jilin University, Changchun 130012, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(2), 215; https://doi.org/10.3390/math13020215
Submission received: 5 December 2024 / Revised: 31 December 2024 / Accepted: 9 January 2025 / Published: 10 January 2025
(This article belongs to the Section E: Applied Mathematics)

Abstract: The Barzilai-Borwein (BB) method uses the BB stepsize in its iterations so as to eliminate the line search required by the steepest descent method. In this paper, we modify the BB stepsize and extend it to solve the optimization problems of three-dimensional quadratic functions. The discussion is divided into two cases. Firstly, we study the case where the coefficient matrix of the quadratic term of the quadratic function is a special third-order diagonal matrix, and we prove that, using the new modified stepsize, this case is R-superlinearly convergent. In addition, we extend this case to n dimensions and prove that the rate of convergence is R-linear. Secondly, we analyze the case where the coefficient matrix of the quadratic term is a third-order asymmetric matrix with a double eigenvalue, and we prove the global convergence of the method in this case. The results of numerical experiments show that the modified method is effective in both of the above cases.

1. Introduction

In this paper, we consider the unconstrained optimization problem of minimizing the quadratic function,
\[ \min_{x \in \mathbb{R}^n} f(x) = \frac{1}{2} x^T A x - b^T x, \tag{1} \]
where $A \in \mathbb{R}^{n \times n}$ is the coefficient matrix of the quadratic term and $b \in \mathbb{R}^n$. In order to solve (1), common optimization methods usually take the following iterative approach:
\[ x_{k+1} = x_k - \alpha_k g_k, \tag{2} \]
where $g_k = \nabla f(x_k)$ and $\alpha_k > 0$ is called the stepsize. Different methods define the stepsize differently, so the studies on stepsizes are diverse. The most common method is the classical steepest descent method [1], whose stepsize is called the Cauchy stepsize,
\[ \alpha_k^{SD} = \arg\min_{\alpha > 0} f(x_k - \alpha g_k); \tag{3} \]
this way of finding the stepsize is also called exact one-dimensional line search, and Forsythe proved in [2] that the rate of convergence of the classical steepest descent method is linear. Although the stepsize $\alpha_k^{SD}$ is effective, the classical steepest descent method does not work well when the condition number of $A$ is large; see [3] for details.
In order to preserve the convergence speed while reducing the amount of computation, Barzilai and Borwein [4] proposed a new stepsize, turning the iterative formula into
\[ x_{k+1} = x_k - D_k g_k, \tag{4} \]
where $D_k = \alpha_k I$ and $I$ is the identity matrix. As in the quasi-Newton method [5], $D_k^{-1}$ can be regarded as an approximate Hessian matrix of $f$ at $x_k$. Then, in order for $D_k$ to have the quasi-Newton property, they chose $\alpha_k$ so that $D_k$ meets the following condition:
\[ D_k = \arg\min_{D = \alpha I} \| D^{-1} s_{k-1} - y_{k-1} \|, \tag{5} \]
and they have
\[ \alpha_k^{BB1} = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}, \tag{6} \]
where $s_{k-1} = x_k - x_{k-1}$, $y_{k-1} = g_k - g_{k-1}$, and $\|\cdot\|$ denotes the Euclidean norm. Alternatively, to choose the stepsize $\alpha_k$ in another way, they let $D_k$ satisfy
\[ D_k = \arg\min_{D = \alpha I} \| s_{k-1} - D y_{k-1} \|, \tag{7} \]
and then they have another stepsize
\[ \alpha_k^{BB2} = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}. \tag{8} \]
As we can see, if $s_{k-1}^T y_{k-1} > 0$, then $\alpha_k^{BB1} \ge \alpha_k^{BB2}$, so $\alpha_k^{BB1}$ is a long stepsize and $\alpha_k^{BB2}$ is a short stepsize. Hence $\alpha_k^{BB1}$ performs better than $\alpha_k^{BB2}$ on some optimization problems; see [6,7] for details. The methods that use $\alpha_k^{BB1}$ or $\alpha_k^{BB2}$ as the stepsize are collectively referred to as BB methods.
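For a quadratic objective with gradient $g_k = A x_k - b$, both stepsizes can be computed directly from the last two iterates. A minimal sketch (the function name `bb_stepsizes` is ours, not from the references):

```python
import numpy as np

def bb_stepsizes(x_prev, x_curr, g_prev, g_curr):
    """Return the two Barzilai-Borwein stepsizes (6) and (8)."""
    s = x_curr - x_prev        # s_{k-1} = x_k - x_{k-1}
    y = g_curr - g_prev        # y_{k-1} = g_k - g_{k-1}
    bb1 = s.dot(s) / s.dot(y)  # long stepsize (6)
    bb2 = s.dot(y) / y.dot(y)  # short stepsize (8)
    return bb1, bb2
```

By the Cauchy-Schwarz inequality, $\alpha_k^{BB1} \ge \alpha_k^{BB2}$ whenever $s_{k-1}^T y_{k-1} > 0$, which the sketch makes easy to check numerically.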
In recent years, there has been a lot of research on the convergence and stepsize modification of BB methods [8,9,10]. Previous studies show that the convergence rate of BB methods is usually R-superlinear or R-linear. For example, in [4], Barzilai and Borwein proved the R-superlinear convergence of their method with the stepsize $\alpha_k^{BB2}$ for solving two-dimensional strictly convex quadratics, and Dai and Liao [11] proved the R-linear convergence of the BB method for $n$-dimensional strictly convex quadratics. We now give the definition of these two rates of convergence as follows.
Definition 1. 
Set
\[ R_1 = \limsup_{k \to \infty} \| x_k - x^* \|^{1/k}. \tag{9} \]
According to the above formula, the R-convergence rate can be divided into two cases:
(i) when $R_1 = 0$, the sequence of iterated points $\{x_k\}$ is said to have an R-superlinear convergence rate;
(ii) when $0 < R_1 < 1$, the sequence of iterated points $\{x_k\}$ is said to have an R-linear convergence rate.
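The quantity $R_1$ can be estimated numerically from the tail of an error sequence. The following sketch is our own illustrative setup (not from the paper): it runs the classical steepest descent method with the Cauchy stepsize (3) on a small quadratic, for which the estimate should fall strictly between 0 and 1, consistent with linear convergence:

```python
import numpy as np

def estimate_R1(errors):
    """Crude estimate of R_1 = limsup ||x_k - x*||^(1/k) from the last error."""
    k = len(errors)
    return errors[-1] ** (1.0 / k)

# Steepest descent with the Cauchy stepsize on f(x) = 0.5 x^T A x, minimizer x* = 0.
A = np.diag([1.0, 10.0])
x = np.array([1.0, 1.0])
errors = []
for _ in range(50):
    g = A @ x
    alpha = g.dot(g) / g.dot(A @ g)   # exact line search for quadratics
    x = x - alpha * g
    errors.append(np.linalg.norm(x))
```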
In addition to solving quadratics, the BB method can also solve nonlinear optimization problems. Raydan [12] proposed a global Barzilai-Borwein method for unconstrained optimization problems by combining the method with the nonmonotone line search proposed by Grippo et al. [13]. Dai and Fletcher [14] developed projected BB methods for solving large-scale box-constrained quadratic programming. Additionally, Huang and Liu [15] extended the projected BB methods by using smoothing techniques, and they modified them to solve non-Lipschitz optimization problems. In [16], Dai considered alternating the Cauchy stepsize and the BB stepsize and proposed an alternate step gradient method, and in [17], Zhou et al. proposed an adaptive Barzilai-Borwein (ABB) method which alternates between $\alpha_k^{BB1}$ and $\alpha_k^{BB2}$.
In addition, the relationship between BB stepsizes and the spectrum of the Hessian matrix of the objective function has also attracted wide attention. Based on the ABB method in [17], Frassoldati et al. [18] tried to use values of $\alpha_k^{BB1}$ close to the reciprocal of the minimum eigenvalue of the Hessian matrix. Their first implementation of this idea was denoted by ABBmin1, and in order to improve the iteration they proposed another method, denoted by ABBmin2. De Asmundis et al. [19] used the spectral property of the stepsize in [20] to propose the SDC method, in which the Cauchy stepsize $\alpha_k^{SD}$ is alternated with a constant one. In [21], the Broyden class of quasi-Newton methods approximates the inverse of the Hessian matrix by
\[ H_k^{\tau} = \tau H_k^{BFGS} + (1 - \tau) H_k^{DFP}, \tag{10} \]
where $\tau \in [0,1]$, and $H_k^{BFGS}$ and $H_k^{DFP}$ are the BFGS and DFP matrices satisfying $H_k y_{k-1} = s_{k-1}$, respectively. In the quasi-Newton method, these are the two most common updates for $H_k$. Among them, the DFP update was first proposed by Davidon [22] and later explained and developed by Fletcher and Powell [23], while the BFGS update was obtained from the quasi-Newton methods proposed independently by Broyden, Fletcher, Goldfarb and Shanno in 1970 [24,25,26,27]. Similarly, applying this idea to the BB method, Dai et al. [28] solved the following problem
\[ \min_{D = \alpha I} \ \tau \, \| D^{-1} s_{k-1} - y_{k-1} \| + (1 - \tau) \, \| s_{k-1} - D y_{k-1} \| \tag{11} \]
to obtain the convex combination of $\alpha_k^{BB1}$ and $\alpha_k^{BB2}$
\[ \alpha_k = \gamma_k \alpha_k^{BB1} + (1 - \gamma_k) \alpha_k^{BB2}, \tag{12} \]
where $\gamma_k \in [0,1]$, and they further proved that the family of spectral gradient methods (12) has R-superlinear convergence for two-dimensional strictly convex quadratics.
In addition to the several stepsize definitions mentioned above, there are also some BB-like stepsizes. In [29], Dai et al. set $A = \mathrm{diag}(1, \lambda)$ and $b = 0$, where $\lambda > 1$; they obtained a positive BB-like stepsize by averaging $\alpha_k^{BB1}$ and $\alpha_k^{BB2}$ geometrically as follows:
\[ \alpha_k = \sqrt{\alpha_k^{BB1} \cdot \alpha_k^{BB2}}, \tag{13} \]
whose simplification is equivalent to
\[ \alpha_k = \frac{\| g_{k-1} \|}{\| A g_{k-1} \|}, \tag{14} \]
and they proved the R-superlinear convergence of the method. In addition, (14) can also be seen as a delayed version of the stepsize proposed by Dai and Yang in [30],
\[ \alpha_k^{DY} = \frac{\| g_k \|}{\| A g_k \|}. \tag{15} \]
Interestingly, it has been shown in [30] that (15) will eventually approach the minimizer of $\| I - \alpha A \|$; precisely,
\[ \liminf_{k \to \infty} \alpha_k^{DY} = \frac{2}{\lambda_1 + \lambda_n}, \tag{16} \]
where $\lambda_1$ and $\lambda_n$ are the minimum and maximum eigenvalues of $A$, and the corresponding eigenvectors are given by the limits of $\frac{g_k}{\|g_k\|} + \frac{g_{k+1}}{\|g_{k+1}\|}$ and $\frac{g_k}{\|g_k\|} - \frac{g_{k+1}}{\|g_{k+1}\|}$, respectively. The limiting stepsize in (16) is the optimal stepsize in [31], i.e.,
\[ \alpha_k^{OPT} = \arg\min_{\alpha \in \mathbb{R}} \| I - \alpha A \|, \qquad \alpha_k^{OPT} = \frac{2}{\lambda_1 + \lambda_n}. \tag{17} \]
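For a symmetric positive definite $A$ with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$, $\| I - \alpha A \| = \max_i |1 - \alpha \lambda_i|$, so (17) is easy to check numerically; a small sketch with an illustrative spectrum of our own choosing:

```python
import numpy as np

lams = np.array([1.0, 4.0, 10.0])        # eigenvalues of a diagonal test matrix A
alpha_opt = 2.0 / (lams[0] + lams[-1])   # candidate optimal stepsize from (17)

def residual_norm(alpha):
    # ||I - alpha*A||_2 = max_i |1 - alpha*lambda_i| for A = diag(lams)
    return np.abs(1.0 - alpha * lams).max()

# The optimal value should be (lambda_n - lambda_1) / (lambda_n + lambda_1);
# no point on a fine grid of stepsizes should beat alpha_opt.
grid_best = min(residual_norm(a) for a in np.linspace(0.01, 1.0, 1000))
```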
In this paper, we mainly study the three-dimensional case; that is to say, the coefficient matrix $A$ of the quadratic term is a third-order matrix. In [29], the BB-like method applied only to $A = \mathrm{diag}(1, \lambda)$, $\lambda > 1$. Based on it, we modify the stepsize in (14) as follows:
\[ \alpha_k^{new} = \frac{\| g_{k-1} \|}{\left\| \frac{A^T + A}{2} g_{k-1} \right\|}, \tag{18} \]
and make it applicable to both cases $A = \mathrm{diag}(1, 1, \lambda)$ and $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}$, $\lambda > 1$. For the case $A = \mathrm{diag}(1, 1, \lambda)$, $\lambda > 1$, we generalize it to the more general form $A = \mathrm{diag}(\mu, \mu, \varphi)$, where $\varphi > \mu \ge 1$. For these two cases, we carry out proofs of convergence and numerical experiments.
The paper is organized as follows. In Section 2, we analyze the new BB-like method, which uses the stepsize $\alpha_k^{new}$, for the cases $A = \mathrm{diag}(1, 1, \lambda)$, $\lambda > 1$, and $A = \mathrm{diag}(\mu, \mu, \varphi)$, $\varphi > \mu \ge 1$, and we prove that the rate of convergence of the new method is R-superlinear. Additionally, we extend this case to $n$ dimensions, which means that $A = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, where $1 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, and we prove that the rate of convergence in the $n$-dimensional case is R-linear. Section 3 provides the study of the case $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}$, $\lambda > 1$, and we prove the global convergence of this case under an additional assumption. In Section 4, we give some numerical experiment results to show the effectiveness of the new method. Finally, the conclusions are given in Section 5.
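Before turning to the analysis, note that the whole method is just the gradient iteration (2) with the stepsize (18); a minimal sketch for quadratics with $b = 0$ (the function name and the stopping tolerance are our own choices, not the paper's):

```python
import numpy as np

def bb_like_new(A, x0, x1, tol=1e-10, max_iter=1000):
    """Gradient method x_{k+1} = x_k - alpha_k^{new} g_k with the stepsize (18).

    The stepsize uses the previous gradient g_{k-1}, so two starting points are needed.
    """
    S = 0.5 * (A + A.T)   # symmetric part of A; the gradient of 0.5 x^T A x is S x
    g_prev = S @ x0       # g_{k-1}
    x = x1
    for k in range(max_iter):
        g = S @ x
        if np.linalg.norm(g) < tol:
            break
        alpha = np.linalg.norm(g_prev) / np.linalg.norm(S @ g_prev)  # stepsize (18)
        x, g_prev = x - alpha * g, g
    return x, k
```

On $A = \mathrm{diag}(1, 1, \lambda)$ this iteration is exactly the case analyzed in Section 2.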

2. The Case Where A Is a Diagonal Matrix and Its Convergence Analysis

2.1. Three-Dimensional Case

In this section, we start with the basic case that
\[ A = \mathrm{diag}(1, 1, \lambda), \qquad b = 0, \tag{19} \]
where $\lambda > 1$. According to the iteration formula $x_{k+1} = x_k - \alpha_k g_k$, we assume that $x_1, x_2$ are given initial iteration points and that they satisfy
\[ g_1^{(i)} \ne 0, \quad g_2^{(i)} \ne 0, \quad i = 1, 2, 3. \tag{20} \]
As we know,
\[ \alpha_k^{new} = \frac{\| g_{k-1} \|}{\left\| \frac{A^T + A}{2} g_{k-1} \right\|}, \]
so we have
\[ \| g_{k-1} \| = \sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}, \qquad \left\| \frac{A^T + A}{2} g_{k-1} \right\| = \| A g_{k-1} \| = \sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + \lambda^2 (g_{k-1}^{(3)})^2}, \]
\[ \alpha_k^{new} = \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + \lambda^2 (g_{k-1}^{(3)})^2}}. \]
We set
\[ p_k = \frac{(g_k^{(1)})^2 + (g_k^{(2)})^2}{(g_k^{(3)})^2}; \tag{21} \]
then $\alpha_k^{new}$ can be written as
\[ \alpha_k^{new} = \frac{\sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}}. \tag{22} \]
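A quick numerical consistency check of (22) against the definition (18), with arbitrary illustrative values of our own choosing:

```python
import numpy as np

lam = 5.0
A = np.diag([1.0, 1.0, lam])
g = np.array([3.0, -2.0, 0.5])   # plays the role of g_{k-1}; all components nonzero

alpha_def = np.linalg.norm(g) / np.linalg.norm(A @ g)  # definition (18); A is symmetric here
p = (g[0] ** 2 + g[1] ** 2) / g[2] ** 2                # p_{k-1} as in (21)
alpha_p = np.sqrt(1.0 + p) / np.sqrt(lam ** 2 + p)     # closed form (22)
```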
Notice that $g_k = A x_k$, so the iteration formula for $g_k$ is
\[ g_{k+1} = (I - \alpha_k^{new} A) g_k. \tag{23} \]
According to (23), we have
\[ \begin{pmatrix} g_{k+1}^{(1)} \\ g_{k+1}^{(2)} \\ g_{k+1}^{(3)} \end{pmatrix} = \left( I - \frac{\sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}} \, \mathrm{diag}(1, 1, \lambda) \right) \begin{pmatrix} g_k^{(1)} \\ g_k^{(2)} \\ g_k^{(3)} \end{pmatrix}, \]
which is equivalent to
\[ g_{k+1}^{(1)} = \left( 1 - \frac{\sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}} \right) g_k^{(1)}, \quad g_{k+1}^{(2)} = \left( 1 - \frac{\sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}} \right) g_k^{(2)}, \quad g_{k+1}^{(3)} = \left( 1 - \frac{\lambda \sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}} \right) g_k^{(3)}. \tag{24} \]
From (21) and (24), we can obtain
\[ p_{k+1} = \frac{(g_{k+1}^{(1)})^2 + (g_{k+1}^{(2)})^2}{(g_{k+1}^{(3)})^2} = \left( \frac{\sqrt{\lambda^2 + p_{k-1}} - \sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}} - \lambda \sqrt{1 + p_{k-1}}} \right)^2 p_k = \left( \frac{\left( \sqrt{\lambda^2 + p_{k-1}} - \sqrt{1 + p_{k-1}} \right) \left( \sqrt{\lambda^2 + p_{k-1}} + \lambda \sqrt{1 + p_{k-1}} \right)}{(\lambda^2 - 1) \, p_{k-1}} \right)^2 p_k = \left( \frac{\lambda - p_{k-1} + \sqrt{\tau(p_{k-1})}}{\lambda + 1} \right)^2 \frac{p_k}{p_{k-1}^2}, \tag{25} \]
where $\tau$ is the quadratic function
\[ \tau(\nu) = (1 + \nu)(\lambda^2 + \nu), \quad \nu > 0. \tag{26} \]
Let
\[ h(\nu) = \frac{\lambda - \nu + \sqrt{\tau(\nu)}}{\lambda + 1}, \quad \nu > 0; \tag{27} \]
then
\[ p_{k+1} = (h(p_{k-1}))^2 \, \frac{p_k}{p_{k-1}^2}. \tag{28} \]
We define $W_k = \log p_k$; by (28) we have
\[ W_{k+1} = W_k - 2 W_{k-1} + 2 \log h(p_{k-1}). \tag{29} \]
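The scalar recurrence (28) fully determines the evolution of $p_k$, so it can be explored numerically; a short simulation with arbitrary starting values of our own choosing:

```python
import math

lam = 3.0

def tau(nu):
    """Quadratic (26)."""
    return (1.0 + nu) * (lam ** 2 + nu)

def h(nu):
    """Function (27)."""
    return (lam - nu + math.sqrt(tau(nu))) / (lam + 1.0)

# Iterate p_{k+1} = h(p_{k-1})^2 * p_k / p_{k-1}^2, the recurrence (28).
p_prev, p = 2.0, 0.5
for _ in range(12):
    p_prev, p = p, h(p_prev) ** 2 * p / p_prev ** 2
```

For $\lambda = 3$, Lemma 1 below predicts $h(\nu) \in (2\lambda/(\lambda+1), (\lambda+1)/2) = (1.5, 2)$, which the simulation respects.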
In order to prove the R-superlinear convergence of this case, we first give three lemmas.
Lemma 1. 
Assume that $\lambda > 1$. Then the function $h(\nu)$ is monotonically increasing and $h(\nu) \in \left( \frac{2\lambda}{\lambda+1}, \frac{\lambda+1}{2} \right)$ when $\nu \in (0, +\infty)$.
Proof. 
The proof can fully refer to the proof of Lemma 1.2.1 in [29].
According to (26), we have
\[ \tau'(\nu) = (1 + \nu) + (\lambda^2 + \nu), \tag{30} \]
and
\[ (\tau'(\nu))^2 - 4 \tau(\nu) = (\lambda^2 - 1)^2. \tag{31} \]
By direct calculation, we can obtain
\[ h'(\nu) = \frac{-1 + \frac{\tau'(\nu)}{2 \sqrt{\tau(\nu)}}}{\lambda + 1} = \frac{(\tau'(\nu))^2 - 4 \tau(\nu)}{2 (\lambda + 1) \sqrt{\tau(\nu)} \left( \tau'(\nu) + 2 \sqrt{\tau(\nu)} \right)} = \frac{(\lambda^2 - 1)^2}{2 (\lambda + 1) \sqrt{\tau(\nu)} \left( \tau'(\nu) + 2 \sqrt{\tau(\nu)} \right)}. \]
Since $\nu \ge 0$, we have $h'(\nu) > 0$, so $h(\nu)$ is monotonically increasing, and when $\nu = 0$ we have
\[ h(0) = \frac{2\lambda}{\lambda + 1}. \]
In addition to that, we have
\[ \lim_{\nu \to \infty} h(\nu) = \frac{\lambda + 1}{2}. \]
So, in summary, we obtain that when $\nu \ge 0$, $h(\nu) \in \left( \frac{2\lambda}{\lambda+1}, \frac{\lambda+1}{2} \right)$. □
In the next lemma we will give the definition of $\zeta_k$ and its lower bound.
Lemma 2. 
We define $\zeta_k = W_k + (\gamma - 1) W_{k-1}$, where $\gamma$ satisfies $\gamma^2 - \gamma + 2 = 0$. If
\[ | \zeta_2 | > 8 \log \frac{\lambda + 1}{2}, \tag{32} \]
then there exists $c_1 > 0$ such that
\[ | \zeta_k | > (\sqrt{2} - 1) \, 2^{\frac{k-2}{2}} c_1, \quad k \ge 2. \tag{33} \]
Proof. 
According to the definition of $\zeta_k$ and (29),
\[ \zeta_{k+1} = W_{k+1} + (\gamma - 1) W_k = \gamma W_k - 2 W_{k-1} + 2 \log h(p_{k-1}) = \gamma \zeta_k + 2 \log h(p_{k-1}). \]
Notice that $|\gamma| = \sqrt{2}$, and from Lemma 1 we have $| \log h(p_{k-1}) | < \log \frac{\lambda+1}{2}$, so
\[ | \zeta_{k+1} | \ge \sqrt{2} \, | \zeta_k | - c_1, \]
where $c_1 = 2 \log \frac{\lambda+1}{2}$. By (32), we can obtain
\[ | \zeta_{k+1} | \ge 2^{\frac{k-1}{2}} | \zeta_2 | - \frac{2^{\frac{k-1}{2}} - 1}{\sqrt{2} - 1} c_1 \ge 2^{\frac{k+3}{2}} c_1 - \frac{2^{\frac{k-1}{2}} - 1}{\sqrt{2} - 1} c_1 = \left[ (3 - \sqrt{2}) \, 2^{\frac{k-1}{2}} + \sqrt{2} + 1 \right] c_1 > (\sqrt{2} - 1) \, 2^{\frac{k-1}{2}} c_1, \]
so we finish the proof. □
By Lemma 2 we know $\gamma = \frac{1 \pm \sqrt{7}\, i}{2}$ and $|\gamma - 1| = \sqrt{2}$, so
\[ | \zeta_k | \le | W_k | + \sqrt{2} \, | W_{k-1} | \le (\sqrt{2} + 1) \max \{ | W_k |, | W_{k-1} | \}; \]
combining this with (33), we can obtain
\[ \max \{ | W_k |, | W_{k-1} | \} \ge \frac{1}{\sqrt{2} + 1} \, (\sqrt{2} - 1) \, 2^{\frac{k-2}{2}} c_1 = (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1. \tag{34} \]
Lemma 3. 
Under the conditions of Lemma 2, we have
\[ \max_{1 \le i \le 3} W_{k+i} \ge (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 - 4 \log \frac{\lambda + 1}{2}, \quad k \ge 2; \tag{35} \]
\[ \min_{1 \le i \le 3} W_{k+i} \le -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 + 4 \log \frac{\lambda + 1}{2}, \quad k \ge 2. \tag{36} \]
Proof. 
By (34), if
\[ W_{k-1} \ge (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 \quad \text{or} \quad W_k \ge (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 \]
holds, then (35) holds. Now we consider the other cases. We assume that the above two inequalities are not true, so by (34) we have
\[ W_{k-1} \le -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 \quad \text{or} \quad W_k \le -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1, \]
and by (29) we know that
\[ W_{k+2} = W_{k+1} - 2 W_k + 2 \log h(p_k) = -W_k - 2 W_{k-1} + 2 \log h(p_{k-1}) + 2 \log h(p_k). \tag{37} \]
Next, we prove (35) in two cases.
Case (i). When $W_{k-1} \le -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1$: if $W_k < 0$, we can obtain from (37)
\[ W_{k+2} \ge -2 W_{k-1} + 2 \log h(p_{k-1}) + 2 \log h(p_k) \ge 2 (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 - 2 c_1; \]
and if $W_k > 0$, by (29) we have
\[ W_{k+1} \ge -2 W_{k-1} + 2 \log h(p_{k-1}) \ge 2 (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 - 2 c_1. \]
Case (ii). When $W_k \le -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1$: if $W_{k+1} < 0$, we can obtain from (37)
\[ W_{k+3} \ge -2 W_k + 2 \log h(p_k) + 2 \log h(p_{k+1}) \ge 2 (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 - 2 c_1; \]
and if $W_{k+1} > 0$, by (29) we have
\[ W_{k+2} \ge -2 W_k + 2 \log h(p_k) \ge 2 (\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 - 2 c_1. \]
From the proof of Lemma 2, we know that $c_1 = 2 \log \frac{\lambda+1}{2}$, so $2 c_1 = 4 \log \frac{\lambda+1}{2}$. Therefore, according to the above analysis, we can prove that (35) is true, and the proof of (36) is similar. □
In the following theorem, we prove that the rate of convergence of this case is R-superlinear.
Theorem 1. 
Assume that (19) and (21) hold. Then the sequence of gradient norms $\{\| g_k \|\}$ converges to zero, and it converges R-superlinearly.
Proof. 
Notice that $\alpha_k^{new} \in \left( \frac{1}{\lambda}, 1 \right)$ and $g_{k+1} = (I - \alpha_k^{new} A) g_k$, so we can obtain
\[ | g_{k+1}^{(i)} | \le (\lambda - 1) \, | g_k^{(i)} |, \tag{38} \]
where $i = 1, 2, 3$ and $k \ge 0$.
Firstly, we consider the third component of the gradient. From (24) we know
\[ | g_{k+1}^{(3)} | = \left| \frac{\sqrt{\lambda^2 + p_{k-1}} - \lambda \sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}} \right| | g_k^{(3)} | = \frac{(\lambda^2 - 1) \, p_{k-1}}{\sqrt{\lambda^2 + p_{k-1}} \left( \sqrt{\lambda^2 + p_{k-1}} + \lambda \sqrt{1 + p_{k-1}} \right)} | g_k^{(3)} | \le \frac{(\lambda^2 - 1) \, p_{k-1}}{2 \lambda^2} | g_k^{(3)} | < (\lambda - 1) \, p_{k-1} \, | g_k^{(3)} |. \tag{39} \]
Combining (38) and (39), we obtain
\[ | g_{k+5}^{(3)} | \le (\lambda - 1)^5 \Big( \min_{1 \le j \le 3} p_{k+j} \Big) | g_k^{(3)} |. \tag{40} \]
Since $W_k = \log p_k$, by using (36) we have
\[ | g_{k+5}^{(3)} | \le (\lambda - 1)^5 \exp \left( -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 + 4 \log \frac{\lambda + 1}{2} \right) | g_k^{(3)} |. \tag{41} \]
Similarly, as for the first component of the gradient, we calculate directly,
\[ | g_{k+1}^{(1)} | = \left| \frac{\sqrt{\lambda^2 + p_{k-1}} - \sqrt{1 + p_{k-1}}}{\sqrt{\lambda^2 + p_{k-1}}} \right| | g_k^{(1)} | = \frac{\lambda^2 - 1}{\sqrt{\lambda^2 + p_{k-1}} \left( \sqrt{\lambda^2 + p_{k-1}} + \sqrt{1 + p_{k-1}} \right)} | g_k^{(1)} | = \frac{\lambda^2 - 1}{p_{k-1} \sqrt{\frac{\lambda^2}{p_{k-1}} + 1} \left( \sqrt{\frac{\lambda^2}{p_{k-1}} + 1} + \sqrt{\frac{1}{p_{k-1}} + 1} \right)} | g_k^{(1)} | \le \frac{\lambda^2 - 1}{2 \, p_{k-1}} | g_k^{(1)} |. \]
By (38) and $W_k = \log p_k$, we can obtain
\[ | g_{k+5}^{(1)} | \le \frac{1}{2} (\lambda + 1)(\lambda - 1)^5 \, \frac{1}{\max_{1 \le j \le 3} p_{k+j}} \, | g_k^{(1)} | \le \frac{1}{2} (\lambda + 1)(\lambda - 1)^5 \exp \left( -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 + 4 \log \frac{\lambda + 1}{2} \right) | g_k^{(1)} |, \tag{42} \]
where the last inequality uses (35). And from (24) we know that the behavior of the second component of the gradient is the same as that of the first component, so
\[ | g_{k+5}^{(2)} | \le \frac{1}{2} (\lambda + 1)(\lambda - 1)^5 \exp \left( -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 + 4 \log \frac{\lambda + 1}{2} \right) | g_k^{(2)} | \tag{43} \]
can be obtained.
Finally, by (41)–(43), for any $k$,
\[ \| g_{k+5} \| \le \frac{1}{2} (\lambda + 1)(\lambda - 1)^5 \exp \left( -(\sqrt{2} - 1)^2 \, 2^{\frac{k-2}{2}} c_1 + 4 \log \frac{\lambda + 1}{2} \right) \| g_k \|, \tag{44} \]
so the sequence $\{\| g_k \|\}$ converges to zero R-superlinearly. □
On the basis of the above conclusion, we generalize this case to a more general form: we set $A = \mathrm{diag}(\mu, \mu, \varphi)$, $\varphi > \mu \ge 1$. According to the assumptions and conditions mentioned above, we have
\[ \alpha_k^{new} = \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{\mu^2 (g_{k-1}^{(1)})^2 + \mu^2 (g_{k-1}^{(2)})^2 + \varphi^2 (g_{k-1}^{(3)})^2}}. \]
Substituting $p_k$ of (21) into the above equation, we obtain
\[ \alpha_k^{new} = \frac{\sqrt{1 + p_{k-1}}}{\sqrt{\varphi^2 + \mu^2 p_{k-1}}}. \]
According to (23), we have
\[ g_{k+1}^{(1)} = \left( 1 - \frac{\mu \sqrt{1 + p_{k-1}}}{\sqrt{\varphi^2 + \mu^2 p_{k-1}}} \right) g_k^{(1)}, \quad g_{k+1}^{(2)} = \left( 1 - \frac{\mu \sqrt{1 + p_{k-1}}}{\sqrt{\varphi^2 + \mu^2 p_{k-1}}} \right) g_k^{(2)}, \quad g_{k+1}^{(3)} = \left( 1 - \frac{\varphi \sqrt{1 + p_{k-1}}}{\sqrt{\varphi^2 + \mu^2 p_{k-1}}} \right) g_k^{(3)}. \]
Using the calculation method of (25), we can obtain
\[ p_{k+1} = \left( \frac{\varphi - \mu \, p_{k-1} + \sqrt{\eta(p_{k-1})}}{\varphi + \mu} \right)^2 \frac{p_k}{p_{k-1}^2}, \]
where $\eta$ is the quadratic function
\[ \eta(\omega) = (1 + \omega)(\varphi^2 + \mu^2 \omega), \quad \omega > 0. \]
Let
\[ \theta(\omega) = \frac{\varphi - \mu \, \omega + \sqrt{\eta(\omega)}}{\varphi + \mu}, \quad \omega > 0; \]
then
\[ p_{k+1} = (\theta(p_{k-1}))^2 \, \frac{p_k}{p_{k-1}^2}. \]
So we obtain a recurrence similar to (28), and by using the same proof method as in Lemmas 1–3 and Theorem 1, we can also prove that when $A = \mathrm{diag}(\mu, \mu, \varphi)$, $\varphi > \mu \ge 1$, the BB-like method using the new stepsize is convergent and its rate of convergence is R-superlinear.
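The closed-form recurrence for $p_k$ in the generalized case can be cross-checked against one actual gradient step; a sketch with arbitrary test values of our own choosing:

```python
import math

mu, phi = 2.0, 7.0   # A = diag(mu, mu, phi) with phi > mu >= 1

def eta(w):
    return (1.0 + w) * (phi ** 2 + mu ** 2 * w)

def theta(w):
    return (phi - mu * w + math.sqrt(eta(w))) / (phi + mu)

def p_of(g):
    return (g[0] ** 2 + g[1] ** 2) / g[2] ** 2

g_prev = [3.0, -1.0, 2.0]   # g_{k-1}
g = [1.0, 2.0, -0.5]        # g_k
alpha = math.sqrt(1.0 + p_of(g_prev)) / math.sqrt(phi ** 2 + mu ** 2 * p_of(g_prev))
g_next = [(1 - mu * alpha) * g[0], (1 - mu * alpha) * g[1], (1 - phi * alpha) * g[2]]

lhs = p_of(g_next)                                             # p_{k+1} from the iteration
rhs = theta(p_of(g_prev)) ** 2 * p_of(g) / p_of(g_prev) ** 2   # p_{k+1} from the recurrence
```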

2.2. n-Dimensional Case

In this case, we consider that
\[ A = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}, \qquad b = 0, \]
where $1 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. We will prove the R-linear convergence of the new method in the $n$-dimensional case.
In [16], Dai proved that if $A = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, where $1 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, and the stepsize $\alpha_k$ has the following Property 1, then either $g_k = 0$ for some finite $k$, or the sequence of gradient norms $\{\| g_k \|\}$ converges to zero R-linearly.
Firstly, we state Property 1 and the theorem from [16]. In Property 1, $g_k^{(i)}$ denotes the $i$th component of $g_k$, and
\[ G(k, l) = \sum_{i=1}^{l} (g_k^{(i)})^2. \]
Property 1 
([16]). Suppose that there exist an integer $m$ and positive constants $M_1 > \lambda_1$ and $M_2$ such that
(i) $\lambda_1 \le \alpha_k^{-1} \le M_1$;
(ii) for any integer $l \in [1, n-1]$ and $\epsilon > 0$, if $G(k-j, l) \le \epsilon$ and $(g_{k-j}^{(l+1)})^2 \ge M_2 \epsilon$ hold for $j \in [0, \min\{k, m\} - 1]$, then $\alpha_k^{-1} \ge \frac{2}{3} \lambda_{l+1}$.
Theorem 2 
([16]). Consider the linear system
\[ A x = b, \quad A \in \mathbb{R}^{n \times n}, \quad b \in \mathbb{R}^n, \]
where $A = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $1 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Consider the gradient method whose stepsize $\alpha_k$ has Property 1. Then either $g_k = 0$ for some finite $k$, or the sequence $\{\| g_k \|\}$ converges to zero R-linearly.
Proof. 
The proof can fully refer to the proof of Theorem 4.1 in [16].
By (23), we have
\[ g_{k+1}^{(i)} = (1 - \alpha_k \lambda_i) \, g_k^{(i)}. \tag{45} \]
Denote $\delta_1 = \max \{ (1 - \lambda_1 / M_1)^2, 1/4 \} \in (0, 1)$ and $\delta_2 = \max \{ (1 - M_1 / \lambda_1)^2, 2 \}$. Then, by (45) and the definition of $G(k, l)$, we can obtain that for all $k \ge 1$,
\[ G(k+1, 1) \le \delta_1 \, G(k, 1), \tag{46} \]
\[ (g_{k+1}^{(i)})^2 \le \delta_2 \, (g_k^{(i)})^2, \quad \text{for } i = 1, 2, \ldots, n, \tag{47} \]
\[ \| g_{k+1} \|^2 \le \delta_2 \, \| g_k \|^2. \tag{48} \]
The rest of the proof is divided into three parts as follows.
(I). We prove that, for any integer $1 \le l < n$ and $k \ge 1$, if there exist some $\epsilon_l \in (0, M_2^{-1})$ and an integer $m_l$ such that
\[ G(k+j, l) \le \epsilon_l \, \| g_k \|^2, \quad \text{for all } j \ge m_l, \tag{49} \]
then we must have
\[ (g_{k+j_0}^{(l+1)})^2 \le M_2 \epsilon_l \, \| g_k \|^2, \quad \text{for some } j_0 \in [m_l, m_l + m + \Delta_l + 1], \tag{50} \]
where
\[ \Delta_l = \left\lceil \frac{\log \left( M_2 \epsilon_l \, \delta_2^{-(m_l + m)} \right)}{\log \delta_1} \right\rceil. \]
In fact, suppose that
\[ (g_{k+j}^{(l+1)})^2 > M_2 \epsilon_l \, \| g_k \|^2, \quad \text{for } j \in [m_l, m_l + m + \Delta_l]. \tag{51} \]
Then we have from (49), (51) and Property 1 (ii) that
\[ \alpha_{k+j}^{-1} \ge \frac{2}{3} \lambda_{l+1}, \quad \text{for } j \in [m_l + m, m_l + m + \Delta_l]. \tag{52} \]
By (45), (52) and Property 1 (i), we can obtain
\[ (g_{k+j+1}^{(l+1)})^2 \le \delta_1 \, (g_{k+j}^{(l+1)})^2, \quad \text{for } j \in [m_l + m, m_l + m + \Delta_l]. \tag{53} \]
And from (47), (53) and the definition of $\Delta_l$, we obtain that
\[ (g_{k+m_l+m+\Delta_l+1}^{(l+1)})^2 \le \delta_1^{\Delta_l + 1} \, (g_{k+m_l+m}^{(l+1)})^2 \le \delta_1^{\Delta_l + 1} \delta_2^{m_l + m} \, (g_k^{(l+1)})^2 \le M_2 \epsilon_l \, \| g_k \|^2. \]
So (50) must hold.
(II). Denoting $m_{l+1} = m_l + m + \Delta_l + 1$ and $\epsilon_{l+1} = (1 + M_2 \delta_2^m) \epsilon_l$, we prove that if (49) holds, we can further have
\[ G(k+j, l+1) \le \epsilon_{l+1} \, \| g_k \|^2, \quad \text{for all } j \ge m_{l+1}. \tag{54} \]
In fact, by (I), we know that there are infinitely many integers $j_1$ and $j_2$ with $j_2 > j_1 \ge j_0$ such that
\[ (g_{k+j}^{(l+1)})^2 \le M_2 \epsilon_l \, \| g_k \|^2, \quad \text{for } j = j_1, j_2, \tag{55} \]
and
\[ (g_{k+j}^{(l+1)})^2 > M_2 \epsilon_l \, \| g_k \|^2, \quad \text{for } j \in [j_1 + 1, j_2 - 1]. \tag{56} \]
Then we have from (47) and (55) that
\[ (g_{k+j}^{(l+1)})^2 \le \delta_2^m \, (g_{k+j_1}^{(l+1)})^2 \le M_2 \delta_2^m \epsilon_l \, \| g_k \|^2, \quad \text{for } j \in [j_1 + 1, j_1 + m]. \tag{57} \]
If $j_2 > j_1 + m$, by Property 1 (ii), (45), (49) and (56), we have
\[ \alpha_{k+j}^{-1} \ge \frac{2}{3} \lambda_{l+1} \quad \text{and} \quad (g_{k+j+1}^{(l+1)})^2 \le \delta_1 \, (g_{k+j}^{(l+1)})^2, \quad \text{for } j \in [j_1 + m, j_2 - 1]. \tag{58} \]
It follows from (48), (55) and (58) that
\[ (g_{k+j}^{(l+1)})^2 < (g_{k+j_1+m}^{(l+1)})^2 \le M_2 \delta_2^m \epsilon_l \, \| g_k \|^2, \quad \text{for } j \in [j_1 + m + 1, j_2]. \tag{59} \]
Due to the arbitrariness of $j_1$ and $j_2$, together with (57) and (59), we know that the following inequality holds for any $j \ge j_0$:
\[ (g_{k+j}^{(l+1)})^2 \le M_2 \delta_2^m \epsilon_l \, \| g_k \|^2. \tag{60} \]
Since $j_0 \le m_{l+1}$, we can then obtain from (49), (60) and the definition of $G(k, l)$ that (54) holds.
(III). Denoting, for any $1 \le l \le n$,
\[ \epsilon_l = \frac{1}{4} \, (1 + M_2 \delta_2^m)^{\,l - n}, \tag{61} \]
and setting $m_1 = \lceil \log \epsilon_1 / \log \delta_1 \rceil$, $m_{l+1} = m_l + m + \Delta_l + 1$ for $l = 1, \ldots, n-1$, and $M = m_n$, we prove by induction that for all $1 \le l \le n$,
\[ G(k+j, l) \le \epsilon_l \, \| g_k \|^2, \quad \text{for all } j \ge m_l. \tag{62} \]
In fact, by (46) and the definition of $m_1$, (62) clearly holds for $l = 1$. Suppose that (62) is true for some $1 \le l \le n-1$. Then, by (II), we know that (62) holds for $l+1$. Thus, by induction, (62) holds for all $1 \le l \le n$. Notice that $\epsilon_n = 1/4$ and $G(k, n) = \| g_k \|^2$. It follows from (62) that
\[ \| g_{k+M} \|^2 \le \frac{1}{4} \, \| g_k \|^2. \tag{63} \]
Since $M = m_n$ depends only on $\lambda_1$, $M_1$ and $M_2$, we can then obtain by (48) and (63) that the sequence $\{\| g_k \|\}$ converges to zero R-linearly. □
According to the property and the theorem given above, in the following theorem we prove that the stepsize $\alpha_k^{new}$ satisfies Property 1 and hence that the $n$-dimensional case has an R-linear convergence rate.
Theorem 3. 
If $A = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, where $1 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, then either $g_k = 0$ for some finite $k$, or the sequence of gradient norms $\{\| g_k \|\}$ converges to zero R-linearly.
Proof. 
Firstly, we let $M_1 = \lambda_n$ and $M_2 = 2$. When $A$ is a symmetric positive definite matrix, we can obtain from (6) and (8) that
\[ \alpha_k^{BB1} = \frac{g_{k-1}^T g_{k-1}}{g_{k-1}^T A g_{k-1}}, \qquad \alpha_k^{BB2} = \frac{g_{k-1}^T A g_{k-1}}{g_{k-1}^T A^2 g_{k-1}}. \]
And from (18), we have
\[ \alpha_k^{new} = \frac{\| g_{k-1} \|}{\left\| \frac{A^T + A}{2} g_{k-1} \right\|} = \frac{\| g_{k-1} \|}{\| A g_{k-1} \|} = \sqrt{\alpha_k^{BB1} \cdot \alpha_k^{BB2}}. \]
So the following formula holds:
\[ \alpha_k^{BB2} \le \alpha_k^{new} \le \alpha_k^{BB1}. \]
Then,
\[ (\alpha_k^{new})^{-1} \ge \frac{1}{\alpha_k^{BB1}} = \frac{\sum_{i=1}^{n} \lambda_i (g_{k-1}^{(i)})^2}{\sum_{i=1}^{n} (g_{k-1}^{(i)})^2} \ge \lambda_1. \]
Similarly,
\[ (\alpha_k^{new})^{-1} \le \frac{1}{\alpha_k^{BB2}} = \frac{\sum_{i=1}^{n} \lambda_i^2 (g_{k-1}^{(i)})^2}{\sum_{i=1}^{n} \lambda_i (g_{k-1}^{(i)})^2} \le \lambda_n. \]
So (i) of Property 1 holds.
If $G(k-j, l) \le \epsilon$ and $(g_{k-j}^{(l+1)})^2 \ge M_2 \epsilon$ hold for any integer $l \in [1, n-1]$, $\epsilon > 0$ and $j \in [0, \min\{k, m\} - 1]$, we have
\[ (\alpha_k^{new})^{-1} \ge \frac{1}{\alpha_k^{BB1}} = \frac{\sum_{i=1}^{n} \lambda_i (g_{k-1}^{(i)})^2}{\sum_{i=1}^{n} (g_{k-1}^{(i)})^2} \ge \lambda_{l+1} \, \frac{\sum_{i=l+1}^{n} (g_{k-1}^{(i)})^2}{\epsilon + \sum_{i=l+1}^{n} (g_{k-1}^{(i)})^2} \ge \frac{M_2}{M_2 + 1} \, \lambda_{l+1} = \frac{2}{3} \lambda_{l+1}. \]
For the second inequality, we define $X = \sum_{i=l+1}^{n} (g_{k-1}^{(i)})^2$ and $F(X) = \frac{X}{\epsilon + X}$. Obviously, the function $F(X)$ is monotonically increasing when $X > 0$. According to the assumption, we have $\sum_{i=l+1}^{n} (g_{k-1}^{(i)})^2 \ge (g_{k-1}^{(l+1)})^2 \ge M_2 \epsilon$, which means that $X \ge M_2 \epsilon > 0$. So we can obtain
\[ F(X) \ge \frac{M_2 \epsilon}{\epsilon + M_2 \epsilon} = \frac{M_2}{M_2 + 1}, \]
and the third inequality holds.
Thus, (ii) of Property 1 also holds.
Above all, the conclusion of the theorem holds, and we finish the proof. □
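The two-sided bounds at the heart of this proof are easy to verify numerically; a quick check on a random diagonal problem (sizes and seed are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
lams = np.sort(np.concatenate(([1.0], rng.uniform(1.0, 1e4, n - 1))))
A = np.diag(lams)
g = rng.standard_normal(n)   # plays the role of g_{k-1}

bb1 = g.dot(g) / g.dot(A @ g)                           # alpha_k^{BB1}
bb2 = g.dot(A @ g) / g.dot(A @ A @ g)                   # alpha_k^{BB2}
alpha_new = np.linalg.norm(g) / np.linalg.norm(A @ g)   # stepsize (18); A is symmetric
```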

3. The Case Where A Is an Asymmetric Matrix and Its Convergence Analysis

In this case, we consider that
\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}, \qquad b = 0, \tag{64} \]
where $\lambda > 1$. Clearly, in this case $A$ has a double eigenvalue and is not a symmetric matrix, so the analysis of this case differs from that in Section 2.
Firstly, we give two initial iteration points $x_1, x_2$, which satisfy
\[ g_1^{(i)} \ne 0, \quad g_2^{(i)} \ne 0, \quad i = 1, 2, 3. \tag{65} \]
In this case,
\[ \| g_{k-1} \| = \sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}, \]
\[ \left\| \frac{A + A^T}{2} g_{k-1} \right\| = \sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda g_{k-1}^{(2)} + \tfrac{1}{2} g_{k-1}^{(3)} \right)^2 + \left( \tfrac{1}{2} g_{k-1}^{(2)} + \lambda g_{k-1}^{(3)} \right)^2} = \sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda^2 + \tfrac{1}{4} \right) \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right] + 2 \lambda \, g_{k-1}^{(2)} g_{k-1}^{(3)}}, \]
so the stepsize will be
\[ \alpha_k^{new} = \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda^2 + \frac{1}{4} \right) \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right] + 2 \lambda \, g_{k-1}^{(2)} g_{k-1}^{(3)}}}. \]
And since $g_k = \frac{A + A^T}{2} x_k$, we have
\[ g_{k+1} = \left( I - \alpha_k^{new} \, \frac{A + A^T}{2} \right) g_k, \]
that is to say,
\[ \begin{pmatrix} g_{k+1}^{(1)} \\ g_{k+1}^{(2)} \\ g_{k+1}^{(3)} \end{pmatrix} = \left( I - \alpha_k^{new} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & \frac{1}{2} \\ 0 & \frac{1}{2} & \lambda \end{pmatrix} \right) \begin{pmatrix} g_k^{(1)} \\ g_k^{(2)} \\ g_k^{(3)} \end{pmatrix}. \]
We set
\[ M = 1 - \lambda \, \alpha_k^{new} = 1 + \lambda N, \qquad N = -\alpha_k^{new} = -\frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda^2 + \frac{1}{4} \right) \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right] + 2 \lambda \, g_{k-1}^{(2)} g_{k-1}^{(3)}}}, \]
so we can obtain
\[ g_{k+1}^{(1)} = (1 + N) \, g_k^{(1)}, \quad g_{k+1}^{(2)} = M g_k^{(2)} + \tfrac{1}{2} N g_k^{(3)}, \quad g_{k+1}^{(3)} = \tfrac{1}{2} N g_k^{(2)} + M g_k^{(3)}. \tag{66} \]
From (66), we have
\[ \| g_{k+1} \|^2 = (g_{k+1}^{(1)})^2 + (g_{k+1}^{(2)})^2 + (g_{k+1}^{(3)})^2 = (1 + N)^2 (g_k^{(1)})^2 + \left( M^2 + \tfrac{1}{4} N^2 \right) \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] + 2 M N \, g_k^{(2)} g_k^{(3)}. \tag{67} \]
As we know, $M = 1 + \lambda N$ and $N < 0$, but we cannot be sure whether $M$ is positive or negative. So, in order to prove global convergence, we consider the two cases $M > 0$ and $M < 0$.
Theorem 4. 
When $M > 0$, the sequence of gradient norms $\{\| g_k \|\}$ converges to zero.
Proof. 
At first, we assume that $g_k^{(1)} \ne 0$, $g_k^{(2)} \ne 0$, $g_k^{(3)} \ne 0$ for all $k \ge 1$. Since $M = 1 + \lambda N > 0$, we have $1 < \lambda < -\frac{1}{N}$ and $-1 < N < 0$. Next, we discuss the product term in (67) in the two cases $g_k^{(2)} g_k^{(3)} > 0$ and $g_k^{(2)} g_k^{(3)} < 0$.
Case (i). When $g_k^{(2)} g_k^{(3)} > 0$, we have $2 M N g_k^{(2)} g_k^{(3)} < 0$. By (67),
\[ \| g_{k+1} \|^2 < (1 + N)^2 (g_k^{(1)})^2 + \left( M^2 + \tfrac{1}{4} N^2 \right) \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] - M N \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] = (1 + N)^2 (g_k^{(1)})^2 + \left( M - \tfrac{1}{2} N \right)^2 \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right]. \tag{68} \]
If $-\frac{2}{3} < N < 0$, i.e., $\frac{3}{2} < \lambda < -\frac{1}{N}$, then
\[ \| g_{k+1} \|^2 < (1 + N)^2 \, \| g_k \|^2, \]
where $(1 + N)^2 < 1$, so $\{\| g_k \|\}$ converges to zero.
And if $-1 < N < -\frac{2}{3}$, i.e., $1 < \lambda < \frac{3}{2}$, then
\[ \| g_{k+1} \|^2 < \left( M - \tfrac{1}{2} N \right)^2 \| g_k \|^2 = \left[ 1 + \left( \lambda - \tfrac{1}{2} \right) N \right]^2 \| g_k \|^2, \]
where
\[ \begin{aligned} 1 + \left( \lambda - \tfrac{1}{2} \right) N &= 1 - \left( \lambda - \tfrac{1}{2} \right) \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda^2 + \frac{1}{4} \right) \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right] + 2 \lambda \, g_{k-1}^{(2)} g_{k-1}^{(3)}}} \\ &\le 1 - \left( \lambda - \tfrac{1}{2} \right) \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda + \frac{1}{2} \right)^2 \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right]}} \\ &\le 1 - \left( \lambda - \tfrac{1}{2} \right) \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\left( \lambda + \frac{1}{2} \right) \sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}} = 1 - \frac{\lambda - \frac{1}{2}}{\lambda + \frac{1}{2}} = \frac{1}{\lambda + \frac{1}{2}}. \end{aligned} \]
Moreover, since $-1 < N$ and $\lambda < \frac{3}{2}$, we also have $1 + \left( \lambda - \frac{1}{2} \right) N > \frac{3}{2} - \lambda > 0$; so $\left[ 1 + \left( \lambda - \frac{1}{2} \right) N \right]^2 < 1$, and $\{\| g_k \|\}$ converges to zero.
Case (ii). When $g_k^{(2)} g_k^{(3)} < 0$, we have $2 M N g_k^{(2)} g_k^{(3)} > 0$ and $-2 g_k^{(2)} g_k^{(3)} \le (g_k^{(2)})^2 + (g_k^{(3)})^2$, so $2 M N g_k^{(2)} g_k^{(3)} \le -M N \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right]$ holds. By (67),
\[ \| g_{k+1} \|^2 \le (1 + N)^2 (g_k^{(1)})^2 + \left( M^2 + \tfrac{1}{4} N^2 \right) \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] - M N \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] = (1 + N)^2 (g_k^{(1)})^2 + \left( M - \tfrac{1}{2} N \right)^2 \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right]. \tag{69} \]
Since (69) is the same bound as (68), the proof of Case (ii) is similar to that of Case (i), and $\{\| g_k \|\}$ converges to zero, too.
According to the above analysis, we finish the proof. □
Before we prove the case $M < 0$, we set $Q = \frac{(g_{k-1}^{(1)})^2}{(g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}$, $k \ge 2$, and give the following theorem.
Theorem 5. 
When $M < 0$: if $-1 < N < -\frac{2}{\lambda + \frac{3}{2}}$ and we assume $Q < 3$, the sequence of gradient norms $\{\| g_k \|\}$ converges to zero; if $-\frac{2}{\lambda + \frac{3}{2}} < N < 0$, the sequence $\{\| g_k \|\}$ also converges to zero.
Proof. 
Firstly, for any $k \ge 1$ we assume that $g_k^{(i)} \ne 0$, $i = 1, 2, 3$. By (67), whether $g_k^{(2)} g_k^{(3)}$ is positive or negative, we have
\[ \| g_{k+1} \|^2 \le (1 + N)^2 (g_k^{(1)})^2 + \left( M^2 + \tfrac{1}{4} N^2 \right) \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] + M N \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right] = (1 + N)^2 (g_k^{(1)})^2 + \left( M + \tfrac{1}{2} N \right)^2 \left[ (g_k^{(2)})^2 + (g_k^{(3)})^2 \right]. \]
If $-1 < N < -\frac{2}{\lambda + \frac{3}{2}}$, we can see that $\left( M + \frac{1}{2} N \right)^2 = \left[ 1 + \left( \lambda + \frac{1}{2} \right) N \right]^2 > (1 + N)^2$, so $\| g_{k+1} \|^2 < \left[ 1 + \left( \lambda + \frac{1}{2} \right) N \right]^2 \| g_k \|^2$. As we know,
\[ \begin{aligned} 1 + \left( \lambda + \tfrac{1}{2} \right) N &= 1 - \left( \lambda + \tfrac{1}{2} \right) \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda^2 + \frac{1}{4} \right) \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right] + 2 \lambda \, g_{k-1}^{(2)} g_{k-1}^{(3)}}} \\ &\le 1 - \left( \lambda + \tfrac{1}{2} \right) \frac{\sqrt{(g_{k-1}^{(1)})^2 + (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2}}{\sqrt{(g_{k-1}^{(1)})^2 + \left( \lambda + \frac{1}{2} \right)^2 \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right]}} = 1 - \sqrt{1 + \frac{\left[ \left( \lambda + \frac{1}{2} \right)^2 - 1 \right] (g_{k-1}^{(1)})^2}{(g_{k-1}^{(1)})^2 + \left( \lambda + \frac{1}{2} \right)^2 \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right]}}. \end{aligned} \]
Let $P = \frac{\left[ \left( \lambda + \frac{1}{2} \right)^2 - 1 \right] (g_{k-1}^{(1)})^2}{(g_{k-1}^{(1)})^2 + \left( \lambda + \frac{1}{2} \right)^2 \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right]}$; then we have
\[ P < \frac{\left( \lambda + \frac{1}{2} \right)^2 (g_{k-1}^{(1)})^2}{(g_{k-1}^{(1)})^2 + \left( \lambda + \frac{1}{2} \right)^2 \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right]} < \frac{\left( \lambda + \frac{1}{2} \right)^2 (g_{k-1}^{(1)})^2}{\left( \lambda + \frac{1}{2} \right)^2 \left[ (g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2 \right]} = \frac{(g_{k-1}^{(1)})^2}{(g_{k-1}^{(2)})^2 + (g_{k-1}^{(3)})^2} = Q. \]
From the assumption $Q < 3$, we can obtain $P < 3$ and $\left[ 1 + \left( \lambda + \frac{1}{2} \right) N \right]^2 < 1$, so the sequence $\{\| g_k \|\}$ converges to zero.
And if $-\frac{2}{\lambda + \frac{3}{2}} < N < 0$, it follows that $(1 + N)^2 > \left[ 1 + \left( \lambda + \frac{1}{2} \right) N \right]^2 = \left( M + \frac{1}{2} N \right)^2$, and then $\| g_{k+1} \|^2 < (1 + N)^2 \| g_k \|^2$. Since $(1 + N)^2 < 1$, it is obvious that $\{\| g_k \|\}$ converges to zero.
Above all, we finish the proof. □
From the above two theorems, we can see that when $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}$, $\lambda > 1$, the new method is globally convergent. In order to prove the convergence of the new method, we added the assumption $Q < 3$, but the value of $Q$ does not need to be monitored in actual computation, and it does not affect the computational efficiency of the new method.
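Since the iteration of this section only ever uses the symmetric part of $A$, it is straightforward to simulate; a sketch for one value of $\lambda$ and arbitrary starting points (illustrative values of ours, not the paper's experiments):

```python
import numpy as np

lam = 4.0
A = np.array([[1.0, 0.0, 0.0],
              [0.0, lam, 0.0],
              [0.0, 1.0, lam]])
S = 0.5 * (A + A.T)                   # symmetric part; g_k = S x_k when b = 0

x_prev = np.array([2.0, -1.0, 3.0])   # x_1
g_prev = S @ x_prev                   # g_1
x = np.array([1.0, 2.0, -1.0])        # x_2
for _ in range(2000):
    g = S @ x
    if np.linalg.norm(g) < 1e-12:
        break
    alpha = np.linalg.norm(g_prev) / np.linalg.norm(S @ g_prev)  # stepsize (18)
    x, g_prev = x - alpha * g, g      # here N = -alpha and M = 1 + lam * N, as above
```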

4. Numerical Results

In this section, we present the results of numerical experiments comparing the new BB-like method, which uses the new stepsize $\alpha_k^{new}$, with other BB methods in solving optimization problems. The main difference between the methods compared here is the choice of the stepsize. We choose the following stepsizes for comparison: $\alpha_k^{BB1}$ [4], $\alpha_k^{BB2}$ [4], $\alpha_k^{DY}$ [30], $\alpha_k^{MG}$ [32] and the stepsize $\alpha_k$ in [28]. From (12) we can see that the stepsize in [28] is a convex combination, so in our experiments we set $\gamma_k = 0.5$ and use this setting to represent that method. In addition, for the case when $A$ is an $n$-dimensional symmetric matrix, we compare our BB-like method with the ABBmin1 and ABBmin2 methods in [18] and the ABB method in [17]. All methods were implemented in Python (v3.9.13), and all runs were carried out on a PC with an Intel Core i5 2.3 GHz processor and 8 GB of RAM. For the examples solved in the numerical experiments, we chose the following termination condition:
\[ | f(x_{k+1}) - f(x_k) | \le \epsilon, \]
for some given $\epsilon > 0$, so that we can obtain the expected results.
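Put together, a driver for these experiments with the termination condition above might look as follows (a sketch under the assumption $b = 0$, which holds in all five examples; the function name is ours):

```python
import numpy as np

def run_new_method(A, x0, x1, eps=1e-6, max_iter=10_000):
    """BB-like method with stepsize (18), stopped when |f(x_{k+1}) - f(x_k)| <= eps."""
    S = 0.5 * (A + A.T)
    f = lambda x: 0.5 * x.dot(A @ x)   # b = 0 in all five examples
    g_prev = S @ x0
    x = x1
    for k in range(1, max_iter + 1):
        g = S @ x
        alpha = np.linalg.norm(g_prev) / np.linalg.norm(S @ g_prev)
        x_next = x - alpha * g
        if abs(f(x_next) - f(x)) <= eps:
            return x_next, k           # minimizer estimate and iteration count
        x, g_prev = x_next, g
    return x, max_iter
```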
In our numerical experiments, we mainly considered five types of optimization problems. We now give the five examples in specific form as follows.
Example 1. 
Consider the following optimization problem,
\[ \min_{x \in \mathbb{R}^3} f(x) = \frac{1}{2} x^T \, \mathrm{diag}(1, 1, \lambda) \, x, \]
where $\lambda > 1$, with initial point $x_0 = (10, 7, 1)^T$ and $\epsilon = 10^{-6}$.
For Example 1, we compare the number of iterations and the minimum points of the new method with those of the other five methods as $\lambda$ changes. The specific results are shown in Table 1. Moreover, we give a comparison of the CPU time of the different methods when solving Example 1 in Figure 1.
Example 2. 
Consider the following optimization problem,
$$\min_{x \in \mathbb{R}^3} f(x) = \frac{1}{2} x^T \operatorname{diag}(a, a, b)\, x,$$
where $0 < a < b$, the initial point $x_0 = (9, 6, 2)^T$, and $\epsilon = 10^{-6}$.
For Example 2, Table 2 gives the number of iterations and the minimum points of each method for different values of $a$ and $b$, and Figure 2 compares the CPU time of the different methods on Example 2.
Example 3. 
Consider the following optimization problem,
$$\min_{x \in \mathbb{R}^3} f(x) = \frac{1}{2} x^T \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix} x,$$
where $\lambda > 1$, $\epsilon = 10^{-8}$, and the initial point $x_0 \in \mathbb{R}^3$ is chosen at random.
For Example 3, due to the particularity of its form, $A$ is not a symmetric matrix, whereas the other forms of BB methods require $A$ to be symmetric positive definite. Therefore, Table 3 gives only the number of iterations and the minimum points obtained by the new method for this kind of optimization problem as the initial point varies.
Example 4. 
Consider the following optimization problem,
$$\min_{x \in \mathbb{R}^n} f(x) = \frac{1}{2} x^T \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\, x,$$
where $n = 100$, $1 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, $\epsilon = 10^{-8}$, and the initial point $x_0 \in \mathbb{R}^n$ is chosen at random.
For Example 4, we chose two other methods, the ABBmin1 and ABBmin2 methods, for comparison with ours. The parameters of ABBmin1 and ABBmin2 were selected as in [18]: $\tau = 0.8$, $m = 9$ and $\tau = 0.9$, respectively. The initial points we chose were $(10, 5, 2)$, $(9, 7, 1)$ and $(7, 3, 5)$. For each initial point, we randomly chose ten different sets of values of $\lambda_i$, $i = 1, \ldots, 100$, satisfying $\lambda_1 = 1$, $\lambda_{100} = 10{,}000$, with $\lambda_j$ evenly distributed between 1 and 10,000 for $j = 2, \ldots, 99$. Figure 3 and Figure 4, respectively, show the comparison of the number of iterations and the CPU time when the three methods solve Example 4.
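A problem instance of the kind used in Example 4 can be generated as follows. This is a sketch under the stated setup; the fixed random seed and the range of the random initial point are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, our own choice

n = 100
# lambda_1 = 1, lambda_100 = 10,000; interior eigenvalues drawn
# uniformly between 1 and 10,000, then sorted into ascending order
lam = np.sort(np.concatenate(([1.0],
                              rng.uniform(1.0, 10000.0, n - 2),
                              [10000.0])))
A = np.diag(lam)                  # diagonal coefficient matrix of Example 4
x0 = rng.uniform(-10.0, 10.0, n)  # random initial point
```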
Example 5 
(Random problems in [33]). Consider A = Q D Q T , where
$$Q = (I - 2\omega_3\omega_3^T)(I - 2\omega_2\omega_2^T)(I - 2\omega_1\omega_1^T),$$
and $\omega_1, \omega_2, \omega_3$ are unit random vectors, $D = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ is a diagonal matrix with $\sigma_1 = 1$, $\sigma_n = cond$, and $\sigma_j$ randomly generated between 1 and the condition number $cond$ for $j = 2, \ldots, n-1$. We set $b = 0$, $\epsilon = 10^{-8}$, and the initial point $x_0 = (1, \ldots, 1)^T$.
For Example 5, we set $n = 2000$ and allowed a maximum of 10,000 iterations. For a broader comparison, we chose three other methods: the ABBmin1 and ABBmin2 methods and the ABB method. The parameters of ABBmin1 and ABBmin2 were the same as in Example 4, and for the ABB method we set $\kappa = 0.15$, which differs from the setting in [17]. In the experiments, three values of the condition number $cond$ were chosen: $10^4$, $10^5$, $10^6$. For each value of $cond$, ten instances were generated with $\sigma_j$ evenly distributed in $[1, cond]$, $j = 2, \ldots, n-1$. The comparison of the number of iterations and the CPU time of the methods on Example 5 is shown in Figure 5 and Figure 6.
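The random test matrix of Example 5 can be built from three Householder reflections, as the definition above describes. The helper name `random_spd` and the Gaussian draw used to obtain the unit vectors $\omega_i$ are our own assumptions; any sampling that yields unit vectors would do.

```python
import numpy as np

def random_spd(n, cond, rng):
    """A = Q D Q^T with Q a product of three Householder reflections
    I - 2 w w^T built from unit random vectors, and D = diag(sigma_1,
    ..., sigma_n) with sigma_1 = 1, sigma_n = cond, and interior
    entries uniform in [1, cond]."""
    Q = np.eye(n)
    for _ in range(3):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)            # unit vector: I - 2ww^T is orthogonal
        Q = Q - 2.0 * np.outer(Q @ w, w)  # multiply Q by the reflection
    sigma = np.sort(np.concatenate(([1.0],
                                    rng.uniform(1.0, cond, n - 2),
                                    [float(cond)])))
    return Q @ np.diag(sigma) @ Q.T
```

Since $Q$ is orthogonal, the eigenvalues of $A$ are exactly the $\sigma_j$, so the condition number of $A$ equals $cond$ by construction.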
In all tables, ‘iter’ denotes the number of iterations and ‘$x^*$’ the computed optimal solution. The vertical axis of each figure shows the percentage of problems solved by each method within a given factor $\rho$ of the best value of the metric.
From Table 1 and Table 2, we can see that the new method has no obvious advantage in the number of iterations when solving optimization problems like Examples 1 and 2, while its solution accuracy reaches the level of the other methods. In terms of CPU time, however, Figure 1 and Figure 2 show that the new method has a clear advantage over the compared methods. For optimization problems like Example 3, Table 3 shows that the new method performs well in both the number of iterations and the accuracy of the minimum points. From Figure 3 and Figure 4, there is no significant difference in the number of iterations among the three methods on problems such as Example 4, but the new method has a slight advantage in CPU time. For random problems like Example 5, Figure 5 shows that the new method and the ABBmin2 method perform better in terms of the number of iterations, while Figure 6 shows that the new method still has an obvious advantage in CPU time.

5. Conclusions

In this paper, we proposed a modified BB-like method using the stepsize $\alpha_k^{new}$ and analyzed two cases in which the coefficient matrix $A$ of the quadratic term of the quadratic function is a third-order matrix. For the case $A = \operatorname{diag}(1, 1, \lambda)$, $\lambda > 1$, we proved R-superlinear convergence and generalized the result to $A = \operatorname{diag}(\mu, \mu, \varphi)$, $\varphi > \mu \ge 1$. We further generalized this case to the $n$-dimensional form $A = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, where $1 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, and proved R-linear convergence of the $n$-dimensional case. The numerical results show that this method has a significant advantage in running time compared with several other methods. For the other case, $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}$, $\lambda > 1$, we also proved global convergence under a mild assumption, and the numerical results show that the modified method handles such problems quickly and effectively. In summary, the modified stepsize $\alpha_k^{new}$ is well suited to three-dimensional optimization problems.

Author Contributions

Methodology, Q.H.; software, T.W.; supervision, Q.H.; writing—original draft, T.W.; writing—review and editing, T.W. and Q.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China, grant 12171196.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cauchy, A. Méthode générale pour la résolution des systemes d’équations simultanées. Comp. Rend. Sci. Paris 1847, 25, 536–538. [Google Scholar]
  2. Forsythe, G.E. On the asymptotic directions of the s-dimensional optimum gradient method. Numer. Math. 1968, 11, 57–76. [Google Scholar] [CrossRef]
  3. Nocedal, J.; Sartenaer, A.; Zhu, C. On the behavior of the gradient norm in the steepest descent method. Comput. Optim. Appl. 2002, 22, 5–35. [Google Scholar] [CrossRef]
  4. Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148. [Google Scholar] [CrossRef]
  5. Dennis, J.E., Jr.; Moré, J.J. Quasi-Newton methods, motivation and theory. SIAM Rev. 1977, 19, 46–89. [Google Scholar] [CrossRef]
  6. Birgin, E.G.; Martínez, J.M.; Raydan, M. Spectral projected gradient methods: Review and perspectives. J. Stat. Softw. 2014, 60, 1–21. [Google Scholar] [CrossRef]
  7. Fletcher, R. On the Barzilai-Borwein method. In Optimization and Control with Applications; Qi, L.Q., Teo, K., Yang, X.Q., Eds.; Springer: New York, NY, USA, 2005; pp. 235–256. [Google Scholar]
  8. Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Spectral properties of Barzilai-Borwein rules in solving singly linearly constrained optimization problems subject to lower and upper bounds. SIAM J. Optim. 2020, 30, 1300–1326. [Google Scholar] [CrossRef]
  9. Huang, Y.K.; Dai, Y.H.; Liu, X.W. Equipping the Barzilai-Borwein method with the two dimensional quadratic termination property. SIAM J. Optim. 2021, 31, 3068–3096. [Google Scholar] [CrossRef]
  10. Huang, Y.; Liu, H. On the rate of convergence of projected Barzilai-Borwein methods. Optim. Methods Softw. 2015, 30, 880–892. [Google Scholar] [CrossRef]
  11. Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai-Borwein gradient method. IMA J. Numer. Anal. 2002, 22, 1–10. [Google Scholar] [CrossRef]
  12. Raydan, M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 1997, 7, 26–33. [Google Scholar] [CrossRef]
  13. Grippo, L.; Lampariello, F.; Lucidi, S. A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 1986, 23, 707–716. [Google Scholar] [CrossRef]
  14. Dai, Y.H.; Fletcher, R. Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 2005, 100, 21–47. [Google Scholar] [CrossRef]
  15. Huang, Y.; Liu, H. Smoothing projected Barzilai-Borwein method for constrained non-Lipschitz optimization. Comput. Optim. Appl. 2016, 65, 671–698. [Google Scholar] [CrossRef]
  16. Dai, Y.H. Alternate step gradient method. Optimization 2003, 52, 395–415. [Google Scholar] [CrossRef]
  17. Zhou, B.; Gao, L.; Dai, Y.H. Gradient methods with adaptive step-sizes. Comput. Optim. Appl. 2006, 35, 69–86. [Google Scholar] [CrossRef]
  18. Frassoldati, G.; Zanni, L.; Zanghirati, G. New adaptive stepsize selections in gradient methods. J. Ind. Manag. Optim. 2008, 4, 299–312. [Google Scholar] [CrossRef]
  19. De Asmundis, R.; Di Serafino, D.; Hager, W.W.; Toraldo, G.; Zhang, H. An efficient gradient method using the Yuan steplength. Comput. Optim. Appl. 2014, 59, 541–563. [Google Scholar] [CrossRef]
  20. Dai, Y.H.; Yuan, Y.X. Analysis of monotone gradient methods. J. Ind. Manag. Optim. 2005, 1, 181–192. [Google Scholar] [CrossRef]
  21. Broyden, C.G. A class of methods for solving nonlinear simultaneous equations. Math. Comput. 1965, 19, 577–593. [Google Scholar] [CrossRef]
  22. Davidon, W.C. Variable metric method for minimization. SIAM J. Optim. 1991, 1, 1–17. [Google Scholar] [CrossRef]
  23. Fletcher, R.; Powell, M.J.D. A rapidly convergent descent method for minimization. Comput. J. 1963, 6, 163–168. [Google Scholar] [CrossRef]
  24. Broyden, C.G. The convergence of single-rank quasi-Newton methods. Math. Comput. 1970, 24, 365–382. [Google Scholar] [CrossRef]
  25. Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322. [Google Scholar] [CrossRef]
  26. Goldfarb, D. A family of variable-metric methods derived by variational means. Math. Comput. 1970, 24, 23–26. [Google Scholar] [CrossRef]
  27. Shanno, D.F. Conditioning of quasi-Newton methods for function minimization. Math. Comput. 1970, 24, 647–656. [Google Scholar] [CrossRef]
  28. Dai, Y.H.; Huang, Y.; Liu, X.W. A family of spectral gradient methods for optimization. Comput. Optim. Appl. 2019, 74, 43–65. [Google Scholar] [CrossRef]
  29. Dai, Y.H.; Al-Baali, M.; Yang, X. A positive Barzilai-Borwein-like stepsize and an extension for symmetric linear systems. In Numerical Analysis and Optimization; Al-Baali, M., Grandinetti, L., Purnama, A., Eds.; Springer: Cham, Switzerland, 2015; pp. 59–75. [Google Scholar]
  30. Dai, Y.H.; Yang, X.Q. A new gradient method with an optimal stepsize property. Comput. Optim. Appl. 2006, 33, 73–88. [Google Scholar] [CrossRef]
  31. Elman, H.C.; Golub, G.H. Inexact and preconditioned Uzawa algorithm for saddle point problems. SIAM J. Numer. Anal. 1994, 31, 1645–1661. [Google Scholar] [CrossRef]
  32. Dai, Y.H.; Yuan, Y.X. Alternate minimization gradient methods. IMA J. Numer. Anal. 2003, 23, 377–393. [Google Scholar] [CrossRef]
  33. Friedlander, A.; Martínez, J.M.; Molina, B.; Raydan, M. Gradient method with retards and generalizations. SIAM J. Numer. Anal. 1999, 36, 275–289. [Google Scholar] [CrossRef]
Figure 1. Comparison of six methods on CPU time for Example 1.
Figure 2. Comparison of six methods on CPU time for Example 2.
Figure 3. Comparison of three methods on number of iterations for Example 4.
Figure 4. Comparison of three methods on CPU time for Example 4.
Figure 5. Comparison of four methods on number of iterations for Example 5.
Figure 6. Comparison of four methods on CPU time for Example 5.
Table 1. Number of iterations and minimum points of compared methods for Example 1.
| λ | $\alpha_k^{new}$ (iter; $x^*$) | $\alpha_k^{BB1}$ (iter; $x^*$) | $\alpha_k^{BB2}$ (iter; $x^*$) |
|---|---|---|---|
| 3 | 7; (1.86e-8, 1.30e-8, 8.66e-7)^T | 7; (4.79e-8, 3.52e-8, 1.31e-6)^T | 7; (1.73e-9, 1.21e-9, 4.27e-7)^T |
| 5 | 9; (1.48e-9, 1.04e-9, 2.57e-6)^T | 9; (4.06e-9, 2.84e-9, 1.82e-5)^T | 9; (5.91e-12, 4.14e-12, 1.06e-9)^T |
| 10 | 12; (1.21e-5, 8.47e-6, 8.80e-6)^T | 14; (5.46e-11, 3.82e-11, 0.0003)^T | 9; (0.0002, 0.0001, 1.66e-6)^T |
| 50 | 10; (8.43e-8, 5.90e-8, 2.54e-6)^T | 9; (0.0005, 0.0004, 2.34e-12)^T | 10; (9.47e-9, 6.63e-9, 1.44e-11)^T |
| 100 | 10; (2.70e-15, 1.89e-15, 1.41e-10)^T | 7; (0.0011, 0.0008, 2.47e-8)^T | 8; (0.0044, 0.0031, 1.57e-14)^T |
| 500 | 7; (1.12e-5, 7.84e-6, 1.18e-8)^T | 7; (5.51e-8, 3.86e-8, 3.55e-15)^T | 7; (3.09e-7, 2.17e-7, 2.75e-9)^T |
| 1000 | 7; (1.90e-7, 1.33e-7, 9.46e-11)^T | 7; (5.53e-10, 3.87e-10, 3.55e-15)^T | 7; (4.87e-9, 3.41e-9, 1.08e-11)^T |
| 10,000 | 7; (1.98e-13, 1.39e-13, 2.78e-17)^T | 7; (7.09e-17, 4.97e-17, 0)^T | 7; (4.89e-15, 3.43e-15, 1.73e-18)^T |

| λ | $\alpha_k^{DY}$ (iter; $x^*$) | $\gamma_k = 0.5$ (iter; $x^*$) | $\alpha_k^{MG}$ (iter; $x^*$) |
|---|---|---|---|
| 3 | 7; (1.90e-8, 1.30e-8, 8.67e-7)^T | 7; (1.87e-8, 1.31e-8, 8.69e-7)^T | 11; (0.0007, 0.0005, 7.01e-5)^T |
| 5 | 9; (1.47e-9, 1.00e-9, 2.50e-6)^T | 9; (1.61e-9, 1.13e-9, 2.68e-6)^T | 23; (0.0013, 0.0009, 0.0001)^T |
| 10 | 12; (1.22e-5, 8.87e-6, 8.80e-6)^T | 12; (1.16e-5, 8.15e-6, 1.38e-5)^T | 25; (0.0014, 0.0010, 0.0001)^T |
| 50 | 10; (8.42e-8, 6.00e-8, 2.50e-6)^T | 10; (4.91e-10, 3.44e-10, 1.29e-6)^T | 7; (0.0016, 0.0011, 0.0002)^T |
| 100 | 10; (2.59e-15, 1.91e-15, 1.43e-10)^T | 10; (6.94e-18, 6.90e-18, 2.13e-10)^T | 7; (2.98e-5, 2.09e-5, 2.98e-6)^T |
| 500 | 7; (1.20e-5, 7.88e-6, 1.20e-8)^T | 7; (6.83e-5, 4.78e-5, 6.45e-10)^T | 5; (3.52e-6, 2.46e-6, 3.52e-7)^T |
| 1000 | 7; (1.89e-7, 1.50e-7, 9.50e-11)^T | 7; (3.10e-6, 2.17e-6, 4.37e-12)^T | 5; (2.21e-7, 1.55e-7, 2.21e-8)^T |
| 10,000 | 7; (2.01e-13, 1.40e-13, 2.77e-17)^T | 7; (1.29e-11, 9.05e-12, 0)^T | 5; (1.49e-5, 1.04e-5, 1.49e-6)^T |
Table 2. Number of iterations and minimum points of compared methods for Example 2.
| a, b | $\alpha_k^{new}$ (iter; $x^*$) | $\alpha_k^{BB1}$ (iter; $x^*$) | $\alpha_k^{BB2}$ (iter; $x^*$) |
|---|---|---|---|
| a = 2, b = 5 | 7; (1.93e-5, 1.28e-5, 0.0002)^T | 7; (4.79e-5, 3.19e-5, 0.0002)^T | 7; (2.89e-6, 1.92e-6, 8.78e-5)^T |
| a = 10, b = 16 | 7; (1.57e-9, 1.04e-9, 6.55e-8)^T | 7; (2.79e-9, 1.86e-9, 8.09e-8)^T | 7; (6.73e-10, 4.49e-10, 5.00e-8)^T |
| a = 25, b = 30 | 5; (2.08e-5, 1.39e-5, 9.42e-9)^T | 5; (2.08e-5, 1.38e-5, 1.12e-8)^T | 5; (2.09e-5, 1.39e-5, 7.81e-9)^T |
| a = 50, b = 120 | 7; (9.10e-6, 6.07e-6, 8.78e-5)^T | 7; (2.21e-5, 1.47e-5, 0.0001)^T | 7; (1.51e-6, 1.01e-6, 5.00e-5)^T |
| a = 100, b = 350 | 10; (1.18e-5, 7.84e-6, 1.13e-11)^T | 10; (3.99e-5, 2.67e-5, 6.16e-10)^T | 9; (3.46e-7, 2.31e-7, 1.16e-6)^T |
| a = 1000, b = 5000 | 14; (3.98e-10, 2.65e-10, 1.90e-7)^T | 15; (2.68e-5, 1.79e-5, 4.35e-9)^T | 12; (3.76e-17, 2.51e-17, 4.77e-10)^T |
| a = 10,000, b = 15,000 | 7; (2.89e-10, 1.93e-10, 1.51e-8)^T | 7; (4.83e-10, 3.22e-10, 1.81e-8)^T | 7; (1.42e-10, 9.46e-11, 1.20e-8)^T |

| a, b | $\alpha_k^{DY}$ (iter; $x^*$) | $\gamma_k = 0.5$ (iter; $x^*$) | $\alpha_k^{MG}$ (iter; $x^*$) |
|---|---|---|---|
| a = 2, b = 5 | 7; (1.92e-5, 1.20e-5, 0.0001)^T | 7; (1.95e-5, 1.30e-5, 0.0002)^T | 12; (0.0004, 0.0003, 0.0002)^T |
| a = 10, b = 16 | 7; (1.70e-9, 1.11e-9, 6.55e-8)^T | 7; (1.57e-9, 1.05e-9, 6.55e-8)^T | 7; (0.0001, 7.96e-5, 2.65e-5)^T |
| a = 25, b = 30 | 5; (2.08e-5, 1.30e-5, 9.50e-5)^T | 5; (2.08e-5, 1.38e-5, 9.42e-9)^T | 5; (2.76e-5, 1.84e-5, 6.14e-6)^T |
| a = 50, b = 120 | 7; (9.09e-6, 6.08e-6, 8.80e-5)^T | 7; (9.21e-6, 6.14e-6, 8.82e-5)^T | 13; (0.0001, 7.14e-5, 2.38e-5)^T |
| a = 100, b = 350 | 10; (1.11e-5, 7.88e-6, 1.15e-11)^T | 10; (1.30e-5, 8.68e-6, 1.43e-11)^T | 20; (0.0001, 7.26e-5, 1.65e-5)^T |
| a = 1000, b = 5000 | 14; (4.00e-10, 2.65e-10, 1.90e-7)^T | 14; (1.04e-9, 6.93e-10, 2.70e-7)^T | 23; (4.49e-5, 3.00e-5, 9.99e-6)^T |
| a = 10,000, b = 15,000 | 7; (2.80e-10, 1.93e-10, 1.50e-8)^T | 7; (2.90e-10, 1.93e-10, 1.51e-8)^T | 8; (1.56e-6, 1.04e-6, 3.01e-6)^T |
Table 3. Number of iterations and minimum points of our method for Example 3.
| λ | $x_0 = (2, 2, 1)^T$ (iter; $x^*$) | $x_0 = (1, 2, 0)^T$ (iter; $x^*$) | $x_0 = (1, 2, 1)^T$ (iter; $x^*$) |
|---|---|---|---|
| 3 | 11; (0.0001, 5.56e-6, 4.09e-6)^T | 10; (2.69e-6, 6.88e-7, 6.57e-7)^T | 11; (8.69e-7, 3.00e-6, 1.29e-6)^T |
| 5 | 14; (2.46e-7, 1.64e-6, 2.08e-6)^T | 14; (5.38e-7, 1.47e-6, 1.62e-6)^T | 10; (0.0002, 1.73e-5, 1.54e-5)^T |
| 10 | 11; (5.39e-6, 2.85e-6, 2.91e-6)^T | 14; (3.01e-7, 9.89e-6, 4.34e-6)^T | 12; (7.13e-9, 2.04e-5, 2.02e-5)^T |
| 50 | 13; (1.46e-5, 4.94e-8, 5.23e-8)^T | 14; (2.88e-7, 1.56e-6, 2.01e-6)^T | 13; (2.20e-6, 1.72e-8, 1.64e-8)^T |
| 100 | 13; (4.42e-6, 2.84e-8, 2.33e-8)^T | 14; (5.92e-7, 1.57e-6, 4.91e-6)^T | 13; (3.87e-6, 8.53e-9, 3.62e-9)^T |
| 500 | 13; (3.44e-7, 4.65e-9, 4.60e-9)^T | 9; (0.0022, 2.75e-8, 6.26e-10)^T | 13; (5.49e-6, 9.12e-9, 7.72e-9)^T |
| 1000 | 13; (1.86e-7, 1.53e-9, 1.52e-9)^T | 9; (0.0022, 6.86e-9, 7.83e-11)^T | 13; (5.72e-6, 1.00e-8, 9.38e-9)^T |
| 10,000 | 13; (8.86e-8, 9.85e-14, 3.86e-12)^T | 8; (0.0022, 1.17e-10, 1.37e-6)^T | 13; (5.93e-6, 1.10e-8, 1.10e-8)^T |

| λ | $x_0 = (3, 2, 1)^T$ (iter; $x^*$) | $x_0 = (9, 7, 0)^T$ (iter; $x^*$) |
|---|---|---|
| 3 | 11; (2.57e-5, 5.13e-6, 7.03e-6)^T | 12; (3.33e-9, 3.17e-5, 1.54e-5)^T |
| 5 | 14; (2.44e-6, 5.61e-6, 6.03e-6)^T | 13; (1.06e-5, 2.10e-5, 2.19e-5)^T |
| 10 | 14; (2.98e-7, 2.91e-6, 3.93e-6)^T | 14; (1.76e-6, 2.42e-5, 1.14e-5)^T |
| 50 | 13; (2.40e-8, 2.32e-8, 4.86e-8)^T | 14; (2.59e-5, 7.07e-6, 1.86e-6)^T |
| 100 | 13; (1.77e-6, 1.35e-8, 4.00e-8)^T | 14; (1.72e-5, 2.84e-7, 1.33e-7)^T |
| 500 | 13; (6.29e-7, 7.88e-9, 7.76e-9)^T | 13; (1.33e-7, 3.19e-7, 3.65e-7)^T |
| 1000 | 13; (1.59e-7, 4.25e-9, 4.23e-9)^T | 13; (2.08e-6, 6.67e-7, 8.73e-7)^T |
| 10,000 | 8; (0.0093, 1.90e-9, 2.01e-9)^T | 8; (0.0089, 6.99e-9, 2.14e-7)^T |

Share and Cite

MDPI and ACS Style

Wang, T.; Huang, Q. Research on Three-Dimensional Extension of Barzilai-Borwein-like Method. Mathematics 2025, 13, 215. https://doi.org/10.3390/math13020215

