Mathematics
  • Article
  • Open Access

23 April 2018

A New Descent Algorithm Using the Three-Step Discretization Method for Solving Unconstrained Optimization Problems

Department of Applied Mathematics, Faculty of Mathematics, Yazd University, P. O. Box 89195-741, Yazd, Iran
* Author to whom correspondence should be addressed.

Abstract

In this paper, the three-step Taylor expansion, which is equivalent to the third-order Taylor expansion, is used as the mathematical basis of a new descent method. At each iteration of this method, three steps are performed. Each step has a structure similar to that of the steepest descent method, except that a generalized search direction, step length, and next iterate are used. Compared with the steepest descent method, it is shown that the proposed algorithm has a higher convergence speed and lower computational cost and storage.

1. Introduction

We start our discussion by considering the unconstrained minimization problem:
$$\text{Find } x^* = [x_1^*, x_2^*, \ldots, x_n^*]^T \text{ that minimizes } f(x_1, x_2, \ldots, x_n) \equiv f(x),$$
where f is an n-dimensional continuously differentiable function.
Most effective optimization procedures include some basic steps. At iteration k, where the current x is x k , they do the following:
1. Specify the initial starting vector $x^0 = [x_1^0, x_2^0, \ldots, x_n^0]^T$,
2. Find an appropriate search direction $d_k$,
3. Specify the convergence criteria for termination,
4. Minimize along the direction $d_k$ to find a new point $x_{k+1}$ from the following equation
$$x_{k+1} = x_k + \alpha_k d_k. \tag{1}$$
where $\alpha_k$ is a positive scalar called the step size. The step size is usually determined (inexactly) by a one-dimensional optimization process called a line search, such as the Wolfe, Goldstein, or Armijo line search:
$$f(x_k + \alpha d_k) < f(x_k) + \alpha c \nabla f(x_k) \cdot d_k, \tag{2}$$
where $c \in (0,1)$ is a constant control parameter. For more information on line search strategies, readers can refer to [1,2,3,4,5,6,7]. In addition, some researchers have introduced algorithms that define $\alpha_k$ without a line search; see [8,9,10]. An appropriate choice of the initial starting point has a positive effect on the computational cost and the speed of convergence.
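The backtracking procedure behind the Armijo line search described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation; the function names and default parameters are assumptions.

```python
import numpy as np

def backtracking_armijo(f, grad_f, x, d, c=1e-3, rho=0.5, alpha_init=1.0, max_iter=50):
    """Shrink alpha by the factor rho until the Armijo condition
    f(x + alpha d) < f(x) + alpha c grad_f(x).d holds."""
    fx = f(x)
    slope = grad_f(x) @ d          # directional derivative; negative for a descent direction
    alpha = alpha_init
    for _ in range(max_iter):
        if f(x + alpha * d) < fx + alpha * c * slope:
            return alpha
        alpha *= rho
    return alpha

# Example on f(x) = ||x||^2 with the steepest descent direction d = -grad f(x).
f = lambda x: float(x @ x)
g = lambda x: 2.0 * x
x0 = np.array([1.0, -2.0])
alpha = backtracking_armijo(f, g, x0, -g(x0))
assert f(x0 + alpha * -g(x0)) < f(x0)   # accepted step gives sufficient decrease
```

The accepted step is not the exact minimizer along the ray; it merely guarantees sufficient decrease, which is all the convergence theory below requires.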
Numerical optimization methods for general multivariable objective functions differ mainly in how they generate the search directions. A good search direction should reduce the objective function's value so that
$$f(x_{k+1}) < f(x_k).$$
Such a direction $d_k$ is called a descent direction and satisfies the following requirement at any point:
$$\nabla f(x_k) \cdot d_k < 0, \tag{3}$$
where the dot indicates the inner product of the two vectors $\nabla f(x_k)$ and $d_k$.
The choice of search direction $d_k$ is typically based on some approximate model of the objective function f, obtained from the Taylor series [11]
$$f(x + \Delta x) = f(x) + \nabla f(x) \cdot \Delta x + \frac{1}{2!}[\nabla^2 f \cdot \Delta x] \cdot \Delta x + \frac{1}{3!}[[\nabla^3 f \cdot \Delta x] \cdot \Delta x] \cdot \Delta x + \cdots, \tag{4}$$
where $x$ and $\Delta x$ are replaced with $x_k$ and $\alpha d_k$, respectively.
For instance, a linear approximation (first-order Taylor expansion) of the objective function,
$$f(x_k + \alpha d_k) \approx f(x_k) + \alpha \nabla f(x_k) \cdot d_k,$$
can be used, which yields the following direction:
$$d_k = -\nabla f(x_k).$$
This is referred to as the steepest descent method [4]. The overall results on the convergence of the steepest descent method can be found in [4,12,13]. More practical applications of the steepest descent method are discussed in [10,14,15,16].
Unlike gradient methods, which use the first-order Taylor expansion, the Newton method uses a second-order Taylor expansion of the function about the current design point, i.e., a quadratic model
$$f(x_k + d_k) \approx f(x_k) + \nabla f(x_k) \cdot d_k + \frac{1}{2}[\nabla^2 f(x_k) \cdot d_k] \cdot d_k,$$
which yields the Newton direction
$$d_k = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k).$$
Although the Newton method enjoys a quadratic rate of convergence, it requires the computation and storage of matrices associated with the Hessian of the objective function. Moreover, the Newton method can only be utilized if the Hessian matrix is positive definite [4].
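For concreteness, the Newton direction above amounts to solving one linear system per iteration. The following minimal sketch is illustrative only and assumes the Hessian is available and positive definite; it solves the system rather than forming the inverse.

```python
import numpy as np

def newton_direction(grad, hess):
    """Newton direction: solve  hess(x_k) d = -grad(x_k)  instead of
    explicitly inverting the Hessian."""
    return np.linalg.solve(hess, -grad)

# For a convex quadratic f(x) = 0.5 x.A x - b.x, one Newton step from any
# starting point lands exactly on the minimizer x* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite Hessian
b = np.array([1.0, 1.0])
x = np.array([5.0, -7.0])
grad = A @ x - b
x_new = x + newton_direction(grad, A)
assert np.allclose(A @ x_new, b)         # x_new solves A x = b
```

The storage of the n-by-n Hessian in this sketch is exactly the cost the proposed gradient-only method avoids.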
The purpose of this paper is to provide a descent algorithm which can exploit more information about the objective function without the need to store large matrices such as the Hessian matrix. In order to construct a desired model of the objective function, we propose a three-step discretization method based on three-step Taylor expansion that is used in [17,18,19] as follows
$$f\left(x + \frac{\Delta x}{3}\right) \approx f(x) + \frac{1}{3}\nabla f(x) \cdot \Delta x,$$
$$f\left(x + \frac{\Delta x}{2}\right) \approx f(x) + \frac{1}{2}\nabla f\left(x + \frac{\Delta x}{3}\right) \cdot \Delta x, \tag{5}$$
$$f(x + \Delta x) \approx f(x) + \nabla f\left(x + \frac{\Delta x}{2}\right) \cdot \Delta x.$$
Formula (5) is equivalent to Equation (4) truncated after the third-order term, i.e.,
$$f(x + \Delta x) = f(x) + \nabla f(x) \cdot \Delta x + \frac{1}{2!}[\nabla^2 f \cdot \Delta x] \cdot \Delta x + \frac{1}{3!}[[\nabla^3 f \cdot \Delta x] \cdot \Delta x] \cdot \Delta x + O[(\Delta x)^4]. \tag{6}$$
Hence, using the steps of Formula (5) in the approximation of a function around $x$ gives third-order accuracy, with error $O[(\Delta x)^4]$.
On the other hand, we can say that these steps of Formula (5) are derived by applying a factorization process to the right side of Equation (6) as follows
$$f(x)\left[I + \nabla \cdot \Delta x\left[I + \frac{1}{2}\nabla \cdot \Delta x\left[I + \frac{1}{3}\nabla \cdot \Delta x\right]\right]\right] = f(x) + \nabla \cdot \Delta x\left[f(x) + \frac{1}{2}\nabla \cdot \Delta x\left[f(x) + \frac{1}{3}\nabla f(x) \cdot \Delta x\right]\right] + O[(\Delta x)^4], \tag{7}$$
where the symbol I is the identity operator. Now, by removing the terms appearing in O [ ( Δ x ) 4 ] and using the Taylor series properties, the first internal bracket in Equation (7) shows f ( x + Δ x 3 ) , the second internal bracket shows f ( x + Δ x 2 ) and the last one shows f ( x + Δ x ) .
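The factorization in Equation (7) is the cubic Taylor polynomial written in nested (Horner-like) form, which makes the claimed $O[(\Delta x)^4]$ accuracy easy to check numerically in one dimension. The sketch below is illustrative and not from the paper; it uses $f = \exp$, whose derivatives at any point are all known exactly.

```python
import math

def three_step_value(f0, f1, f2, f3, h):
    """Nested evaluation f + h*(f' + (h/2)*(f'' + (h/3)*f''')), i.e., the
    factorized form of Equation (7): the Horner form of the cubic Taylor
    polynomial built from the value and first three derivatives at x."""
    return f0 + h * (f1 + (h / 2) * (f2 + (h / 3) * f3))

# f = exp: every derivative at x equals e^x, and the exact value is e^(x+h).
x = 0.3
e = math.exp(x)
err = lambda h: abs(math.exp(x + h) - three_step_value(e, e, e, e, h))

# Halving h should shrink the error by about 2^4 = 16 (fourth-order accuracy).
ratio = err(0.1) / err(0.05)
assert 12.0 < ratio < 20.0
```

The observed ratio near 16 confirms that the nested three-step construction reproduces the third-order Taylor value with a fourth-order remainder.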
Jiang and Kawahara [17] pioneered the use of this formula to solve unsteady incompressible flows governed by the Navier-Stokes equations. In their method, the discretization in time is performed before the spatial approximation by means of Formula (5) with respect to time. In comparison with Equation (6), Formula (5) does not contain any new higher-order derivatives. Moreover, it reduces the smoothness requirements on the function, gives a superior approximation of the function, and presents some stability advantages in multidimensional problems [17]. Formula (5) is useful for solving non-linear partial differential equations, hyperbolic problems, and multi-dimensional and coupled equations.
In this article, we utilize Formula (5) to obtain a descent direction that satisfies the inequality of Equation (3) and then modify the Armijo rule in the line search method to achieve an appropriate step size. Finally, Equation (1) is utilized to obtain a sequence that reduces the value of the function. Since all equations of the three-step discretization process involve only the first-order derivative of the function, the proposed method is a gradient method, like the steepest descent method. However, numerical results demonstrate that the proposed method works better than the steepest descent method.
The organization of the paper is as follows. In Section 2, there is a brief review of the gradient-type algorithms and their applications. In Section 3, the three-step discretization algorithm and its fundamental properties are described. In Section 4, we show how the proposed algorithm converges globally. Some noteworthy numerical examples are presented in Section 5. Finally, Section 6 provides conclusions of the study.

3. New Descent Algorithm

In this section, we show how the equations in Formula (5) can be used to construct the three-step discretization method, and we give its pseudo-code. The following regularity assumption on $f(x)$ is made throughout:
Assumption 1.
We assume $f \in C^1(\mathbb{R}^n)$ with Lipschitz continuous gradient, i.e., there exists $L > 0$ such that
$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|, \quad \forall x, y \in \mathbb{R}^n.$$
Substituting $\alpha d$ for $\Delta x$ in Formula (5) yields:
$$f\left(x + \frac{\alpha d}{3}\right) \approx f(x) + \frac{\alpha}{3}\nabla f(x) \cdot d,$$
$$f\left(x + \frac{\alpha d}{2}\right) \approx f(x) + \frac{\alpha}{2}\nabla f\left(x + \frac{\alpha d}{3}\right) \cdot d, \tag{8}$$
$$f(x + \alpha d) \approx f(x) + \alpha \nabla f\left(x + \frac{\alpha d}{2}\right) \cdot d.$$

3.1. Three-Step Discretization Algorithm

Three steps are performed in each iteration of this algorithm. In the steepest descent method, there is no intermediate step between the computation of $x_k$ and $x_{k+1}$; in the proposed method, however, intermediate steps are imposed between $x_k$ and $x_{k+1}$. In other words, $x_k$ enters the first step and $x_{k+1}$ is produced by the third step. The second step is an intermediate step that starts from the point produced by the first step, and the new point it produces forms the starting point for the last step, which results in $x_{k+1}$. The structure of the proposed algorithm is characterized as follows.
The main goal of this algorithm is that the value of the objective function declines during all three steps. If the same direction were applied in all three steps, it could fail to be a descent direction in at least one of them. Therefore, we need to consider each of the steps separately. Moreover, if the point $x_k$ is employed in the first step of Formula (8), the point $x_{k+1}$ is obtained from the third step of Formula (8).
To indicate which step of Formula (8) is being performed, we use the superscript $j$. At each iteration index $k$, the index $j$ takes only the values $(0)$, $(1)$, and $(2)$. During the $k$th iteration of the main algorithm, the directions $d_k^{(0)}$, $d_k^{(1)}$, and $d_k^{(2)}$ are used in the first, second, and third step, respectively. The points $x_k^{(0)}$, $x_k^{(1)}$, and $x_k^{(2)}$ are used as starting points in the first, second, and third step, respectively. The proper step sizes $\alpha_k^{(0)}$, $\alpha_k^{(1)}$, and $\alpha_k^{(2)}$ are obtained from the first, second, and last step, respectively. The details are as follows.
The first step of Formula (8) can be rewritten in the following form
$$f\left(x_k^{(0)} + \frac{\alpha d_k^{(0)}}{3}\right) \approx f(x_k^{(0)}) + \frac{\alpha}{3}\nabla f(x_k^{(0)}) \cdot d_k^{(0)}.$$
The direction $d_k^{(0)}$ should be chosen in such a way that $\nabla f(x_k^{(0)}) \cdot d_k^{(0)}$ provides the greatest reduction in $f(x_k^{(0)})$, so that
$$f\left(x_k^{(0)} + \frac{\alpha d_k^{(0)}}{3}\right) < f(x_k^{(0)}).$$
Hence, the direction of the first step is calculated similarly to the process of finding steepest descent direction [4]. So, in the first step of Formula (8), we choose
$$d_k^{(0)} = -\nabla f(x_k^{(0)}).$$
Next, we must determine a proper step size for the first step. In this article, to find the step size, we use the backtracking algorithm with an Armijo-line search. Considering the first step of Formula (8), the general Armijo condition of Equation (2) can be modified as follows
$$f\left(x_k^{(0)} + \frac{\alpha}{3} d_k^{(0)}\right) < f(x_k^{(0)}) + \frac{\alpha}{3} c \nabla f(x_k^{(0)}) \cdot d_k^{(0)}. \tag{9}$$
Therefore, the implementation of the backtracking algorithm with Equation (9) provides the proper step size, α k ( 0 ) , and the following point
$$x_k^{(1)} = x_k^{(0)} + \frac{\alpha_k^{(0)} d_k^{(0)}}{3}. \tag{10}$$
We use x k ( 1 ) as a starting point for the second step of Formula (8).
We can rewrite the second step as follows:
$$f\left(x_k^{(1)} + \frac{\alpha d_k^{(1)}}{2}\right) \approx f(x_k^{(1)}) + \frac{\alpha}{2}\nabla f\left(x_k^{(0)} + \frac{\alpha_k^{(0)} d_k^{(0)}}{3}\right) \cdot d_k^{(1)}.$$
Now the direction $d_k^{(1)}$ should satisfy the descent requirement of Equation (3) at the point $x_k^{(0)} + \frac{\alpha_k^{(0)} d_k^{(0)}}{3}$, i.e.,
$$\nabla f\left(x_k^{(0)} + \frac{\alpha_k^{(0)} d_k^{(0)}}{3}\right) \cdot d_k^{(1)} < 0.$$
By using a similar analysis for finding the steepest descent direction and using Equation (10) we have
$$d_k^{(1)} = -\nabla f\left(x_k^{(0)} + \frac{\alpha_k^{(0)} d_k^{(0)}}{3}\right) = -\nabla f(x_k^{(1)}). \tag{11}$$
After determining the direction $d_k^{(1)}$, we should find a proper step size. In the second step, we modify the general Armijo condition of Equation (2) as follows:
$$f\left(x_k^{(1)} + \frac{\alpha}{2} d_k^{(1)}\right) < f(x_k^{(1)}) + \frac{\alpha}{2} c \nabla f(x_k^{(1)}) \cdot d_k^{(1)}. \tag{12}$$
We use Equation (12) in the backtracking algorithm to find the proper step size, α k ( 1 ) , and the following point
$$x_k^{(2)} = x_k^{(1)} + \frac{\alpha_k^{(1)} d_k^{(1)}}{2}. \tag{13}$$
We use the point x k ( 2 ) as a starting point in the last step of Formula (8).
Now, we have the following result in the third step
$$f(x_k^{(2)} + \alpha d_k^{(2)}) \approx f(x_k^{(2)}) + \alpha \nabla f\left(x_k^{(1)} + \frac{\alpha_k^{(1)} d_k^{(1)}}{2}\right) \cdot d_k^{(2)},$$
where Equation (11) gives $d_k^{(1)}$. The direction $d_k^{(2)}$ is obtained from
$$d_k^{(2)} = -\nabla f\left(x_k^{(1)} + \frac{\alpha_k^{(1)} d_k^{(1)}}{2}\right) = -\nabla f(x_k^{(2)}).$$
Also, according to the last step of Formula (8), the last step of the presented algorithm uses the general Armijo condition of Equation (2). Replacing $d_k$ and $x_k$ with $d_k^{(2)}$ and $x_k^{(2)}$, we have
$$f(x_k^{(2)} + \alpha d_k^{(2)}) < f(x_k^{(2)}) + \alpha c \nabla f(x_k^{(2)}) \cdot d_k^{(2)}. \tag{14}$$
Using Equation (14) in the backtracking algorithm gives the proper step size, $\alpha_k^{(2)}$, and the following point:
$$x_{k+1} = x_k^{(2)} + \alpha_k^{(2)} d_k^{(2)}. \tag{15}$$
After obtaining x k + 1 , we go back to the first step of Formula (8). In fact, x k + 1 forms a starting point in the first step of Formula (8), i.e.,
$$x_{k+1} = x_{k+1}^{(0)}. \tag{16}$$
This process will continue until the stop condition is attained.
Since each step of the presented method uses a descent direction, we obtain the following result:
$$f(x_k^{(1)}) < f(x_k^{(0)}) = f(x_k), \quad f(x_k^{(2)}) < f(x_k^{(1)}), \quad f(x_{k+1}) = f(x_{k+1}^{(0)}) < f(x_k^{(2)}).$$
Therefore, implementation of this method concludes that f ( x k + 1 ) < f ( x k ) .
The pseudo-code of the three-step discretization method with the backtracking Armijo line search is provided in Algorithm 1.
Algorithm 1 Pseudo-code of the three-step discretization method
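The three-step procedure of Algorithm 1 can be sketched in Python from the description in Section 3.1. This is a hedged reconstruction under stated assumptions, not the authors' code; the helper names, parameter defaults, and the generic backtracking routine (with step divisors 3, 2, 1 for the three sub-steps) are illustrative.

```python
import numpy as np

def backtracking(f, x, d, slope, frac, c=1e-3, rho=0.5, alpha=1.0):
    """Backtrack until f(x + (alpha/frac) d) < f(x) + (alpha/frac) c slope,
    the modified Armijo condition with step divisor frac (3, 2, or 1)."""
    fx = f(x)
    while f(x + (alpha / frac) * d) >= fx + (alpha / frac) * c * slope:
        alpha *= rho
        if alpha < 1e-16:   # safeguard against an endless loop
            break
    return alpha

def three_step_descent(f, grad_f, x0, tol=1e-3, max_iter=10_000):
    """Sketch of the three-step discretization method: each iteration takes
    three steepest-descent-like sub-steps with step divisors 3, 2, 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(grad_f(x)) < tol:
            break
        for frac in (3.0, 2.0, 1.0):      # first, second, third sub-step
            d = -grad_f(x)                # d_k^(j) = -grad f(x_k^(j))
            slope = grad_f(x) @ d
            alpha = backtracking(f, x, d, slope, frac)
            x = x + (alpha / frac) * d    # x_k^(j+1)
    return x

# Illustrative run on a smooth convex quadratic.
f = lambda x: 0.5 * x @ np.diag([1.0, 10.0]) @ x
g = lambda x: np.diag([1.0, 10.0]) @ x
x_star = three_step_descent(f, g, np.array([3.0, 2.0]))
assert np.linalg.norm(g(x_star)) < 1e-3
```

Each pass through the inner loop performs one steepest-descent-like sub-step with its own modified Armijo backtracking, matching Equations (9), (12), and (14).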

3.2. Theoretical Analysis of the Three-Step Discretization Algorithm

As mentioned above, the steepest descent method uses only the gradient of the function, while the Newton method also uses its second-order derivatives. In this section, we examine what information about the objective function is exploited by the proposed algorithm.
Let
$$\xi_1 = \frac{\alpha_k^{(0)} d_k^{(0)}}{3}, \quad \xi_2 = \frac{\alpha_k^{(1)} d_k^{(1)}}{2}, \quad \xi_3 = \alpha_k^{(2)} d_k^{(2)}.$$
After implementation of the three-step discretization algorithm at x k = x k ( 0 ) and using Equations (10) and (13), the following form of Formula (8) will be obtained
$$f(x_k + \xi_1) \approx f(x_k) + \nabla f(x_k) \cdot \xi_1,$$
$$f((x_k + \xi_1) + \xi_2) \approx f(x_k + \xi_1) + \nabla f(x_k + \xi_1) \cdot \xi_2, \tag{17}$$
$$f((x_k + \xi_1 + \xi_2) + \xi_3) \approx f(x_k + \xi_1 + \xi_2) + \nabla f(x_k + \xi_1 + \xi_2) \cdot \xi_3.$$
Now consider the third order Taylor series expansion of f ( x k + ξ 1 + ξ 2 + ξ 3 ) at x k , i.e.,
$$\begin{aligned} f(x_k + \xi_1 + \xi_2 + \xi_3) \approx{}& f(x_k) + \nabla f(x_k) \cdot (\xi_1 + \xi_2 + \xi_3) + \frac{1}{2!}[\nabla^2 f(x_k) \cdot (\xi_1 + \xi_2 + \xi_3)] \cdot (\xi_1 + \xi_2 + \xi_3) \\ &+ \frac{1}{3!}[[\nabla^3 f(x_k) \cdot (\xi_1 + \xi_2 + \xi_3)] \cdot (\xi_1 + \xi_2 + \xi_3)] \cdot (\xi_1 + \xi_2 + \xi_3). \end{aligned} \tag{18}$$
Taking the gradient of the first equation of Formula (17), we have
$$\nabla f(x_k + \xi_1) \approx \nabla f(x_k) + [\nabla^2 f(x_k) \cdot \xi_1]. \tag{19}$$
By considering the first and second equation of Formula (17) and Equation (19), we have
$$f((x_k + \xi_1) + \xi_2) \approx f(x_k + \xi_1) + \nabla f(x_k + \xi_1) \cdot \xi_2 \approx (f(x_k) + \nabla f(x_k) \cdot \xi_1) + (\nabla f(x_k) \cdot \xi_2 + [\nabla^2 f(x_k) \cdot \xi_1] \cdot \xi_2). \tag{20}$$
Taking the gradient of Equation (20) yields
$$\nabla f((x_k + \xi_1) + \xi_2) \approx \nabla f(x_k) + [\nabla^2 f(x_k) \cdot \xi_1] + [\nabla^2 f(x_k) \cdot \xi_2] + [[\nabla^3 f(x_k) \cdot \xi_1] \cdot \xi_2]. \tag{21}$$
Finally, from the last equation of Formula (17) and Equations (20) and (21), we have
$$\begin{aligned} f(x_k + \xi_1 + \xi_2 + \xi_3) \approx{}& f(x_k + \xi_1 + \xi_2) + \nabla f(x_k + \xi_1 + \xi_2) \cdot \xi_3 \\ \approx{}& f(x_k) + \nabla f(x_k) \cdot \xi_1 + \nabla f(x_k) \cdot \xi_2 + \nabla f(x_k) \cdot \xi_3 \\ &+ [\nabla^2 f(x_k) \cdot \xi_1] \cdot \xi_2 + [\nabla^2 f(x_k) \cdot \xi_1] \cdot \xi_3 + [\nabla^2 f(x_k) \cdot \xi_2] \cdot \xi_3 \\ &+ [[\nabla^3 f(x_k) \cdot \xi_1] \cdot \xi_2] \cdot \xi_3. \end{aligned} \tag{22}$$
A comparison between Equations (18) and (22) shows that, among the terms of the Taylor expansion involving second-order derivatives, Equation (22) lacks only
$$\frac{1}{2}\left([\nabla^2 f(x_k) \cdot \xi_1] \cdot \xi_1 + [\nabla^2 f(x_k) \cdot \xi_2] \cdot \xi_2 + [\nabla^2 f(x_k) \cdot \xi_3] \cdot \xi_3\right),$$
and, among the terms involving third-order derivatives, Equation (22) includes only
$$[[\nabla^3 f(x_k) \cdot \xi_1] \cdot \xi_2] \cdot \xi_3.$$
The above analysis shows that the three-step discretization algorithm incorporates information about the value of the objective function, its gradient, and its second-order and third-order derivatives. However, it does not contain all the information of the second- and third-order derivatives of the objective function.

4. Convergence

In this section, we prove the convergence of this method. The following theorem shows that the modified Armijo-line search at the first step of Formula (8) stops after a finite number of steps.
Theorem 1.
Suppose that the function f satisfies Assumption 1 and let $d_k^{(0)}$ be a descent direction at $x_k^{(0)}$. Then, for fixed $c \in (0,1)$,
(i) 
the modified Armijo condition
$$f\left(x_k^{(0)} + \frac{\alpha}{3} d_k^{(0)}\right) < f(x_k^{(0)}) + \frac{\alpha}{3} c \nabla f(x_k^{(0)}) \cdot d_k^{(0)}$$
is satisfied for all $\alpha \in [0, \alpha_{max}^{(0)}]$, where
$$\alpha_{max}^{(0)} = \frac{6(c-1)\nabla f(x_k^{(0)}) \cdot d_k^{(0)}}{L\|d_k^{(0)}\|^2};$$
(ii) 
for fixed $\rho \in (0,1)$, the step size generated by the backtracking algorithm with the modified Armijo condition of Equation (9) terminates with
$$\alpha_k^{(0)} \ge \min\{\alpha_{initial}, \rho \alpha_{max}^{(0)}\}.$$
Proof of Theorem 1.
First, we prove part (i). Since $\nabla f$ is Lipschitz continuous, the Taylor expansion gives
$$f\left(x_k^{(0)} + \frac{\alpha d_k^{(0)}}{3}\right) = f(x_k^{(0)}) + \frac{\alpha}{3}\nabla f(x_k^{(0)}) \cdot d_k^{(0)} + E, \tag{23}$$
where $|E| \le \frac{L}{18}\alpha^2 \|d_k^{(0)}\|^2$.
If $\alpha \le \alpha_{max}^{(0)}$, we have
$$\alpha L \|d_k^{(0)}\|^2 \le 6(c-1)\nabla f(x_k^{(0)}) \cdot d_k^{(0)},$$
and by using Equation (23),
$$\begin{aligned} f\left(x_k^{(0)} + \frac{\alpha d_k^{(0)}}{3}\right) &\le f(x_k^{(0)}) + \frac{\alpha}{3}\nabla f(x_k^{(0)}) \cdot d_k^{(0)} + \frac{L}{18}\alpha^2 \|d_k^{(0)}\|^2 \\ &\le f(x_k^{(0)}) + \frac{\alpha}{3}\nabla f(x_k^{(0)}) \cdot d_k^{(0)} + \frac{\alpha}{3}(c-1)\nabla f(x_k^{(0)}) \cdot d_k^{(0)} \\ &= f(x_k^{(0)}) + \frac{\alpha}{3} c \nabla f(x_k^{(0)}) \cdot d_k^{(0)}. \end{aligned}$$
To prove part (ii), we write $\alpha_{final}^{(0)}$ for the accepted step size $\alpha_k^{(0)}$, i.e.,
$$\alpha_{final}^{(0)} = \alpha_k^{(0)}.$$
Now, we know from part (i) that the modified Armijo line search stops as soon as $\alpha \le \alpha_{max}^{(0)}$. If $\alpha_{initial}$ satisfies Equation (9), then $\alpha_{initial} = \alpha_k^{(0)}$. Otherwise, in the last line search iteration, we have
$$\alpha_{final-1}^{(0)} > \alpha_{max}^{(0)}, \quad \alpha_k^{(0)} = \alpha_{final}^{(0)} = \rho \alpha_{final-1}^{(0)} > \rho \alpha_{max}^{(0)}.$$
Combining these two cases gives the result. ☐
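As a quick numerical sanity check of Theorem 1 (not part of the paper), consider the one-dimensional quadratic $f(x) = \frac{L}{2}x^2$, whose gradient has the exact Lipschitz constant $L$. The modified Armijo condition should hold for every $\alpha$ below $\alpha_{max}^{(0)}$ and fail just beyond it; the variable names below are illustrative.

```python
L_const = 4.0      # Lipschitz constant of grad f for f(x) = (L/2) x^2
c = 1e-3
f = lambda x: 0.5 * L_const * x * x
grad = lambda x: L_const * x

x = 2.0
d = -grad(x)                       # steepest descent direction
slope = grad(x) * d                # negative, since d is a descent direction
alpha_max = 6 * (c - 1) * slope / (L_const * d * d)

# Modified Armijo condition of the first step holds for every alpha below alpha_max...
for a in [0.1 * alpha_max, 0.5 * alpha_max, 0.99 * alpha_max]:
    assert f(x + a * d / 3) < f(x) + (a / 3) * c * slope
# ...and fails just beyond it.
a = 1.01 * alpha_max
assert not (f(x + a * d / 3) < f(x) + (a / 3) * c * slope)
```

For this quadratic the bound simplifies to $\alpha_{max}^{(0)} = 6(1-c)/L$, so the threshold is independent of the starting point, as the formula in part (i) predicts.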
A similar analysis shows that the backtracking algorithm with the modified Armijo line search in the second and third steps of Formula (8) also terminates in a finite number of steps.
Theorem 2.
Let the function f satisfy Assumption 1 and $d_k^{(1)}$ be a descent direction at $x_k^{(1)}$. Then, for fixed $c \in (0,1)$,
(i) 
the modified Armijo condition
$$f\left(x_k^{(1)} + \frac{\alpha}{2} d_k^{(1)}\right) < f(x_k^{(1)}) + \frac{\alpha}{2} c \nabla f(x_k^{(1)}) \cdot d_k^{(1)}$$
is satisfied for all $\alpha \in [0, \alpha_{max}^{(1)}]$, where
$$\alpha_{max}^{(1)} = \frac{4(c-1)\nabla f(x_k^{(1)}) \cdot d_k^{(1)}}{L\|d_k^{(1)}\|^2};$$
(ii) 
for fixed $\rho \in (0,1)$, the step size generated by the backtracking algorithm with the modified Armijo condition of Equation (12) terminates with
$$\alpha_k^{(1)} \ge \min\{\alpha_{initial}, \rho \alpha_{max}^{(1)}\}.$$
Proof of Theorem 2.
The proof is similar to that of Theorem 1. ☐
Theorem 3.
Let the function f satisfy Assumption 1 and $d_k^{(2)}$ be a descent direction at $x_k^{(2)}$. Then, for fixed $c \in (0,1)$,
(i) 
the Armijo condition
$$f(x_k^{(2)} + \alpha d_k^{(2)}) < f(x_k^{(2)}) + \alpha c \nabla f(x_k^{(2)}) \cdot d_k^{(2)}$$
is satisfied for all $\alpha \in [0, \alpha_{max}^{(2)}]$, where
$$\alpha_{max}^{(2)} = \frac{2(c-1)\nabla f(x_k^{(2)}) \cdot d_k^{(2)}}{L\|d_k^{(2)}\|^2};$$
(ii) 
for fixed $\rho \in (0,1)$, the step size generated by the backtracking algorithm with the Armijo condition of Equation (14) terminates with
$$\alpha_k^{(2)} \ge \min\{\alpha_{initial}, \rho \alpha_{max}^{(2)}\}.$$
Proof of Theorem 3.
The proof is similar to that of Theorem 1. ☐
Theorem 4.
(Global convergence of the three-step discretization algorithm)
Suppose that the function f satisfies Assumption 1 and d k ( j ) is a descent direction at x k ( j ) for j = 0 , 1 , 2 . Then, for the iterates generated by the three-step discretization algorithm, one of the following situations occurs,
(i) 
$\nabla f(x_k^{(j)}) = 0$ for some $k \ge 0$ and $j \in \{0, 1, 2\}$,
(ii) 
$\lim_{k \to \infty} f(x_k^{(j)}) = -\infty$, $\ j \in \{0, 1, 2\}$,
(iii) 
$\lim_{k \to \infty} \nabla f(x_k^{(j)}) = 0$, $\ j \in \{0, 1, 2\}$.
Proof of Theorem 4.
Assume that neither (i) nor (ii) holds; we then prove the third case.
First, according to Equations (10), (13), (15) and (16), and by considering the modified Armijo conditions, we have
$$\begin{aligned} f(x_{k+1}^{(0)}) - f(x_0^{(0)}) &= \sum_{l=0}^{k}\left[f(x_{l+1}^{(0)}) - f(x_l^{(0)})\right] = \sum_{l=0}^{k} f(x_l^{(2)} + \alpha_l^{(2)} d_l^{(2)}) - \sum_{l=0}^{k} f(x_l^{(0)}) \\ &\le \sum_{l=0}^{k}\left[f(x_l^{(2)}) + \alpha_l^{(2)} c \nabla f(x_l^{(2)}) \cdot d_l^{(2)}\right] - \sum_{l=0}^{k} f(x_l^{(0)}) \\ &\le \sum_{l=0}^{k}\left[f(x_l^{(1)}) + \frac{\alpha_l^{(1)}}{2} c \nabla f(x_l^{(1)}) \cdot d_l^{(1)}\right] - \sum_{l=0}^{k} f(x_l^{(0)}) + \sum_{l=0}^{k} \alpha_l^{(2)} c \nabla f(x_l^{(2)}) \cdot d_l^{(2)} \\ &\le \sum_{l=0}^{k}\left[f(x_l^{(0)}) + \frac{\alpha_l^{(0)}}{3} c \nabla f(x_l^{(0)}) \cdot d_l^{(0)}\right] - \sum_{l=0}^{k} f(x_l^{(0)}) + \sum_{l=0}^{k} \frac{\alpha_l^{(1)}}{2} c \nabla f(x_l^{(1)}) \cdot d_l^{(1)} + \sum_{l=0}^{k} \alpha_l^{(2)} c \nabla f(x_l^{(2)}) \cdot d_l^{(2)} \\ &= \sum_{l=0}^{k} \frac{\alpha_l^{(0)}}{3} c \nabla f(x_l^{(0)}) \cdot d_l^{(0)} + \sum_{l=0}^{k} \frac{\alpha_l^{(1)}}{2} c \nabla f(x_l^{(1)}) \cdot d_l^{(1)} + \sum_{l=0}^{k} \alpha_l^{(2)} c \nabla f(x_l^{(2)}) \cdot d_l^{(2)}, \end{aligned} \tag{24}$$
where $\alpha_l^{(0)}$, $\alpha_l^{(1)}$, and $\alpha_l^{(2)}$ for $l = 0, 1, \ldots, k$ are obtained through the backtracking algorithm with the modified Armijo conditions in the first, second, and third step of Formula (8), respectively. Since $d_l^{(0)}$, $d_l^{(1)}$, and $d_l^{(2)}$ are descent directions at $x_l^{(0)}$, $x_l^{(1)}$, and $x_l^{(2)}$, respectively, for $l = 0, 1, \ldots, k$, Equations (3) and (24) give
$$\sum_{l=0}^{\infty} \frac{\alpha_l^{(0)}}{3} |\nabla f(x_l^{(0)}) \cdot d_l^{(0)}| + \sum_{l=0}^{\infty} \frac{\alpha_l^{(1)}}{2} |\nabla f(x_l^{(1)}) \cdot d_l^{(1)}| + \sum_{l=0}^{\infty} \alpha_l^{(2)} |\nabla f(x_l^{(2)}) \cdot d_l^{(2)}| \le c^{-1} \lim_{k \to \infty} |f(x_0^{(0)}) - f(x_{k+1}^{(0)})| < \infty,$$
and then
$$\lim_{l \to \infty} \alpha_l^{(j)} |\nabla f(x_l^{(j)}) \cdot d_l^{(j)}| = 0, \quad j = 0, 1, 2. \tag{25}$$
According to Theorems 1-3, the backtracking algorithm with the modified Armijo conditions terminates with
$$\alpha_k^{(j)} \ge \min\{\alpha_{initial}, \rho \alpha_{max}^{(j)}\}, \quad j = 0, 1, 2.$$
Let $j = 0$. If
$$\alpha_{initial} = \min\{\alpha_{initial}, \rho \alpha_{max}^{(0)}\},$$
then from Theorem 1 we have
$$\alpha_k^{(0)} = \alpha_{initial}$$
and
$$\alpha_k^{(0)} |\nabla f(x_k^{(0)}) \cdot d_k^{(0)}| = \alpha_{initial} |\nabla f(x_k^{(0)}) \cdot d_k^{(0)}|.$$
Thus, from Equation (25) it follows that
$$\lim_{k \to \infty} |\nabla f(x_k^{(0)}) \cdot d_k^{(0)}| = 0.$$
Since $d_k^{(0)} = -\nabla f(x_k^{(0)})$, the following equation is achieved:
$$\lim_{k \to \infty} \|\nabla f(x_k^{(0)})\| = 0.$$
In the other case, if
$$\rho \alpha_{max}^{(0)} = \min\{\alpha_{initial}, \rho \alpha_{max}^{(0)}\},$$
then according to Theorem 1 we have
$$\rho \alpha_{max}^{(0)} \le \alpha_k^{(0)} \le \alpha_{max}^{(0)}.$$
Thus
$$\alpha_k^{(0)} |\nabla f(x_k^{(0)}) \cdot d_k^{(0)}| \ge \rho \alpha_{max}^{(0)} |\nabla f(x_k^{(0)}) \cdot d_k^{(0)}| \ge \frac{6\rho(1-c)|\nabla f(x_k^{(0)}) \cdot d_k^{(0)}|^2}{L\|d_k^{(0)}\|^2},$$
and from Equation (25) we have
$$\lim_{k \to \infty} \frac{|\nabla f(x_k^{(0)}) \cdot d_k^{(0)}|}{\|d_k^{(0)}\|} = 0. \tag{26}$$
Replacing $d_k^{(0)}$ with $-\nabla f(x_k^{(0)})$ in Equation (26) yields
$$\lim_{k \to \infty} \|\nabla f(x_k^{(0)})\| = 0.$$
A similar analysis holds for $j = 1$ and $j = 2$. In general, we obtain
$$\lim_{k \to \infty} \|\nabla f(x_k^{(j)})\| = 0, \quad j = 0, 1, 2. \tag{27}$$
Finally, Equation (27) is equivalent to
$$\lim_{k \to \infty} \nabla f(x_k^{(j)}) = 0, \quad j = 0, 1, 2.$$
 ☐
In Theorem 4, in the first case, a stationary point is found in a finite number of steps. In the second case, the function $f(x)$ is unbounded below and a minimum does not exist. In the third case, $\nabla f(x_k^{(j)}) \to 0$ for $j = 0, 1, 2$, which means that all three steps of the three-step discretization method approach a stationary point.

5. Numerical Experiments

In this section, we present numerical results for the three-step discretization method. We use MATLAB 2016 (R2016b, MathWorks, Natick, MA, USA). The stopping rule for Algorithm 1 is to decrease the gradient norm below $10^{-3}$. The parameters $c$ and $\rho$ are fixed at $10^{-3}$ and $0.5$, respectively. We set the parameter $\alpha_{initial} = 1$ throughout the entire algorithm. The numerical comparisons include:
  • Iteration numbers (denoted NI) for attaining the same stopping criterion $\|\nabla f\|_2 < 10^{-3}$,
  • The number of evaluations of $f$ (denoted Nf),
  • The number of evaluations of $\nabla f$ (denoted Ng),
  • The difference between the value of the function at the optimal point and the value at the last computed point, as the accuracy of the method, i.e., $error = \|f(x_{optimal}) - f(x_{final})\|_2$.
We tested 45 problems from [39,40,41,42]. Table 1 shows the function names and their dimensions.
Table 1. Test problems of general objective functions.
In Table 1, Function 1 is the following function:
$$f(x, y) = \log\left(1 - \log\left(x(1-x)y(1-y)\right)\right),$$
whose optimal solution is $x^* = (0.5, 0.5)$ with $f(x^*) = 1.32776143$ [41]. Also, Function 28, Function 33, and Function 34 are the test functions 28, 33, and 34 in [42], respectively.
The numerical results for some of the functions in Table 1 are reported in the following tables. Table 2 shows the iteration numbers (NI), Table 3 compares Nf, Table 4 compares Ng, and Table 5 presents the error. We compare our method with the steepest descent method, the FR method, and the conjugate gradient method in [37], which we refer to as the Zhang method. The letter "F" in the tables indicates that the corresponding method did not succeed in approaching the optimal point. The FR method has the most failures, since it did not always produce a descent direction. In comparison with the steepest descent method, the proposed method needs fewer iterations and fewer function and gradient evaluations. Also, the proposed method shows good agreement with the conjugate gradient method in [37].
Table 2. Iteration numbers (NI) for different methods.
Table 3. The number of function evaluations (Nf) for different methods.
Table 4. The number of gradient evaluations (Ng) for different methods.
Table 5. Accuracy (error) for different methods.
We also use the performance profiles of [43] to compare the performance of the considered methods. If the performance profile of a method lies above the performance profiles of the other methods, then this method performed better than the others.
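A performance profile of the kind defined in [43] can be computed as in the following sketch. This is an illustrative implementation, not the authors' code, and the iteration-count data below are hypothetical.

```python
import numpy as np

def performance_profile(T, taus):
    """Performance profiles in the sense of [43]: T[p, s] is the cost
    (e.g., NI or CPU time) of solver s on problem p (np.inf for failures).
    For each solver, return the fraction of problems whose performance ratio
    r = T[p, s] / min_s T[p, s] is within each threshold tau. Assumes every
    problem is solved by at least one solver (each row has a finite min)."""
    ratios = T / T.min(axis=1, keepdims=True)
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Hypothetical iteration counts for 4 problems x 2 solvers (inf = failure).
T = np.array([[10.0, 20.0],
              [30.0, 15.0],
              [12.0, np.inf],
              [40.0, 40.0]])
profiles = performance_profile(T, taus=[1.0, 2.0, 4.0])
# Solver 0 is within a factor of 2 of the best solver on every problem.
assert profiles[0][1] == 1.0
```

Plotting each solver's row of `profiles` against the thresholds produces curves like those in Figures 1 and 2, where a higher curve indicates better performance.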
Figure 1a,b and Figure 2a,b show the performance profiles measured with respect to CPU time, NI, Nf, and Ng, respectively. From the viewpoint of CPU time and the number of iterations (NI), we observe in Figure 1 that the three-step discretization method is successful. The proposed method works better than the steepest descent method. Although the FR method is rapid and accurate, it did not produce a descent direction in many cases. Therefore, the graph of the FR method lies below the graph of the steepest descent method for $\tau \ge 4$.
Figure 1. Performance profiles based on CPU time (a) and the number of iterations (NI) (b) for 45 functions in the Table 1.
Figure 2. Performance profiles based on the number of function evaluations (Nf) (a) and the number of gradient evaluations (Ng) (b) for 45 functions in the Table 1.
From the viewpoint of the number of function evaluations (Nf) and the number of gradient evaluations (Ng), Figure 2 shows that the Zhang method has the lowest computational cost. In this figure, the proposed method is almost comparable with the Zhang method. Also, for $\tau \ge 5$ the proposed method is superior to the other methods.

6. Conclusions

Based on the three-step Taylor expansion and an Armijo line search, we propose the three-step discretization algorithm for unconstrained optimization problems. The presented method exploits information about the objective function contained in the third-order Taylor series without requiring the calculation of higher-order derivatives. The global convergence of the proposed algorithm is proved. Some numerical experiments are conducted with the proposed algorithm. In comparison with the steepest descent method, the numerical performance of the proposed method is superior.

Author Contributions

All authors contributed significantly to the study and preparation of the article. They have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ahookhosh, M.; Ghaderi, S. On efficiency of nonmonotone Armijo-type line searches. Appl. Math. Model. 2017, 43, 170–190. [Google Scholar] [CrossRef]
  2. Andrei, N. A new three-term conjugate gradient algorithm for unconstrained optimization. Numer. Algorithms 2015, 68, 305–321. [Google Scholar] [CrossRef]
  3. Edgar, T.F.; Himmelblau, D.M.; Lasdon, L.S. Optimization of Chemical Processes; McGraw-Hill: New York, NY, USA, 2001. [Google Scholar]
  4. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer Verlag: New York, NY, USA, 2006. [Google Scholar]
  5. Shi, Z.J. Convergence of line search methods for unconstrained optimization. Appl. Math. Comput. 2004, 157, 393–405. [Google Scholar] [CrossRef]
  6. Vieira, D.A.G.; Lisboa, A.C. Line search methods with guaranteed asymptotical convergence to an improving local optimum of multimodal functions. Eur. J. Oper. Res. 2014, 235, 38–46. [Google Scholar] [CrossRef]
  7. Yuan, G.; Wei, Z.; Lu, X. Global convergence of BFGS and PRP methods under a modified weak Wolfe-Powell line search. Appl. Math. Model. 2017, 47, 811–825. [Google Scholar] [CrossRef]
  8. Wang, J.; Zhu, D. Derivative-free restrictively preconditioned conjugate gradient path method without line search technique for solving linear equality constrained optimization. Comput. Math. Appl. 2017, 73, 277–293. [Google Scholar] [CrossRef]
  9. Zhou, G. A descent algorithm without line search for unconstrained optimization. Appl. Math. Comput. 2009, 215, 2528–2533. [Google Scholar] [CrossRef]
  10. Zhou, G.; Feng, C. The steepest descent algorithm without line search for p-Laplacian. Appl. Math. Comput. 2013, 224, 36–45. [Google Scholar] [CrossRef]
  11. Koks, D. Explorations in Mathematical Physics: The Concepts Behind an Elegant Language; Springer Science: New York, NY, USA, 2006. [Google Scholar]
  12. Dennis, J.E., Jr.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; SIAM: Philadelphia, PA, USA, 1996. [Google Scholar]
  13. Otmar, S. A convergence analysis of a method of steepest descent and a two-step algorithm for nonlinear ill-posed problems. Numer. Funct. Anal. Optim. 1996, 17, 197–214. [Google Scholar] [CrossRef]
  14. Ebadi, M.J.; Hosseini, A.; Hosseini, M.M. A projection type steepest descent neural network for solving a class of nonsmooth optimization problems. Neurocomputing 2017, 193, 197–202. [Google Scholar] [CrossRef]
  15. Yousefpour, R. Combination of steepest descent and BFGS methods for nonconvex nonsmooth optimization. Numer. Algorithms 2016, 72, 57–90. [Google Scholar] [CrossRef]
  16. Gonzaga, C.C.; Schneider, R.M. On the steepest descent algorithm for quadratic functions. Comput. Optim. Appl. 2016, 63, 523–542. [Google Scholar] [CrossRef]
  17. Jiang, C.B.; Kawahara, M. A three-step finite element method for unsteady incompressible flows. Comput. Mech. 1993, 11, 355–370. [Google Scholar] [CrossRef]
  18. Kumar, B.V.R.; Kumar, S. Convergence of Three-Step Taylor Galerkin finite element scheme based monotone schwarz iterative method for singularly perturbed differential-difference equation. Numer. Funct. Anal. Optim. 2015, 36, 1029–1045. [Google Scholar]
  19. Kumar, B.V.R.; Mehra, M. A three-step wavelet Galerkin method for parabolic and hyperbolic partial differential equations. Int. J. Comput. Math. 2006, 83, 143–157. [Google Scholar] [CrossRef]
  20. Cauchy, A. General method for solving simultaneous equations systems. Comp. Rend. Sci. 1847, 25, 46–89. [Google Scholar]
  21. Barzilai, J.; Borwein, J.M. Two-point step size gradient methods. IMA J. Numer. Anal. 1988, 8, 141–148. [Google Scholar] [CrossRef]
  22. Dai, Y.H.; Fletcher, R. Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 2005, 100, 21–47. [Google Scholar]
  23. Hassan, M.A.; Leong, W.J.; Farid, M. A new gradient method via quasicauchy relation which guarantees descent. J. Comput. Appl. Math. 2009, 230, 300–305. [Google Scholar] [CrossRef]
  24. Leong, W.J.; Hassan, M.A.; Farid, M. A monotone gradient method via weak secant equation for unconstrained optimization. Taiwan. J. Math. 2010, 14, 413–423. [Google Scholar] [CrossRef]
  25. Tan, C.; Ma, S.; Dai, Y.; Qian, Y. Barzilai-Borwein step size for stochastic gradient descent. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  26. Raydan, M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 1993, 13, 321–326. [Google Scholar] [CrossRef]
  27. Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 2002, 26, 1–10. [Google Scholar] [CrossRef]
  28. Dai, Y.H. A New Analysis on the Barzilai-Borwein Gradient Method. J. Oper. Res. Soc. China 2013, 1, 187–198. [Google Scholar] [CrossRef]
  29. Yuan, Y. A new stepsize for the steepest descent method. J. Comput. Appl. Math. 2006, 24, 149–156. [Google Scholar]
  30. Manton, J.M. Modified steepest descent and Newton algorithms for orthogonally constrained optimisation: Part II The complex Grassmann manifold. In Proceedings of the Sixth International Symposium on Signal Processing and its Applications (ISSPA), Kuala Lumpur, Malaysia, 13–16 August 2001. [Google Scholar]
  31. Shan, G.H. Gradient-Type Methods for Unconstrained Optimization. Bachelor’s Thesis, University Tunku Abdul Rahman, Petaling Jaya, Malaysia, 2016. [Google Scholar]
  32. Hestenes, M.R.; Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 1952, 49, 409–436. [Google Scholar] [CrossRef]
  33. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154. [Google Scholar] [CrossRef]
  34. Hager, W.W.; Zhang, H. A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2, 35–58. [Google Scholar]
  35. Dai, Y.H.; Liao, L.Z. New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 2001, 43, 87–101. [Google Scholar] [CrossRef]
  36. Kobayashi, M.; Narushima, M.; Yabe, H. Nonlinear conjugate gradient methods with structured secant condition for nonlinear least squares problems. J. Comput. Appl. Math. 2010, 234, 375–397. [Google Scholar] [CrossRef]
  37. Zhang, L.; Zhou, W.; Li, D. Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search. Numer. Math. 2006, 104, 561–572. [Google Scholar] [CrossRef]
  38. Sugiki, K.; Narushima, Y.; Yabe, H. Globally convergent three-term conjugate gradient methods that use secant conditions and generate descent search directions for unconstrained optimization. J. Optim. Theory Appl. 2012, 153, 733–757. [Google Scholar] [CrossRef]
  39. Jamil, M.; Yang, X.S. A literature survey of benchmark functions for global optimization problems. Int. J. Math. Model. Numer. Optim. 2013, 4, 150–194. [Google Scholar]
  40. Molga, M.; Smutnicki, C. Test Functions for Optimization Needs, 2005. Available online: http://www.robertmarks.org/Classes/ENGR5358/Papers/functions.pdf (accessed on 30 June 2013).
  41. Papa Quiroz, E.A.; Quispe, E.M.; Oliveira, P.R. Steepest descent method with a generalized Armijo search for quasiconvex functions on Riemannian manifolds. J. Math. Anal. Appl. 2008, 341, 467–477. [Google Scholar] [CrossRef]
  42. Moré, J.J.; Garbow, B.S.; Hillstrom, K.E. Testing unconstrained optimization software. ACM Trans. Math. Softw. 1981, 7, 17–41. [Google Scholar]
  43. Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. Ser. A 2002, 91, 201–213. [Google Scholar]
