A New Descent Algorithm Using the Three-Step Discretization Method for Solving Unconstrained Optimization Problems

In this paper, the three-step Taylor expansion, which is equivalent to the third-order Taylor expansion, is used as the mathematical basis of a new descent method. At each iteration of this method, three steps are performed. Each step has a structure similar to that of the steepest descent method, except that a generalized search direction, step length, and next iterate are used. Compared with the steepest descent method, it is shown that the proposed algorithm has a higher convergence speed and lower computational cost and storage requirements.


Introduction
We start our discussion by considering the unconstrained minimization problem: find x = (x_1, x_2, . . . , x_n)^T that minimizes f(x_1, x_2, . . . , x_n) ≡ f(x), where f : R^n → R is a continuously differentiable function of n variables.
Most effective optimization procedures comprise a few basic steps. At iteration k, where the current point is x_k, they do the following:
1. Specify the initial starting vector x_0 = [x_1^0, x_2^0, . . . , x_n^0]^T;
2. Find an appropriate search direction d_k;
3. Specify the convergence criteria for termination;
4. Minimize along the direction d_k to find a new point x_{k+1} from

x_{k+1} = x_k + α_k d_k, (1)

where α_k is a positive scalar called the step size. The step size is usually determined by an optimization process called a line search (usually inexact), such as the Wolfe, Goldstein or Armijo line search. The Armijo condition requires

f(x_k + α d_k) ≤ f(x_k) + c α ∇f(x_k) · d_k, (2)

where c ∈ (0, 1) is a constant control parameter. For more information on line search strategies, readers can refer to [1-7]. In addition, some researchers have introduced algorithms that define α_k without a line search method; see [8-10]. An appropriate choice of initial starting point has a positive effect on computational cost and speed of convergence.
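The backtracking line search with the Armijo condition of Equation (2) can be sketched as follows (a minimal illustration, not the paper's implementation; the quadratic test function and the parameter values c = 1e-4 and ρ = 0.5 are assumptions chosen for the example):

```python
import numpy as np

def armijo_backtracking(f, grad, x, d, alpha0=1.0, c=1e-4, rho=0.5, max_iter=50):
    """Shrink alpha until the Armijo condition
    f(x + alpha*d) <= f(x) + c*alpha*grad(x).d holds."""
    fx = f(x)
    slope = grad(x) @ d        # directional derivative; negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + c * alpha * slope:
            return alpha
        alpha *= rho           # backtrack
    return alpha

# Example: f(x) = ||x||^2 with the descent direction d = -grad f(x)
f = lambda x: float(x @ x)
g = lambda x: 2.0 * x
x = np.array([1.0, 1.0])
alpha = armijo_backtracking(f, g, x, -g(x))
```

With these illustrative values, the first trial step α = 1 violates the condition and a single backtracking halving is accepted.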
Numerical methods for the optimization of general multivariable objective functions differ mainly in how they generate the search directions. A good search direction should reduce the objective function's value so that f(x_{k+1}) < f(x_k).
Such a direction d_k is a descent direction and satisfies the following requirement at any point:

∇f(x_k) · d_k < 0, (3)

where the dot indicates the inner product of the two vectors ∇f(x_k) and d_k.
The choice of search direction d_k is typically based on some approximate model of the objective function f obtained from the Taylor series [11],

f(x + ∆x) = f(x) + ∆x f′(x) + ((∆x)^2/2) f″(x) + ((∆x)^3/6) f‴(x) + · · · , (4)

where x and ∆x are replaced with x_k and α d_k, respectively. For instance, a linear approximation (first-order Taylor expansion) of the objective function leads to the direction

d_k = −∇f(x_k).

This is referred to as the steepest descent method [4]. General results on the convergence of the steepest descent method can be found in [4,12,13]. More practical applications of the steepest descent method are discussed in [10,14-16].
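The steepest descent iteration of Equation (1) with d_k = −∇f(x_k) can be sketched as follows (a minimal illustration; the fixed step size and the quadratic test function are assumptions made for the example, not taken from the paper):

```python
import numpy as np

def steepest_descent(f, grad, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Steepest descent with a fixed step size: d_k = -grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop when the gradient is small
            break
        x = x - alpha * g             # x_{k+1} = x_k + alpha * d_k
    return x

# Minimize f(x, y) = (x - 1)^2 + (y + 2)^2, whose minimizer is (1, -2)
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)])
x_min = steepest_descent(f, grad, [0.0, 0.0])
```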
Unlike gradient methods, which use the first-order Taylor expansion, the Newton method uses a second-order Taylor expansion of the function about the current design point, i.e., a quadratic model, which yields the Newton direction

d_k = −[∇²f(x_k)]^{−1} ∇f(x_k).

Although the Newton method attains a quadratic rate of convergence, it requires the computation and storage of matrices associated with the Hessian of the objective function. Moreover, the Newton method can only be utilized if the Hessian matrix is positive definite [4].
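The Newton direction can be illustrated on a convex quadratic, where a single Newton step with a positive definite Hessian lands exactly on the minimizer (the matrix A and vector b below are illustrative choices, not from the paper):

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton step: x - H(x)^{-1} grad(x), assuming H(x) is positive definite."""
    return x - np.linalg.solve(hess(x), grad(x))

# f(x) = 0.5 x^T A x - b^T x: gradient A x - b, Hessian A, minimizer solves A x = b
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b
hess = lambda x: A
x1 = newton_step(grad, hess, np.zeros(2))
```

For a non-quadratic f the step would only approximate the minimizer, and the O(n²) Hessian storage noted below becomes the dominant cost in high dimensions.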
The purpose of this paper is to provide a descent algorithm that exploits more information about the objective function without the need to store large matrices such as the Hessian. In order to construct the desired model of the objective function, we propose a three-step discretization method based on the three-step Taylor expansion used in [17-19]:

f(x + ∆x/3) = f(x) + (∆x/3) f′(x),
f(x + ∆x/2) = f(x) + (∆x/2) f′(x + ∆x/3),
f(x + ∆x) = f(x) + ∆x f′(x + ∆x/2). (5)

Formula (5) is equivalent to Equation (4) truncated after the third-order term, i.e.,

f(x + ∆x) = f(x) + ∆x f′(x) + ((∆x)^2/2) f″(x) + ((∆x)^3/6) f‴(x). (6)

Hence, using the steps of Formula (5) to approximate the function around x gives third-order accuracy with error O[(∆x)^4].
On the other hand, the steps of Formula (5) can be derived by applying a factorization process to the right-hand side of Equation (6):

f(x + ∆x) = [I + ∆x (d/dx)(I + (∆x/2)(d/dx)(I + (∆x/3)(d/dx)))] f(x), (7)

where the symbol I is the identity operator. Now, by removing the terms appearing in O[(∆x)^4] and using the properties of the Taylor series, the first internal bracket in Equation (7) represents f(x + ∆x/3), the second internal bracket represents f(x + ∆x/2), and the last one represents f(x + ∆x). Jiang and Kawahara [17] pioneered the use of this formula to solve unsteady incompressible flows governed by the Navier-Stokes equations. In their method, the discretization in time is performed before the spatial approximation by means of Formula (5) with respect to time. In comparison with Equation (6), Formula (5) does not contain any new higher-order derivatives. Moreover, it reduces the smoothness demanded of the function, gives a superior approximation of the function, and offers some stability advantages in multidimensional problems [17]. Formula (5) is useful in solving non-linear partial differential equations, hyperbolic problems, and multi-dimensional and coupled equations.
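The third-order equivalence of the three-step formula can be checked numerically. For the linear model problem y′ = λy, one pass of the three-step scheme reproduces the Taylor expansion of Equation (6) exactly, using only first derivatives (the values of λ, h and y0 below are arbitrary illustrative choices):

```python
import numpy as np

def three_step(g, y, h):
    """Three-step scheme: fractional steps h/3 and h/2, then a full step h."""
    y1 = y + (h / 3.0) * g(y)    # first step, as in the first line of Formula (5)
    y2 = y + (h / 2.0) * g(y1)   # second step, derivative evaluated at y1
    return y + h * g(y2)         # third step, derivative evaluated at y2

lam, h, y0 = -1.3, 0.1, 2.0
g = lambda y: lam * y
y_new = three_step(g, y0, h)

# Third-order truncated Taylor expansion of y0 * exp(lam * h)
z = h * lam
taylor3 = y0 * (1 + z + z**2 / 2 + z**3 / 6)
```

Expanding the three nested steps by hand gives y0·(1 + z + z²/2 + z³/6) term by term, which is exactly Equation (6) for this problem.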
In this article, we utilize Formula (5) to obtain a descent direction that satisfies the inequality of Equation (3), and we modify the Armijo rule in the line search to achieve an appropriate step size. Finally, Equation (1) is utilized to obtain a sequence that reduces the value of the function. Since all equations of the three-step discretization process involve only the first-order derivative of the function, the proposed method belongs to the family of gradient methods, like the steepest descent method. However, numerical results demonstrate that the proposed method performs better than the steepest descent method.
The organization of the paper is as follows. In Section 2, there is a brief review of gradient-type algorithms and their applications. In Section 3, the three-step discretization algorithm and its fundamental properties are described. In Section 4, we show that the proposed algorithm converges globally. Some noteworthy numerical examples are presented in Section 5. Finally, Section 6 provides the conclusions of the study.

Related Work
This section provides an overview of previous research on the gradient-type methods.

Steepest Descent Method
Gradient descent is among the most popular algorithms for performing optimization. The steepest descent method, which can be traced back to Cauchy in 1847 [20], has a rich history and is regarded as the simplest and best-known gradient method. Despite its simplicity, the steepest descent method has played a major role in the development of the theory of optimization. Unfortunately, the classical steepest descent method, which uses an exact line search to determine the step size, is known for being quite slow on most real-world problems and is therefore not widely used. Recently, several modifications of the steepest descent method have been presented to overcome this weakness. Mostly, these modifications propose effective strategies for choosing the step length. Barzilai and Borwein (the BB method) [21] presented a new choice of step length through the two-point step size

α_k = (s_{k−1} · s_{k−1}) / (s_{k−1} · y_{k−1}),

where s_{k−1} = x_k − x_{k−1} and y_{k−1} = ∇f(x_k) − ∇f(x_{k−1}). Although their method does not guarantee a monotonic descent of the residual norms, the BB method is capable of performing quite well on high-dimensional problems. The results of Barzilai and Borwein have encouraged many researchers to modify the steepest descent method. For instance, Dai and Fletcher in 2005 [22] used a new gradient projection method for box-constrained quadratic problems with a step length that reuses information from the previous m̄ = min(m, k − 1) iterations, where m ≥ 1 is some prefixed integer. Hassan et al. in 2009 [23], motivated by the BB method, presented a monotone gradient method via the quasi-Cauchy relation. They derived a step length formula that approximates the inverse of the Hessian using the quasi-Cauchy equation and retains the monotonicity property at every iteration. Leong et al. in 2010 [24] suggested a fixed-step gradient-type method to improve the BB method, recognized as the monotone gradient method via the weak secant equation. In the Leong method, the approximation is stored as a diagonal matrix determined by the modified weak secant equation. Recently, Tan et al. in 2016 [25] applied the Barzilai-Borwein step length to the stochastic gradient descent and stochastic variance reduced gradient algorithms.
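A minimal sketch of the BB method on a quadratic illustrates the two-point step length (the ill-conditioned test problem and the initial step size alpha0 are assumptions made for the example):

```python
import numpy as np

def bb_method(grad, x0, alpha0=0.01, tol=1e-8, max_iter=500):
    """Barzilai-Borwein gradient method with the step
    alpha_k = (s.s)/(s.y), s = x_k - x_{k-1}, y = g_k - g_{k-1}."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0                     # first step uses a prescribed step size
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)      # BB two-point step length
        x, g = x_new, g_new
    return x

A = np.diag([1.0, 10.0])              # ill-conditioned quadratic
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b            # gradient of f(x) = 0.5 x^T A x - b^T x
x_bb = bb_method(grad, np.zeros(2))
```

Note that the function values along the iterates need not decrease monotonically, in line with the nonmonotone behavior noted above.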
Plenty of attention has also been paid to the theoretical properties of the BB method. Barzilai and Borwein [21] proved that the BB method has R-superlinear convergence when the dimension is just two. For a general n-dimensional strongly convex quadratic function, the method is also convergent and the convergence rate is R-linear [26,27]. Recently, Dai [28] presented a novel analysis of the BB method for two-dimensional strictly convex quadratic functions and showed that a superlinear convergence step occurs in at most three consecutive steps.
Another modification implementing an efficient choice of step length was proposed by Yuan [29], who suggested a new step size for the steepest descent method. The desired formula for the new α_k was obtained through an analysis of two-dimensional problems. In his proposed algorithm, this new step size is used in even iterations and the exact line search is used in odd iterations. Unlike the BB method, Yuan's method possesses the monotonicity property. For a given k, Yuan's step size is defined in terms of α*_{2k−1} and α*_{2k}, which are computed through the exact line search. In addition to these modifications, there is a sizable literature using different styles of improvement for the steepest descent method. For example, Manton [30] derived algorithms that minimize a cost function under a constraint condition by reformulating the problem as an unconstrained one on the Grassmann manifold, thereby reducing the dimension of the optimization problem. Zhou and Feng [10] proposed a steepest descent algorithm without a line search for the p-Laplacian problem. In their method, the search direction is the weighted preconditioned steepest descent direction and, with the exception of the first iteration, the step length is computed by a closed-form formula.

Newton Method
The steepest descent method lacks second-derivative information, which limits its accuracy. Thus, some researchers have used a more effective method known as the Newton method. Newton's work was done in 1669 but was published some years later. Each iteration of the Newton method requires computing the second derivative, the Hessian, of the given function. It is difficult to calculate the Hessian manually, especially when the differentiation gets complicated; this requires considerable computing effort, and in some cases the second derivative cannot be computed analytically. Even for simpler functions, it may be time-consuming. In addition, considerable storage space is needed for the computed Hessian, which is computationally expensive: for a problem of dimension n, storing the Hessian requires O(n²) space [31].

Conjugate Gradient Methods
Another gradient-based approach is the conjugate gradient method. The conjugate gradient method has been considered an effective numerical method for solving large-scale unconstrained optimization problems because it does not need the storage of any matrices. The search direction of the conjugate gradient method is defined by

d_0 = −∇f(x_0),   d_{k+1} = −∇f(x_{k+1}) + β_k d_k,

where β_k is a parameter that characterizes the conjugate gradient method. There are many articles on how to choose β_k. The Hestenes-Stiefel (HS) [32] and Fletcher-Reeves (FR) [33] formulas are well known, respectively given by

β_k^{HS} = (∇f(x_{k+1}) · y_k) / (d_k · y_k),   β_k^{FR} = ||∇f(x_{k+1})||² / ||∇f(x_k)||²,

where y_k = ∇f(x_{k+1}) − ∇f(x_k).
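A minimal sketch of the conjugate gradient iteration with the FR parameter and an exact line search on a quadratic (the test matrix and right-hand side are illustrative); on an n-dimensional strictly convex quadratic, the method terminates in at most n steps:

```python
import numpy as np

def cg_fr(A, b, x0, tol=1e-10, max_iter=50):
    """Conjugate gradient with the Fletcher-Reeves beta for
    f(x) = 0.5 x^T A x - b^T x, using the exact quadratic line search."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                       # gradient of the quadratic
    d = -g                              # d_0 = -grad f(x_0)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ A @ d)  # exact line search for a quadratic
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves parameter
        d = -g_new + beta * d             # d_{k+1} = -g_{k+1} + beta_k d_k
        g = g_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_cg = cg_fr(A, b, np.zeros(2))
```

For this 2x2 problem the iteration reaches the solution of A x = b in two steps, consistent with the finite-termination property on quadratics.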
The global convergence properties of these methods have been shown in many research papers; see, for example, [34]. The classical conjugate gradient methods involve only the first derivative of the function. In the last decade, in order to incorporate second-order information about the objective function into conjugate gradient methods, many researchers have proposed conjugate gradient methods based on secant conditions. Dai and Liao [35] proposed a conjugate gradient method based on the secant condition and proved its global convergence. Kobayashi et al. [36] proposed conjugate gradient methods based on structured secant conditions for solving nonlinear least squares problems. Although numerical experiments in previous research show the effectiveness of these methods for solving large-scale unconstrained optimization problems, they do not necessarily satisfy the descent condition. To overcome this deficiency, various authors have presented modifications of the conjugate gradient methods. For example, Zhang et al. [37] presented a modification of the FR method such that the direction generated by the modified method is a descent direction. Sugiki et al. [38] proposed three-term conjugate gradient methods based on secant conditions that always satisfy the sufficient descent condition.

New Descent Algorithm
In this section, we show how the equations in Formula (5) can be used to construct the three-step discretization method, and we give its pseudo-code. The following regularity of f(x) is assumed throughout:

Assumption 1. We assume f ∈ C¹(R^n) with Lipschitz continuous gradient, i.e., there exists L > 0 such that

||∇f(x) − ∇f(y)|| ≤ L ||x − y||   for all x, y ∈ R^n.

Substituting αd for ∆x in Formula (5) yields:

f(x + (α/3) d) = f(x) + (α/3) ∇f(x) · d,
f(x + (α/2) d) = f(x) + (α/2) ∇f(x + (α/3) d) · d,
f(x + α d) = f(x) + α ∇f(x + (α/2) d) · d. (8)

Three-Step Discretization Algorithm
Three steps are performed in each iteration of this algorithm. In the steepest descent method there is no intermediate point between the computation of x_k and x_{k+1}; in the proposed method, however, we impose intermediate points between x_k and x_{k+1}. In other words, the first step starts from x_k and the third step produces x_{k+1}. The first step uses x_k as its starting point and produces a new point x_k^(1); the second, intermediate step starts from x_k^(1) and produces x_k^(2), which forms the starting point for the last step, resulting in x_{k+1}. The structure of the proposed algorithm is characterized as follows.
The main goal of this algorithm is that the value of the objective function declines during all three steps. If the same direction were applied in all three steps, it would be a non-descent direction in at least one of them. Therefore, we need to consider each of the steps separately. Moreover, if the point x_k is employed in the first step of Formula (8), the point x_{k+1} is obtained from the third step of Formula (8).
To indicate which step of Formula (8) is being performed, we use the super-index j. At each iteration k, the index j takes only the values (0), (1) and (2). During the kth iteration of the main algorithm, the directions d_k^(0), d_k^(1) and d_k^(2) are used in the first, second and third steps, respectively. The points x_k^(0), x_k^(1) and x_k^(2) are used as starting points of the first, second and third steps, respectively. The proper step sizes α_k^(0), α_k^(1) and α_k^(2) are obtained from the first, second and last steps, respectively. Details are explained as follows.
The first step of Formula (8) can be rewritten in the following form:

f(x_k^(0) + (α/3) d) = f(x_k^(0)) + (α/3) ∇f(x_k^(0)) · d,

where x_k^(0) = x_k. The direction d should be chosen in such a way that ∇f(x_k^(0)) · d provides the greatest reduction in f. Hence, the direction of the first step is calculated in the same way as the steepest descent direction [4], so in the first step of Formula (8) we choose

d_k^(0) = −∇f(x_k^(0)).

Next, we must determine a proper step size for the first step. In this article, to find the step size, we use the backtracking algorithm with an Armijo line search. Considering the first step of Formula (8), the general Armijo condition of Equation (2) can be modified as follows:

f(x_k^(0) + (α/3) d_k^(0)) ≤ f(x_k^(0)) + c (α/3) ∇f(x_k^(0)) · d_k^(0). (9)

The implementation of the backtracking algorithm with Equation (9) provides the proper step size α_k^(0) and the point

x_k^(1) = x_k^(0) + (α_k^(0)/3) d_k^(0). (10)

We use x_k^(1) as the starting point for the second step of Formula (8), which can be rewritten as

f(x_k^(1) + (α/2) d) = f(x_k^(1)) + (α/2) ∇f(x_k^(1)) · d.

Now the direction d_k^(1) should satisfy the descent requirement of Equation (3) at the point x_k^(1), i.e., ∇f(x_k^(1)) · d_k^(1) < 0. By an analysis similar to that of the steepest descent direction, and using Equation (10), we have

d_k^(1) = −∇f(x_k^(1)). (11)
After determining the direction d_k^(1), we should find a proper step size. In the second step, we modify the general Armijo condition of Equation (2) as follows:

f(x_k^(1) + (α/2) d_k^(1)) ≤ f(x_k^(1)) + c (α/2) ∇f(x_k^(1)) · d_k^(1). (12)
We use Equation (12) in the backtracking algorithm to find the proper step size α_k^(1) and the point

x_k^(2) = x_k^(1) + (α_k^(1)/2) d_k^(1). (13)

The point x_k^(2) serves as the starting point for the last step of Formula (8), which gives

f(x_k^(2) + α d_k^(2)) = f(x_k^(2)) + α ∇f(x_k^(2)) · d_k^(2),

where, as in Equation (11), the direction of the third step is obtained from

d_k^(2) = −∇f(x_k^(2)).

According to the last step of Formula (8), the last step of the presented algorithm uses the general Armijo condition of Equation (2), with d_k^(2) and x_k^(2) in place of d_k and x_k:

f(x_k^(2) + α d_k^(2)) ≤ f(x_k^(2)) + c α ∇f(x_k^(2)) · d_k^(2). (14)

Utilization of Equation (14) in the backtracking algorithm gives the proper step size α_k^(2) and the point

x_{k+1} = x_k^(2) + α_k^(2) d_k^(2).

After obtaining x_{k+1}, we go back to the first step of Formula (8); in fact, x_{k+1} forms the starting point of the first step, i.e., x_{k+1}^(0) = x_{k+1}. This process continues until the stop condition is attained. The following result is obtained from the fact that each step of the presented method uses a descent direction.
Therefore, implementation of this method ensures that f(x_{k+1}) < f(x_k). The pseudo-code of the three-step discretization method with backtracking Armijo line search is provided in Algorithm 1.
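Since Algorithm 1 itself is not reproduced here, the following is a schematic reconstruction of the iteration described above: each outer iteration performs three steepest-descent-like sub-steps with the fractions 1/3, 1/2 and 1 of Formula (8), each with its own negative-gradient direction and backtracking Armijo step size. The test function, tolerance and Armijo parameters are illustrative assumptions:

```python
import numpy as np

def armijo(f, grad, x, d, frac, c=1e-4, rho=0.5, alpha0=1.0, max_iter=50):
    """Backtracking search for the fractional step x + frac*alpha*d,
    as in the modified Armijo conditions (9), (12) and (14)."""
    fx, slope = f(x), grad(x) @ d
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + frac * alpha * d) <= fx + c * frac * alpha * slope:
            return alpha
        alpha *= rho
    return alpha

def three_step_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    """Three sub-steps per iteration with fractions 1/3, 1/2 and 1,
    recomputing the direction d = -grad f at each sub-step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            break
        for frac in (1.0 / 3.0, 1.0 / 2.0, 1.0):
            d = -grad(x)                  # sub-step descent direction
            alpha = armijo(f, grad, x, d, frac)
            x = x + frac * alpha * d      # fractional update
    return x

f = lambda x: (x[0] - 3)**2 + 2 * (x[1] + 1)**2
grad = lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])
x_star = three_step_descent(f, grad, [0.0, 0.0])
```

Each sub-step satisfies its Armijo condition, so the function value decreases monotonically across sub-steps, mirroring the f(x_{k+1}) < f(x_k) guarantee above.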

Theoretical Analysis of the Three-Step Discretization Algorithm
As mentioned above, the steepest descent method uses only the gradient of the function, whereas the Newton method also uses the second-order derivatives of the objective function. In this section, we examine the objective function information included in the proposed algorithm.
After implementation of the three-step discretization algorithm at iteration k, and using Equations (10) and (13), Formula (8) takes the following form:

x_k^(1) = x_k − (α_k^(0)/3) ∇f(x_k),
x_k^(2) = x_k^(1) − (α_k^(1)/2) ∇f(x_k^(1)),
x_{k+1} = x_k^(2) − α_k^(2) ∇f(x_k^(2)). (17)

Now consider the third-order Taylor series expansion of f(x_{k+1}) about x_k, denoted Equation (18). According to the first equation of Formula (17), we can expand ∇f(x_k^(1)) about x_k, which gives Equation (19). By considering the first and second equations of Formula (17) together with Equation (19), we obtain Equation (20). Taking the gradient of Equation (20) yields Equation (21).
Finally, from the last equation of Formula (17) and Equations (20) and (21), we obtain an expansion of f(x_{k+1}), Equation (22). A comparison between Equations (18) and (22) shows that Equation (22) contains some, but not all, of the second-order terms of the full Taylor expansion, and likewise some, but not all, of the third-order terms. This analysis shows that the three-step discretization algorithm incorporates information about the value of the objective function, the gradient of the function, and the second- and third-order derivatives of the objective function, although it does not contain all of the second- and third-order derivative information.

Convergence
In this section, we prove the convergence of the method. The following theorem shows that the modified Armijo line search in the first step of Formula (8) stops after a finite number of steps.
Theorem 1. Suppose that the function f satisfies Assumption 1 and let d_k^(0) be a descent direction at x_k^(0). Then:
(i) for fixed c ∈ (0, 1), the modified Armijo condition of Equation (9) holds for every α ∈ (0, α_max^(0)], where α_max^(0) = 6(c − 1) ∇f(x_k^(0)) · d_k^(0) / (L ||d_k^(0)||²);
(ii) for fixed ρ ∈ (0, 1), the step size generated by the backtracking algorithm with the modified Armijo condition of Equation (9) terminates with α_k^(0) ≥ min{α_initial, ρ α_max^(0)}.
Proof of Theorem 1. First, we prove part (i). Since ∇f is Lipschitz continuous, the Taylor expansion gives

f(x_k^(0) + t d_k^(0)) ≤ f(x_k^(0)) + t ∇f(x_k^(0)) · d_k^(0) + (L/2) t² ||d_k^(0)||², (23)

where t = α/3. Using Equation (23), the condition of Equation (9) holds whenever

(L/2)(α/3)² ||d_k^(0)||² ≤ (c − 1)(α/3) ∇f(x_k^(0)) · d_k^(0),

that is, whenever α ≤ α_max^(0). To prove the second part, note that if α_initial ≤ α_max^(0), then the first trial step is accepted and α_k^(0) = α_initial. Otherwise, in the last line search iteration the previous trial step ρ^{−1} α_k^(0) was rejected, so ρ^{−1} α_k^(0) > α_max^(0) and hence α_k^(0) > ρ α_max^(0). The combination of these two cases presents the main result.
A similar analysis to that of Theorem 1 shows that the backtracking algorithm with the modified Armijo line search in the second and third steps of Formula (8) also ends in a finite number of steps.
Theorem 2. Suppose that the function f satisfies Assumption 1 and let d_k^(1) be a descent direction at x_k^(1). Then:
(i) for fixed c ∈ (0, 1), the modified Armijo condition of Equation (12) holds for every α ∈ (0, α_max^(1)];
(ii) for fixed ρ ∈ (0, 1), the step size generated by the backtracking algorithm with the modified Armijo condition of Equation (12) terminates with α_k^(1) ≥ min{α_initial, ρ α_max^(1)}.
Proof of Theorem 2. The proof process is similar to that of Theorem 1.
Theorem 3. Let the function f satisfy Assumption 1 and let d_k^(2) be a descent direction at x_k^(2). Then:
(i) for fixed c ∈ (0, 1), the Armijo condition of Equation (14) holds for every α ∈ (0, α_max^(2)];
(ii) for fixed ρ ∈ (0, 1), the step size generated by the backtracking algorithm with the Armijo condition of Equation (14) terminates with α_k^(2) ≥ min{α_initial, ρ α_max^(2)}.
Proof of Theorem 3. The proof process is similar to that of Theorem 1.

Numerical Experiments
The test functions are listed in Table 1. For one of the test functions, the optimal solution is x* = (0.5, 0.5) and f(x*) = 1.32776143 [41]. Also, Function 28, Function 33 and Function 34 are the test functions 28, 33 and 34 in [42], respectively.
The numerical results for some of the functions in Table 1 are reported in the following tables. Table 2 shows the iteration numbers (NI), Table 3 compares the number of function evaluations (Nf), Table 4 compares the number of gradient evaluations (Ng) and Table 5 presents the errors. We compare our method with the steepest descent method, the FR method and the conjugate gradient method of [37], which we refer to as the Zhang method. The letter "F" in the tables indicates that the corresponding method did not succeed in approaching the optimal point. The FR method has the most failures since it often did not produce a descent direction. In comparison with the steepest descent method, the proposed method needs fewer iterations and fewer function and gradient evaluations. The proposed method also shows good agreement with the conjugate gradient method of [37]. In addition, we use the performance profiles of [43] to compare the performance of the considered methods. If the performance profile of a method is higher than the performance profiles of the other methods, then that method performed better than the other methods.
Figures 1a,b and 2a,b show the performance profiles measured with respect to CPU time, NI, Nf and Ng, respectively. From the viewpoint of CPU time and the number of iterations (NI), we observe in Figure 1 that the three-step discretization method is successful and works better than the steepest descent method. Although the FR method is rapid and accurate, it failed to produce a descent direction in many cases; therefore, the graph of the FR method lies below the graph of the steepest descent method for τ ≥ 4.
From the viewpoint of the number of function evaluations (Nf) and the number of gradient evaluations (Ng), Figure 2 shows that the Zhang method needs the lowest computational cost. In this figure, the proposed method is almost comparable with the Zhang method, and for τ ≥ 5 the proposed method is superior to the other methods.
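The performance profiles used here can be computed as in [43]: for each problem p and solver s, one forms the ratio of the solver's cost to the best cost on that problem, and ρ_s(τ) is the fraction of problems solved within a factor τ of the best. A minimal sketch with hypothetical data (the cost matrix below is invented for illustration, not the paper's results):

```python
import numpy as np

def performance_profile(t, tau):
    """t: (n_problems, n_solvers) array of costs, with np.inf marking a failure.
    Returns rho_s(tau), the fraction of problems each solver solves
    within a factor tau of the best solver on that problem."""
    best = np.min(t, axis=1, keepdims=True)  # best cost per problem
    r = t / best                             # performance ratios r[p, s]
    return np.mean(r <= tau, axis=0)

# Hypothetical costs (e.g., iteration counts) for 4 problems and 2 solvers
t = np.array([[10.0, 12.0],
              [20.0, 18.0],
              [15.0, np.inf],   # solver 2 failed on this problem
              [30.0, 30.0]])
rho = performance_profile(t, tau=1.0)  # fraction of problems where each solver is best
```

At τ = 1, ρ reports how often each solver is the fastest; as τ grows, ρ_s(τ) approaches the overall success rate of solver s.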

Conclusions
Based on the three-step Taylor expansion and the Armijo line search, we have proposed the three-step discretization algorithm for unconstrained optimization problems. The presented method uses some of the objective function information contained in the third-order Taylor series without requiring the calculation of higher-order derivatives. The global convergence of the proposed algorithm is proved, and numerical experiments are conducted on the proposed algorithm. In comparison with the steepest descent method, the numerical performance of the proposed method is superior.

Theorem 4 (Global convergence of the three-step discretization algorithm). Suppose that the function f satisfies Assumption 1 and that d_k^(j) is a descent direction at x_k^(j) for j = 0, 1, 2. Then, for the iterates generated by the three-step discretization algorithm, one of the following situations occurs: ∇f(x_k) = 0 for some finite k, lim_{k→∞} f(x_k) = −∞, or lim_{k→∞} ∇f(x_k) = 0.

Figure 1. Performance profiles based on CPU time (a) and the number of iterations (NI) (b) for the 45 functions in Table 1.

Figure 2. Performance profiles based on the number of function evaluations (Nf) (a) and the number of gradient evaluations (Ng) (b).

Table 2. Iteration numbers (NI) for different methods.

Table 3. The number of function evaluations (Nf) for different methods.

Table 4. The number of gradient evaluations (Ng) for different methods.