1. Accelerated Double Direction and Double Step Size Methods Overview
In order to define an efficient optimization model for solving unconstrained nonlinear tasks, we approach the matter on multiple fronts. One of the primary aims is ensuring fast convergence, desirably close to the Newton method's convergence rate. On the other hand, we would like to avoid the complicated calculations that can arise from deriving the Hessian's second-order partial derivatives. That is why the quasi-Newton method is a good starting point for developing an optimization method with good performance profiles. The benefits of quasi-Newton methods are well known. One of the main characteristics of these iterations is the conservation of good convergence features, although the Hessian, i.e., the Hessian's inverse, is not explicitly used. Instead, an appropriately defined approximation of the Hessian, or of its inverse, is used in these methods. This way, the quasi-Newton methods preserve a good convergence rate and, at the same time, avoid the possible difficulties of Hessian calculations. In this paper, we use a quasi-Newton concept to define an efficient minimization scheme for solving unconstrained minimization problems, stated as:

$$\min f(x), \quad x \in \mathbb{R}^n, \qquad (1)$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is an objective function.
When defining optimization iterative models based on the quasi-Newton form, we can start with the following general iteration:

$$x_{k+1} = x_k + t_k d_k, \qquad (2)$$

where $x_k$ stands for the current iterative point, $x_{k+1}$ is the next one, $t_k$ is the iterative step length and $d_k$ is the search direction of the $k$th iteration. For iterations of the quasi-Newton type, the search direction is defined through the gradient features. Therewith, an iterative direction vector has to fulfill the descent condition, i.e.,

$$g_k^T d_k < 0. \qquad (3)$$
In condition (3), by $g_k$ we denote the gradient of the objective function at $x_k$. Furthermore, we adopt the usual notations:

$$g_k = \nabla f(x_k), \quad G_k = \nabla^2 f(x_k), \qquad (4)$$

where $\nabla f$ and $\nabla^2 f$ are the standard notations for the gradient and the Hessian of the goal function, respectively.
The way of defining the iterative step length $t_k$ and the iterative search direction vector $d_k$ directly influences a method's efficiency. In addition, some authors [1,2,3,4,5] singled out one more parameter, equally important as the other two, that contributes to a method's performance characteristics. That is the iterative acceleration parameter, often denoted by $\gamma_k$. In [1], the author marked this parameter as $\theta_k$, and its iterative value is expressed by relation (5). Researchers on this topic have justifiably distinguished a class of accelerated gradient schemes. In [3], for example, the authors numerically confirmed an evident performance improvement in favor of the accelerated method when compared with its non-accelerated version. Some expressions for the acceleration factors defined in the accelerated gradient models mentioned above are recalled below; these acceleration parameters are also listed in [6].
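Of these, we reproduce the acceleration parameter of the SM scheme, referenced below as relation (6); the expression is recalled from [2] and rewritten in the notation adopted above:

$$\gamma_{k+1}^{SM} = 2\gamma_k \, \frac{\gamma_k \left[ f(x_{k+1}) - f(x_k) \right] + t_k \|g_k\|^2}{t_k^2 \|g_k\|^2}. \qquad (6)$$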
Interesting ideas involving the double step length and the double direction approach in defining an efficient minimization iteration are presented in [2,3]. In both of these studies, the authors used properly determined accelerating characteristics. In this paper, we use the proven good properties of each of these models, i.e., of the accelerated double direction method (shortly, the ADD method) as well as of the accelerated double step size method (the ADSS method).
The ADD iteration is defined by the following expression:

$$x_{k+1} = x_k + \alpha_k^2 d_k - \alpha_k \gamma_k^{-1} g_k,$$

where $\gamma_k$ is the acceleration parameter. The iterative step length $\alpha_k$ is derived using Armijo's Backtracking inexact line search algorithm. Variable $d_k$ stands for the second vector direction, and it is calculated by the rule (11), stated in [3], in which the defining vector is obtained as the solution of an auxiliary minimization problem.
The two search directions in the ADD method are $d_k$, defined by the previous rule, and $-\gamma_k^{-1} g_k$. One of the main results in [3] is that the ADD algorithm requires a lower number of iterations than the accelerated gradient descent method, denoted as the SM method, which is presented in [2]. The iterative form of the SM method is given by the expression:

$$x_{k+1} = x_k - t_k \gamma_k^{-1} g_k,$$

where $t_k$ is the iterative step length value, and $\gamma_k$ is the acceleration parameter of the SM iteration, expressed by relation (6).
The accelerated double step size model, i.e., the ADSS, is defined as

$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k - \beta_k g_k.$$

Parameters $\alpha_k$ and $\beta_k$ are two iterative step lengths, calculated by two different Backtracking procedures, and $\gamma_k$ is the ADSS iterative acceleration parameter. In the ADSS iteration, we can identify the vector direction as:

$$d_k = -\left(\alpha_k \gamma_k^{-1} + \beta_k\right) g_k. \qquad (13)$$
The transformed ADSS method, or in short, the TADSS, came from the ADSS scheme under the following condition:

$$\beta_k = 1 - \alpha_k.$$

The TADSS iteration is defined as:

$$x_{k+1} = x_k - \left[\alpha_k \left(\gamma_k^{-1} - 1\right) + 1\right] g_k. \qquad (14)$$
From expression (13), we conclude that the defined vector direction has the form of a negative gradient direction. Having that in mind, it depends on the step length parameters as well as on the iterative value of the acceleration parameter. Numerical experiments from [4] show that the ADSS iteration outperforms the ADD [3] and the SM [2] schemes with respect to all three of the analyzed metrics: the number of iterations, CPU time and the number of function evaluations.
We are motivated to define a method that is an improved, merged version of the accelerated double direction and double step size methods. At the same time, the proposed model should have a simpler form than the ADD and the ADSS schemes. We obtain this simpler form by removing one of the Backtracking algorithms from the ADSS iteration and by replacing the rule (11) in the ADD scheme with the gradient descent rule. Under all these assumptions, we expect the proposed iterative method to converge at least at the same rate as the ADD and the ADSS methods. The modified iteration, based on the mentioned accelerated gradient descent algorithms, should conserve the positive sides of its predecessors but also exceed them regarding the performance profiles of all tested metrics.
The paper is organized in the following way: in Section 2, we define the improved version of the ADD and the ADSS schemes; the convergence analysis of the defined model is carried out in Section 3; numerical test results are compared, analyzed and displayed in Section 4.
2. Modified Accelerated Double Direction and Double Step Size Method
Taking into account the iterative form of the accelerated ADD method, as well as the good performance features of the accelerated double step size ADSS scheme on all three tested metrics, we propose the following iterative model for solving large-scale unconstrained minimization problems:

$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k - \alpha_k^2 g_k. \qquad (15)$$
Iterative scheme (15) presents the merged variant of the ADD and the ADSS methods, keeping the favorable aspects of each included gradient scheme. We denote the iterative rule (15) as the modified accelerated double direction and double step size method, or in short, modADS. In the modADS scheme, one iterative search direction is $-\gamma_k^{-1} g_k$, and the other is simply the negative gradient direction. The two step lengths, $\alpha_k$ and $\alpha_k^2$, are obtained using one Backtracking procedure. Basically, our main goal in generating the modADS method is to define an improved merged version of the accelerated double direction and double step size methods. With that in mind, we want to conserve the positive aspects of each of these two baseline models. The form of the ADD iteration contains only one iterative step length value, i.e., one Backtracking procedure is applied. That was the main motivation to substitute the second iterative value $\beta_k$ from the ADSS iteration with $\alpha_k^2$. In this way, we conserve the form of the ADD iteration in the new modADS scheme.
On the other hand, from the results presented in [4], we know that the second search direction $d_k$, defined in the ADD iteration by (11), causes an increase in the number of function evaluations. Therefore, instead of it, just like in the ADSS iteration, in the new modADS process we simply use the gradient descent direction as the second search direction as well.
There are certainly many different options for defining the second iterative step length in double direction and double step size models that differ from our choice of $\alpha_k^2$. That question is still open. Since the modADS belongs to the class of accelerated double direction and double step size methods and presents a merged form of the ADD and the ADSS iterations, the choice of $\alpha_k^2$ as the second step length value was a natural one. Additionally, according to the TADSS iteration (14), it could be said that the TADSS corresponds to a different choice, $\beta_k = 1 - \alpha_k$, of the second step size of the ADSS iteration. Therefore, this is also a motivation to define the modADS in the presented way and to compare the performance features of these two similar approaches, as summarized in the schematic overview below.
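Schematically, under the iterative forms recalled above, the two descendants of the ADSS scheme differ from it only in how the second step size applied to the negative gradient is chosen (a summary in our notation, with the second step size denoted by $\beta_k$):

$$x_{k+1} = x_k - \left(\alpha_k \gamma_k^{-1} + \beta_k\right) g_k, \qquad
\beta_k = \begin{cases}
\text{second Backtracking outcome} & \text{(ADSS)},\\[2pt]
1 - \alpha_k & \text{(TADSS)},\\[2pt]
\alpha_k^2 & \text{(modADS)}.
\end{cases}$$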
So, the common elements of the ADD, the ADSS and the proposed modADS iterative forms are the iterative step length value $\alpha_k$ and the search direction vector $-\gamma_k^{-1} g_k$. The other search direction in the modADS is $-g_k$, just like in the ADSS scheme. Still, as previously explained, the second step size value of the new method differs from the one, $\beta_k$, applied in the ADSS model. Instead of using an additional inexact line search technique to calculate the second iterative step length value, in the modADS we use only one Backtracking procedure and define the second step length parameter as the squared value of the Backtracking outcome $\alpha_k$. This way, we evidently provide a decrease in the computational time, the number of needed iterations and the number of function evaluations. We confirm this statement in Section 4 by a comparative analysis of the performance profiles of each of the tested models.
The algorithm of the Backtracking procedure upon which we calculate the iterative step length value is given by the following steps:

Step 1: The objective function $f(x)$, the direction $d_k$ of the search at the point $x_k$ and numbers $0 < \sigma < 0.5$ and $\beta \in (\sigma, 1)$ are required;
Step 2: Set $\alpha = 1$;
Step 3: While $f(x_k + \alpha d_k) > f(x_k) + \sigma \alpha g_k^T d_k$, take $\alpha := \alpha \beta$;
Step 4: Return $\alpha_k = \alpha$.
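For illustration, the following is a minimal C++ sketch of this Backtracking procedure; the representation of points as std::vector<double> and the routine names are our own choices, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Inner product of two vectors of equal length.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Backtracking line search: starts from alpha = 1 and shrinks the step by
// the factor beta until the Armijo condition
//   f(x + alpha * d) <= f(x) + sigma * alpha * g^T d
// holds. Here f is the objective, x the current point, g = grad f(x) and
// d a descent direction (g^T d < 0).
template <typename F>
double backtracking(F f, const std::vector<double>& x,
                    const std::vector<double>& g, const std::vector<double>& d,
                    double sigma, double beta) {
    double alpha = 1.0;
    const double fx = f(x);
    const double slope = dot(g, d);   // negative for a descent direction
    std::vector<double> trial(x.size());
    while (true) {
        for (std::size_t i = 0; i < x.size(); ++i) trial[i] = x[i] + alpha * d[i];
        if (f(trial) <= fx + sigma * alpha * slope) break;  // Armijo satisfied
        alpha *= beta;                // shrink the step and retry
    }
    return alpha;
}
```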
We now derive the iterative value of the acceleration parameter using the second-order Taylor expansion of the modADS iteration (15). To avoid cumbersome expressions in that process, we simplify relation (15) using the following substitution:

$$x_{k+1} = x_k - u_k g_k, \qquad (16)$$

where $u_k = \alpha_k \gamma_k^{-1} + \alpha_k^2$. The second-order Taylor polynomial of (16) is then:

$$f(x_{k+1}) \approx f(x_k) - u_k g_k^T g_k + \frac{1}{2} u_k^2 \, g_k^T \nabla^2 f(\xi) \, g_k. \qquad (17)$$

In relation (17), $\nabla^2 f(\xi)$ stands for the Hessian of the objective function, and the variable $\xi$ fulfills the following conditions:

$$\xi = x_k + \delta (x_{k+1} - x_k), \quad 0 \le \delta \le 1.$$

We replace the Hessian $\nabla^2 f(\xi)$ with a properly defined scalar diagonal matrix $\gamma_{k+1} I$, where the variable $\gamma_{k+1}$ is the acceleration parameter we are searching for:

$$f(x_{k+1}) = f(x_k) - u_k \|g_k\|^2 + \frac{1}{2} u_k^2 \gamma_{k+1} \|g_k\|^2. \qquad (18)$$

From the previous expression, we can easily compute the iterative value of the acceleration factor:

$$\gamma_{k+1} = 2 \, \frac{f(x_{k+1}) - f(x_k) + u_k \|g_k\|^2}{u_k^2 \|g_k\|^2}. \qquad (19)$$
We are only interested in positive values of $\gamma_{k+1}$ because, in that case, both the second-order necessary and the second-order sufficient conditions are fulfilled. However, if in some iterative step we calculate a negative value of the acceleration parameter, then we simply set $\gamma_{k+1} = 1$. This choice of $\gamma_{k+1}$ transforms our modADS iteration into the standard gradient descent iterative method, i.e., $x_{k+1} = x_k - \alpha g_k$ for some $\alpha > 0$; indeed, with $\gamma_k = 1$, iteration (15) reduces to $x_{k+1} = x_k - (\alpha_k + \alpha_k^2) g_k$, a gradient descent step with step length $\alpha_k + \alpha_k^2 > 0$.
For initial values $x_0 \in \mathbb{R}^n$, $\varepsilon > 0$, $0 < \sigma < 0.5$ and $\beta \in (\sigma, 1)$, we now present the modADS algorithm:

Step 1: Set $k = 0$, compute $f(x_0)$ and $g_0$, and take $\gamma_0 = 1$;
Step 2: If $\|g_k\| \le \varepsilon$, then go to Step 8, else continue with Step 3;
Step 3: Apply the Backtracking algorithm to calculate the iterative step length $\alpha_k$;
Step 4: Compute $x_{k+1}$ using (15);
Step 5: Determine the acceleration parameter $\gamma_{k+1}$ using (19);
Step 6: If $\gamma_{k+1} < 0$, then take $\gamma_{k+1} = 1$;
Step 7: Set $k := k + 1$ and go to Step 2;
Step 8: Return $x_k$ and $f(x_k)$.
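The following C++ sketch puts the previous steps together under the reconstructions above; it reuses the backtracking routine from the earlier sketch. This is our illustration, not the authors' original code: in particular, we assume the line search is performed along the first direction $-\gamma_k^{-1} g_k$, a detail the algorithm statement does not fix, and we add a maxIter safeguard for practicality.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean norm of a vector.
double norm2(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return s;
}

// One possible realization of the modADS iteration (15):
//   x_{k+1} = x_k - (alpha_k / gamma_k + alpha_k^2) * g_k,
// with gamma updated by (19) and reset to 1 whenever it turns negative.
// F and G are callables returning f(x) and grad f(x), respectively.
template <typename F, typename G>
std::vector<double> modADS(F f, G grad, std::vector<double> x,
                           double sigma, double beta,
                           double eps, int maxIter) {
    double gamma = 1.0;                                   // Step 1
    for (int k = 0; k < maxIter; ++k) {
        std::vector<double> g = grad(x);
        double gg = norm2(g);
        if (std::sqrt(gg) <= eps) break;                  // Step 2: stopping test
        // Step 3: backtracking along the first direction d = -(1/gamma) g.
        std::vector<double> d(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) d[i] = -g[i] / gamma;
        double alpha = backtracking(f, x, g, d, sigma, beta);
        // Step 4: take the step, using the substitution (16).
        double u = alpha / gamma + alpha * alpha;
        double fOld = f(x);
        for (std::size_t i = 0; i < x.size(); ++i) x[i] -= u * g[i];
        // Step 5: acceleration parameter by (19).
        gamma = 2.0 * (f(x) - fOld + u * gg) / (u * u * gg);
        if (gamma < 0.0) gamma = 1.0;                     // Step 6: reset
    }
    return x;                                             // Step 8: f(x) is available via f
}
```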
4. Numerical Outcomes and Comparative Analysis
In this section, we display the numerical results with which we compare the relevant methods. As comparative models, in addition to the modADS method presented in this paper, we primarily chose the accelerated double direction (ADD) method introduced in [3] and the accelerated double step size (ADSS) iteration from [4]. This is a natural choice of comparative optimization processes, since the derived modADS algorithm originates from these two accelerated gradient schemes and our basic goal is the improvement of this class of methods. Then, we investigate the impact of the Backtracking parameter $\beta$ by testing two more values of this parameter. The TADSS method, presented in [5], and the modADS introduced in this paper present two different ways of reducing the double step size ADSS scheme to a single step length iteration. Due to this fact, we compare these two methods as well. Finally, we complete the numerical comparative analysis by comparing the defined modADS model with two more general gradient descent methods: Cauchy's gradient method (GD) and Andrei's accelerated gradient method (AGD) from [1].
The ADD scheme brought benefits regarding the reduction in the needed number of iterations with respect to its non-accelerated version and to the SM method from [2]. Furthermore, in [4], the ADSS showed undisputed advances with respect to all three of the tested metrics: the number of iterations, the CPU time and the number of function evaluations. It was compared with the SM and the ADD schemes.
All codes are written in the Visual C++ programming language and run on an Intel(R) Core(TM) 2.3 GHz workstation. The same values of the Backtracking parameters $\sigma$ and $\beta$ are taken in all algorithms.
The stopping criteria are:

$$\|g_k\| \le 10^{-6} \quad \text{and} \quad \frac{|f(x_{k+1}) - f(x_k)|}{1 + |f(x_k)|} \le 10^{-16}.$$
We chose 10 values for the number of variables of each test function: 100, 500, 1000, 3000, 5000, 10,000, 15,000, 20,000, 25,000 and 30,000. As the final result for one test function, we sum all 10 outcomes. We measured all three performance characteristics: the number of iterations, the CPU time and the number of function evaluations. If, for some dimension and some test function, the applied model does not finish the test process within a defined time, we put the time-limiter constant in Table 1 and Table 2.
Remark 1. The time-limiter parameter is introduced in [3]. It is posed as an indicator for stopping the code execution after some defined time, here 120 s.

In Listing 1, we list the set of test functions examined in this research. We applied all three compared methods to each of these functions. The proposed functions are taken from a collection of unconstrained optimization test functions introduced in [9].
Listing 1. Test functions.
1. Extended Penalty
2. Perturbed Quadratic
3. Raydan-1
4. Diagonal 1
5. Diagonal 3
6. Generalized Tridiagonal-1
7. Extended Tridiagonal-1
8. Extended Three Expon. Terms
9. Diagonal 4
10. Extended Himmelblau
11. Quadr. Diag. Perturbed
12. Quadratic QF1
13. Exten. Quadr. Penalty QP1
14. Exten. Quadr. Penalty QP2
15. Quadratic QF2
16. Extended EP1
17. Extended Tridiagonal-2
18. Arwhead
19. Almost Perturbed Quadratic
20. Engval1
21. Quartc
22. Generalized Quartic
23. Diagonal 7
24. Diagonal 8
25. Diagonal 9
26. DIXON3DQ
27. NONSCOMP
28. HIMMELH
29. Power (Cute)
30. Sine
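As an illustration of how such test problems are coded, below is a C++ sketch of the Extended Penalty function; we assume its usual statement in Andrei's collection [9], $f(x) = \sum_{i=1}^{n-1}(x_i - 1)^2 + \left(\sum_{j=1}^{n} x_j^2 - 0.25\right)^2$, and derive the gradient from that form by hand.

```cpp
#include <cstddef>
#include <vector>

// Extended Penalty test function (form assumed from Andrei's collection):
//   f(x) = sum_{i=1}^{n-1} (x_i - 1)^2 + (sum_{j=1}^{n} x_j^2 - 0.25)^2
double extendedPenalty(const std::vector<double>& x) {
    double quad = 0.0, sumSq = 0.0;
    for (std::size_t i = 0; i + 1 < x.size(); ++i)
        quad += (x[i] - 1.0) * (x[i] - 1.0);
    for (double xi : x) sumSq += xi * xi;
    const double p = sumSq - 0.25;
    return quad + p * p;
}

// Gradient of the above:
//   df/dx_i = 2(x_i - 1) [for i < n]  +  4 x_i (sum_j x_j^2 - 0.25)
std::vector<double> extendedPenaltyGrad(const std::vector<double>& x) {
    double sumSq = 0.0;
    for (double xi : x) sumSq += xi * xi;
    const double p = sumSq - 0.25;
    std::vector<double> g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        g[i] = 4.0 * x[i] * p;
        if (i + 1 < x.size()) g[i] += 2.0 * (x[i] - 1.0);
    }
    return g;
}
```

With these two routines, the modADS sketch from Section 2 can be invoked as, e.g., modADS(extendedPenalty, extendedPenaltyGrad, x0, sigma, beta, 1e-6, maxIter).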
In Table 1, we display the results concerning the number of iterations metric. All three models provide very good numerical outcomes regarding the number of needed iterations. As expected, modADS and ADSS have an equal number of iterations for many test functions, precisely 21 out of 30. This is due to the modADS iterative form having similar characteristics to those of the ADSS iteration. All three models give the same number of iterations in three cases. Furthermore, each of the modADS and the ADD gives the lowest number of iterations in 6 out of 30 cases, while the ADSS does so in only 1 of 30 cases. A general view shows that modADS gives final outcomes for all 30 test functions, the ADD for 26 and the ADSS for 29. The ADD reached the time-limiter constant for the Diagonal 7, Diagonal 8, Power (Cute) and Sine functions. The execution time is exceeded only for the Sine function when the ADSS model is applied.
Regarding the speed of execution of each comparative model, from the obtained numerical outcomes we can see that the modADS and ADSS models perform almost equally, which is why we do not display the results obtained for this metric. Both models give zeros for the CPU time in 29 out of 30 cases; only modADS was successfully applied to the remaining test function (Sine), while the ADSS iteration exceeded the execution time in this case. The ADD model has the worst outcomes for this characteristic, with four breaks.
The contents of Table 2 show the number of function evaluations for all three tested models. It is obvious that the modADS achieves the greatest improvement regarding this performance characteristic when compared to the other two test processes. This method convincingly gives the lowest number of function evaluations in 29 out of 30 cases. The ADSS has the best outcome in 1 case only, while the ADD records very high numbers for this metric on almost all 30 test functions.
The average values concerning the three analyzed criteria for all comparative models are displayed in Table 3. We included the results of these computations achieved on the 26 out of 30 test functions on which we could apply all methods without exceeding the execution time. From this table, we can obtain a general impression of the performance features of the generated modADS process in comparison to its forerunners. We see that this new accelerated variant is equally fast as the ADSS scheme, slightly surpasses the ADSS regarding the number of iterations metric and evidently gives a significant shift in the number of evaluations. When compared with the ADD iteration, the modADS iteration improves on it multiple times over regarding all three performance profiles. More precisely, the modADS gives a 4 times lower average number of iterations, a more than 142 times lower number of function evaluations, and it is multiple times faster than the ADD process.
We now analyze the dependency of the approaches on the Backtracking parameter $\beta$. As mentioned before in this section, in all previously displayed results the value of this parameter was fixed to one common value in the algorithms of all three comparative models. We conducted 600 additional tests over the modADS, the ADD and the ADSS algorithms for two more values of this parameter. For that purpose, we chose the first 10 test functions from Listing 1. In Table 4 and Table 5, we display the sums of the obtained results regarding the number of iterations and the number of evaluations for these three comparative models. As expected, the modADS demonstrates similar performance regarding the analyzed metrics when compared to the ADD and the ADSS methods, just as in the baseline case. Concerning the number of iterations, for both additional $\beta$ values, the modADS acts similarly to the ADSS method. Regarding the number of evaluations, again for each of the two additional $\beta$ values, it gives the best results in 7 out of 10 cases when compared to the ADSS and in all 10 cases in comparison to the ADD scheme.
Furthermore, we compare the performance metrics of the modADS and the transformed ADSS, i.e., the TADSS. In [5], the authors confirmed that the TADSS provides better numerical outcomes regarding the number of iterations, the CPU time and the number of function evaluations in comparison with the ADSS scheme on 22 chosen test functions. From the results presented in Table 1, Table 2, Table 3, Table 4 and Table 5, we concluded that the modADS behaves similarly to the ADSS regarding the number of iterations and the CPU time, but it provides a lower number of evaluations. Due to the results from [5], we may expect the TADSS to have better performance results than the modADS with respect to the number of iterations. In Table 6, we present the achieved test results not only for the 22 test functions from [5] but for all 30 test functions from Listing 1. In addition, we show in Table 7 a more general overview of the average results regarding all analyzed metrics.
Although the results from Table 6 illustrate that the TADSS provides a lower number of iterations in as many as 17 out of 30 test functions, the general average outcomes still confirm that the modADS provides more than 3 times better outcomes with respect to this metric than the TADSS process. According to the Table 6 results, when we analyze the number of function evaluations, the modADS and the TADSS obtain an equal number of best outcomes. Yet, from the results presented in Table 7, we are assured that the modADS is almost three times more effective in this regard when compared to the TADSS iteration. From Table 6, we can also notice that, for the Sine function, the TADSS process exceeds the execution time.
To achieve a more general view of the performance features of the modADS method, we conducted additional comparisons with a classical gradient method, defined by Cauchy, and with the accelerated gradient method from [1]. We further denote these comparative methods by GD and AGD, respectively. The execution times were very long for the previously chosen numbers of variables. For that reason, we replaced this set with the following 10 decreased values: 10, 100, 200, 300, 500, 700, 800, 1000, 2000 and 3000. We tested the first 15 test functions from Listing 1 by applying the modADS, the GD and the AGD iterative rules. The sums of the outcomes of the 450 additional tests are displayed in Table 8, Table 9 and Table 10.
From Table 8, it is evident that the modADS gives the lowest number of iterations compared to the GD and the AGD methods on all 15 test functions.
The CPU execution time needed when the three comparative models are applied to the first 15 test functions is listed in Table 9. We see that, except in four cases when all three methods have the same (zero) outcomes, the modADS is again the dominant model in this aspect as well.
The numbers of objective function evaluations achieved by the modADS, the GD and the AGD are illustrated in Table 10. The general conclusions for this performance metric are the same as for the number of iterations (Table 8), i.e., the modADS has the best outcomes for all 15 test functions.
In summary, we display in Table 11 the comparisons of the average results obtained by the three comparative methods (modADS, GD and AGD) regarding all three performance characteristics. The results displayed in this table confirm that the modADS requires an approximately 417 times lower number of iterations compared to the GD method and an approximately 263 times lower number of iterations compared to the AGD method. Regarding the needed number of evaluations, the modADS outperforms the GD and the AGD methods by over 1420 times.