1. Background
Our main focus is a special form of accelerated gradient methods that contain two vector directions. Aiming to develop as efficient an optimization method as possible, we explored an approach that combines several different search vectors. Computationally, iterations with two directions have become the preferred choice. In [1], the authors suggested the following double-direction iterative method for solving non-differentiable problems:
In (
1),
is the iterative step length, while
and
are two differently defined vector directions. These three parameters are determined by the procedures listed below, denoted as the curve search algorithm, the algorithm for deriving vector direction
, and the algorithm for deriving vector direction
, respectively.
Curve search algorithm
where
is the smallest integer from
such that
In (
3),
is estimated as
,
stands for the second-order Dini upper-directional derivative at
in direction
d, and the function
F represents the Moreau–Yosida regularization of the objective function
f associated with the metric
M, defined as follows:
Algorithm for deriving vector direction ,
,
is an index set at
k-th iteration.
and
,
is a vector that satisfies
.
Algorithm for deriving vector direction is the solution to the problem
The results presented in [
1] motivated the authors in [
2] to define an accelerated gradient version of the iterative rule (
1). In ref. [
2], the accelerated double-direction method, denoted as the ADD, is presented as follows:
In (
5),
is the current iterative point,
is the iterative step size,
is the gradient of the objective function,
is the accelerated parameter of the ADD method, and
is the second vector direction.
The step size parameter
of the iteration (
5) is calculated in the following way:
is the smallest integer from
such that
where
is a real number such that
.
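The step-size computation above (and the backtracking procedure discussed in Remark 1 below) follows an Armijo-type pattern that can be sketched as follows. This is an illustrative one-dimensional sketch; the parameter names sigma and beta and the default values are our own assumptions, not the paper's exact notation.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Armijo-type backtracking: starting from alpha = 1, keep multiplying by
// beta in (0, 1) until the sufficient-decrease condition holds. The names
// sigma and beta are assumed backtracking parameters.
double backtracking_step(const std::function<double(double)>& f,
                         const std::function<double(double)>& df,
                         double x, double dir,
                         double sigma = 1e-4, double beta = 0.5) {
    double alpha = 1.0;
    const double fx = f(x);
    const double slope = df(x) * dir;  // directional derivative along dir
    // Accept alpha once f(x + alpha*dir) <= f(x) + sigma * alpha * slope.
    while (f(x + alpha * dir) > fx + sigma * alpha * slope)
        alpha *= beta;  // reduce the trial step: alpha <- beta * alpha
    return alpha;
}
```

For example, for f(x) = x², starting at x = 1 with direction −f′(1) = −2, this search accepts α = 0.5, which jumps straight to the minimizer.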
Remark 1. An alternative way to derive the iterative step size is the backtracking line search procedure, originally presented in [3]. In this procedure, the iterative step length is found by starting from an initial value and, using the two backtracking parameters, checking a sufficient-decrease condition while successively reducing the step size. The optimal initial step length is obtained once the exit condition of the backtracking algorithm is fulfilled.
Remark 2. There are two main approaches for calculating the step length parameter in an optimization method:
1. exact line search;
2. inexact line search.
Using procedure 1, in each iteration, the step size value is derived as the solution to the following problem:
Clearly, the exact line search requires additional CPU time. Accordingly, the required numbers of iterations and function evaluations certainly increase. For this reason, in many contemporary optimization schemes, the step length parameter is derived using the second approach, i.e., through inexact line search procedures. We list some of the commonly applied inexact line search techniques as follows: The authors in [
2] modified the algorithms for deriving vector directions of the iteration (
5) in gradient terms. These modifications are illustrated in Algorithms 1 and 2.
Algorithm 1 Calculation of the vector
:
Algorithm 2 Calculation of the vector
:
where
is the solution of the transformed minimization problem (
4)
Remark 3. The vector of the search direction is one of the important elements of each gradient minimization scheme for solving unconstrained optimization problems. In solving minimization tasks, it is assumed that the iterative search direction satisfies the following inequality: where is the gradient of the objective function at point . Relation (8) is known as the descent condition.
We list only some of the approaches for generating search directions that fulfill Condition (8). The last suggestion in Remark 3 induces an idea that can serve as a basis for further studies, namely comparisons between a chosen conjugate gradient method and the hybrid accelerated method proposed in this paper.
The general form of the conjugate gradient method is given as follows:
where the iterative step length variable
is calculated via the exact line search or via one of the inexact line searches listed in Remark 2. The distinguishing feature of a conjugate gradient scheme is the method of generating the vector direction
, which is defined as follows:
i.e.,
In (
10),
denotes the scalar product of the gradient vectors.
For the suggested comparative studies, in relation to the research presented in this paper, it would be valuable to pay special attention to the set of quadratic functions. A quadratic function is defined by the following expression:
where
A is a symmetric, positive definite
matrix,
, and
. Starting with the initial condition
, after some calculations, an update for the vector
is obtained as follows:
where
The conjugate gradient method (
9) with a vector direction defined by (
12), where
is calculated using Relation (
13), is known as the Fletcher–Reeves formulation of the conjugate gradient method [
14].
We list several significant variants of the conjugate gradient method that differ with respect to the expressions that define
quotients [
15,
16,
17]:
For example, as a comparative minimization model, the conjugate method proposed in [
13] can be taken. Providing the suggested comparative analysis would certainly contribute to the optimization community in general.
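To make such a comparison concrete, the Fletcher–Reeves scheme (9), (12), (13) applied to a strictly convex quadratic (11) can be sketched as follows. For quadratics, the exact line search has a closed form; the dense-matrix helpers and function names here are our own illustrative choices, not the paper's notation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Minimizes f(x) = 0.5 x^T A x - b^T x, i.e., solves A x = b, from x = 0.
Vec fletcher_reeves(const Mat& A, const Vec& b,
                    int max_iter = 100, double tol = 1e-10) {
    Vec x(b.size(), 0.0);
    Vec r = b;            // residual r = b - A x = -gradient
    Vec d = r;            // initial direction: steepest descent
    double rr = dot(r, r);
    for (int k = 0; k < max_iter && std::sqrt(rr) > tol; ++k) {
        Vec Ad = matvec(A, d);
        double t = rr / dot(d, Ad);          // exact line search step
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += t * d[i];
            r[i] -= t * Ad[i];
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;           // Fletcher-Reeves quotient
        for (std::size_t i = 0; i < d.size(); ++i) d[i] = r[i] + beta * d[i];
        rr = rr_new;
    }
    return x;
}
```

For instance, with A = {{4, 1}, {1, 3}} and b = {1, 2}, the method recovers the minimizer x = (1/11, 7/11) in two iterations.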
One of the crucial variables of the ADD scheme (
5) is the acceleration factor, calculated using the second-order Taylor expansion of the objective function:
The most significant contribution achieved in ref. [
2] likely concerns the importance of the acceleration parameter. To substantiate this, the authors constructed and tested a non-accelerated version of the ADD model, called the NADD method. In these studies, the superior effectiveness of the ADD method was confirmed.
Recently, in [
18], the authors introduced a new hybrid approach for generating accelerated gradient optimization methods. This calculative technique is denoted as
s-hybridization. In developing this new approach to constructing an efficient minimization scheme, the authors were guided by research on nearly contraction mappings and nearly asymptotically nonexpansive mappings and on the existence of fixed points for these classes of mappings [
19,
20]. The main idea regarding the
s-
schemes arises from the study presented in [
20], where the following three-term s-iterative rule is presented:
In (
15),
and
are sequences of real numbers satisfying the following conditions:
The authors in [
18] simplified the s-iteration (
15) by applying Condition (
17)
which transforms the limits (16)–(18) as follows:
Therewith, the
s-iteration with one corrective parameter
is expressed as follows:
Guided by Iteration (
19), in connection with the SM method from [
21], the authors in [
18] proposed the SHSM optimization method (
20):
The authors proved that this model is well defined and established a comprehensive convergence analysis.
This paper is organized as follows: in
Section 2, we develop the
s-hybrid double-direction minimization method based on the obtained results from the relevant studies described in the first section. The convergence analysis is presented in
Section 3. Numerical investigations are illustrated in
Section 4.
2. S-Hybridization of the Accelerated Double-Direction Method
In this section, we generate the
s-hybrid model using the ADD iterative rule,
as a guiding operator in the three-term process (
19). Applying the previously stated facts regarding the
s-hybridization technique and the ADD method, we develop the
shADD process through the following three-term relations:
Before we state and prove that the three-term process (21), rewritten in a merged form, presents an accelerated gradient descent method, we recall the following two important statements.
Proposition 1 (Second-order necessary conditions—unconstrained case [
22]).
Let be an interior point of the set Ω,
and suppose that is a relative minimum point over Ω
of the function . Then, Proposition 2 (Second-order sufficient conditions—unconstrained case [
22]).
Let be a function defined in a region in which the point is an interior point. Suppose in addition that Then, is a strict relative minimum point of f.
Lemma 1. The accelerated gradient iterative form of the shADD process (21) is given by the following relation: Proof. The merged iterative rule of the
process (
21) can be derived by substituting the expression of
from (
21) into the previous relation of the same three-term method, i.e., the one that defines
:
which proves (
22).
Now, we show that Method (
22) fulfills the gradient descent property. For this purpose, let us rewrite relation (
22) as follows:
where
Knowing that
implies
. Further,
can be considered as a linear combination of the gradient vector, since the vector direction
is derived by Algorithm 2. Moreover, the parameter
, being an acceleration parameter, is a positive constant. Therefore, direction
is the gradient descent vector.
Now, we derive the iterative value of the acceleration parameter for Method (
22). To achieve this goal, we use the second-order Taylor series of the objective function
f:
where
satisfies the following:
Instead of the function’s Hessian
, we use the diagonal scalar matrix approximation in the previous Taylor expression, i.e., acceleration matrix
:
This gives us the expression of the acceleration parameter
of the
process:
We assume the positiveness of the derived acceleration parameter
. This fact confirms that the second-order necessary and sufficient conditions have been fulfilled. In the case of
, we assign
and derive the next iterative point as
Knowing that
induces
, together with the fact that
confirms that the previous scheme is the gradient descent method. □
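The scalar (diagonal-matrix) Hessian approximation described above can be illustrated as follows. This sketch simply solves the second-order Taylor model f(x_new) ≈ f(x) + gᵀs + ½γ‖s‖² for γ, where s = x_new − x; it is our own simplified reading of the derivation leading to (24), not the paper's exact expression.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solve the second-order Taylor model for the scalar gamma that replaces
// the Hessian: gamma = 2 (f_new - f_old - g^T s) / ||s||^2.
double taylor_gamma(double f_new, double f_old,
                    const std::vector<double>& g,
                    const std::vector<double>& s) {
    double gTs = 0.0, ss = 0.0;
    for (std::size_t i = 0; i < g.size(); ++i) {
        gTs += g[i] * s[i];   // g^T s
        ss  += s[i] * s[i];   // ||s||^2
    }
    return 2.0 * (f_new - f_old - gTs) / ss;
}
```

For f(x) = x², moving from x = 1 to x = 0.5 recovers γ = 2, the exact second derivative, as one would hope for a quadratic.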
We end this section by presenting the algorithm of the shADD method, derived on the basis of the preceding analysis.
Taking the initial values , , , , the algorithm of the shADD method is given by the following steps:
Set , compute , , and take ;
If , then go to Step 9; else, continue to Step 3;
Apply the backtracking algorithm to calculate the iterative step length ;
Compute the first vector direction using Algorithm 1;
Compute the second vector direction using Algorithm 2;
Compute
using the iterative rule (
22);
Determine the acceleration parameter
using (
24);
If , then take ;
Set , go to Step 2;
Return and .
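A minimal one-dimensional loop mirroring the spirit of Steps 1–10 can be sketched as follows. The two vector directions of Algorithms 1 and 2 and the s-hybridization corrective parameter are deliberately omitted; only the gradient step, a backtracking step length, and the Taylor-based acceleration parameter γ (reset to 1 when non-positive, as in the steps above) are kept. All names and parameter values are our own assumptions, not the exact shADD rule (22).

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Simplified accelerated descent: x <- x - (alpha / gamma) * f'(x), with
// gamma updated from the second-order Taylor model each iteration.
double accelerated_descent(const std::function<double(double)>& f,
                           const std::function<double(double)>& df,
                           double x, double eps = 1e-8, int max_iter = 10000) {
    double gamma = 1.0;                          // initial acceleration parameter
    for (int k = 0; k < max_iter && std::fabs(df(x)) > eps; ++k) {
        const double g = df(x);
        // Backtracking on the step length alpha (sigma = 1e-4, beta = 0.5).
        double alpha = 1.0;
        while (f(x - alpha * g / gamma) > f(x) - 1e-4 * alpha * g * g / gamma)
            alpha *= 0.5;
        const double x_new = x - alpha * g / gamma;
        // Update gamma from the Taylor model; reset to 1 if non-positive.
        const double s = x_new - x;
        const double gamma_new = 2.0 * (f(x_new) - f(x) - g * s) / (s * s);
        gamma = (gamma_new > 0.0) ? gamma_new : 1.0;
        x = x_new;
    }
    return x;
}
```

Applied to f(x) = (x − 3)², starting from x = 0, the loop reaches the minimizer x = 3.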
3. Convergence Features of the shADD Method
We start this section with some relevant known statements that can be found in [
23,
24].
Proposition 3. If the function is twice continuously differentiable and uniformly convex on , then:
Lemma 2. Under the assumptions of Proposition 3, there exist real numbers m, M satisfying the following: such that has a unique minimizer and Depending on the degree of complexity that a particular non-linear problem may have, the examination of its convergence often resorts to establishing convergence on specific sets. Therefore, in this section, we present the convergence analysis of the derived
process on the set of strictly convex quadratic functions. The general expression of the strictly convex quadratics is given by (
30).
In (
30),
A is the real positive definite symmetric matrix, and
. Further on, we use the following notations regarding the relevant eigenvalues of matrix
A:
Previous research showed that, for strictly convex quadratics, an adequate relation, usually a connection between the smallest and the largest eigenvalues, must be fulfilled in order to establish the convergence of the optimization method [
2,
21,
25,
26]. In the next lemma, we define that connection by applying the
method.
Lemma 3. The relation between the smallest and largest eigenvalues of symmetric positive definite matrix that defines the strictly convex quadratic function (30) to which the method (22) is applied is given as follows:where β is the parameter defined in the backtracking procedure. Proof. To prove (
31), we start with the estimation of the difference of the function (
30) values in two successive points:
The expression above, which describes the difference between function’s values for two successive points, is determined based on the following facts:
Matrix
A is symmetric, so
The gradient of Function (
30) is
We now replace the derived difference in the acceleration parameter expression (
24):
The obtained expression
confirms that the acceleration parameter
can be written as the Rayleigh quotient of the real symmetric positive definite matrix evaluated at the vector
This fact results in the following conclusion:
According to findings revealed in [
21], the value of the iterative step length of the accelerated gradient method derived via the backtracking inexact algorithm satisfies the following:
where
L is the Lipschitz constant that figures in Proposition 3, so the following is valid:
Considering relation (
32), which defines the gradient of the strictly convex function, we have the following:
The previous relation confirms that the largest eigenvalue of the symmetric matrix
A fulfills the property of the Lipschitz constant
L in (
34). Additionally, according to the limitations of the backtracking parameters
and
, we derive the following estimations:
which confirms the right side of Estimation (
31). Based on (
33) and the fact that the iterative step size is less than 1, the first inequality of (
31) arises. □
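The Rayleigh-quotient bound used in the proof above can be checked numerically. The following illustrative 2×2 sketch (names are our own) evaluates vᵀAv / vᵀv, which for a symmetric positive definite A always lies between the smallest and largest eigenvalues of A.

```cpp
#include <cassert>
#include <cmath>

// Rayleigh quotient v^T A v / v^T v for a symmetric 2x2 matrix A.
double rayleigh_quotient(const double A[2][2], const double v[2]) {
    const double Av0 = A[0][0] * v[0] + A[0][1] * v[1];
    const double Av1 = A[1][0] * v[0] + A[1][1] * v[1];
    return (v[0] * Av0 + v[1] * Av1) / (v[0] * v[0] + v[1] * v[1]);
}
```

For A = diag(2, 5) and v = (1, 1), the quotient is 3.5, which indeed lies in [λ_min, λ_max] = [2, 5].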
On the basis of proven estimations (
31) relating to the acceleration parameter, backtracking parameter, and the lowest and the largest eigenvalues of the symmetric, positive definite matrix
A that figures in the expression (
30), using the following theorem, we establish the convergence of the
method on the set of strictly convex quadratics.
Theorem 1. For the strictly convex quadratic function (30), the process (22) is linearly convergent when . More precisely, the following relations are valid: for some real constants and , such that When the vectors represent the orthonormal set of eigenvectors of matrix A, the following inequations are fulfilled: For the gradient (32) of the function (30), the following is valid:
Proof. Taking the expression of Gradient (
32) of Function (
30) at the (
)-th iteration, we obtain the following:
Applying the orthonormal representations (
36) in the previous equation leads us to the following:
Knowing that
we will prove (
38) by showing that
For this purpose, let us first assume that
, i.e.,
Applying (
31), we obtain the following:
We rewrite the previous inequalities as follows:
We assume now the opposite case:
The last inequality gives
which directly implies
Estimations (
41) and (
42) prove (
38).
Finally, in order to prove (
39), we use the gradient representation from (
36),
which results in the following conclusion:
Applying the fact that
on inequalities (
37) directly proves (
39). □
Non-Convex Case Overview
In the previous section, we proved that the shADD method converges linearly on the set of strictly convex quadratics. Although it is not the main subject of this research, in this subsection, we analyze a possible application of the presented scheme when the objective function is non-convex. The importance of this introductory discussion arises from the vast array of contemporary non-convex problems, such as matrix completion, low-rank models, tensor decomposition, and deep neural networks.
Neural networks, considered universal function approximators, exhibit significant symmetry properties, which make the associated optimization problems non-convex. Some of the known techniques for solving machine learning and other non-convex problems are as follows:
Stochastic gradient descent methods,
Mini-batch approach,
Stochastic variance reduced gradient (SVRG) method,
Alternating minimization methods,
Branch and bound methods.
Confirming convergence properties in non-convex optimization is quite difficult. Unlike the convex case, there are no standard theoretical approaches to achieving this goal. Additionally, a non-convex objective function may have many local minima, saddle points, and flat regions.
Generally, when solving non-convex optimization tasks, the theoretical guarantees are very weak, and there is no tried-and-tested way of ending this process successfully.
Principal component analysis (PCA) is a technique for linear dimensionality reduction that is useful in proving global convergence of minimization methods applied to non-convex functions. We propose connecting this approach with the shADD method in further studies. The PCA process can be characterized by the following steps:
Standardizing the range of continuous initial variables;
Computing the covariance matrix to identify correlations;
Computing the eigenvectors and eigenvalues of the covariance matrix to identify the principal components;
Creating a feature vector to decide which principal components to keep;
Recasting the data along the principal components axes.
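Step 3 above, like the goal problem stated next, amounts to computing a dominant eigenpair. A classical route is power iteration, sketched below with a small dense matrix; the gradient-descent route analyzed afterwards is an alternative to this. The function names and loop bounds are our own assumptions; convergence presumes the start vector has a component along the dominant eigenvector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Power iteration for a symmetric positive semidefinite matrix A:
// repeatedly apply A and normalize; the norm of A v (for unit v)
// converges to the dominant eigenvalue.
double dominant_eigenvalue(const std::vector<std::vector<double>>& A,
                           int iters = 200) {
    std::vector<double> v(A.size(), 1.0);  // arbitrary nonzero start vector
    double lambda = 0.0;
    for (int k = 0; k < iters; ++k) {
        std::vector<double> w(v.size(), 0.0);  // w = A v
        for (std::size_t i = 0; i < A.size(); ++i)
            for (std::size_t j = 0; j < v.size(); ++j)
                w[i] += A[i][j] * v[j];
        double norm = 0.0;
        for (double wi : w) norm += wi * wi;
        norm = std::sqrt(norm);
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = w[i] / norm;
        lambda = norm;                         // eigenvalue estimate
    }
    return lambda;
}
```

For A = diag(2, 5), the iteration converges to the dominant eigenvalue 5.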
We set the goal problem as follows: determine the dominant eigenvector and eigenvalue of a symmetric positive semidefinite matrix
A. We can write this problem as follows:
The equivalent of problem (
44) is (
45):
where
is the Frobenius norm, i.e.,
. Taking the objective function, defined in terms of the Frobenius norm,
we see that the gradient of this function is given by the following expression:
The classical gradient descent update step for Function (
46) can be written as follows:
where the adaptive step size parameter fulfills the following relation:
Applying (
49) in (
48) leads to the following:
Applying the previous relation inductively, we conclude that
The previous relation (
51) confirms that the gradient descent iteration (
48) converges linearly, since
A similar analysis can be applied to the
shADD iteration (
22) for Function (
46). According to the construction of the vector
(Algorithm 2) in (
22), we can modify this vector direction as a linear combination of the gradient vector, as explained in the proof of Lemma 1. Considering this fact allows us to rewrite iteration (
22) in a simpler form for which Property (
52) can be easily proved.
4. Numerical Test Results
In this section, we analyze the numerical performance of the
shADD method depending on the choice of parameter
(
18), which is aptly named the corrective parameter. For the selected values of this parameter, we track standard numerical metrics, including the number of iterations performed, CPU time, and the number of function evaluations.
As proposed in ref. [
27], which presents an extensive comparative analysis of several Khan-hybrid models, for a range of
values (
18), we take a specific numerical value
for all
. This further reinforces the corrective expression of the
shADD iteration (
22):
We have observed that for specific values of parameter
, we have the following values of the expression
All of this motivated us to test the
shADD method for these five specified values of
. For this purpose, we selected five test functions from [
28]. We tested these functions for the five given values of the corrective parameter
and for ten different values of the number of variables
. For each test function, we summarized the obtained outcomes for all selected numbers of variables. The results obtained from the measured performance metrics (number of iterations, CPU time, and number of evaluations) are presented in
Table 1,
Table 2, and
Table 3, respectively.
We can observe that, for the first test function, the Extended Penalty function, the output values regarding the number of iterations and the number of evaluations do not depend on changes in the corrective parameter . For the same function, the changes in CPU time for different values of the corrective parameter are also minimal. However, for the other test functions, all measured metrics differ depending on the choice of the corrective parameter value. Regarding the number of iterations, the best results are achieved for and . In the case of CPU time, tests for and take the least time. In terms of the number of evaluations, the smallest values are achieved for and . A general conclusion, based on a total of 250 tests, is that it is advisable to use a corrective parameter value of or .
The tests were conducted using a standard termination criterion:
The code was written in the C++ programming language.