Abstract
In this work, we consider the problem of minimizing a quasi-convex function over a nonempty closed convex constraint set. To approximate a solution of the considered problem, we propose delayed star subgradient methods. The main feature of the proposed methods is that they allow us to reuse stale star subgradients when computing the next iterate, rather than computing a new star subgradient at every iteration. We subsequently investigate the convergence of the sequences generated by the proposed methods. Finally, we present some numerical experiments on the Cobb–Douglas production efficiency problem to illustrate the effectiveness of the proposed methods.
1. Introduction
In this work, we consider the quasi-convex optimization problem:
where the objective function is quasi-convex, continuous, and possibly nondifferentiable, and the constraint set X is nonempty, closed, and convex. Throughout this work, we denote the set of all optimal solutions and the optimal value of problem (1) by and , respectively. We also assume that .
The problem (1) not only provides a fundamental setting for optimization, since every convex function is quasi-convex, but also covers several practical situations in many fields such as economics and engineering. An important special case is the problem of minimizing the ratio of two functions [1] (Lemma 4), [2] (Theorem 2.3.8), for example, optimizing the ratio of outputs to inputs for productivity assessment in economics, such as debt-to-equity ratio analysis in corporate financial planning, inventory-to-sales analysis in production planning, and nurse–patient ratio analysis in hospital planning. This kind of problem is known as fractional programming [3,4,5]. Various kinds of iterative methods have been developed to solve the quasi-convex optimization problem (1); see [1,6,7,8,9,10,11,12,13,14] and the references therein.
If the function f is convex, it is well known that the classical method for the constrained convex optimization problem is the projected subgradient method: for an initial point , compute
where is the Fenchel subgradient of f at . Because of its simplicity, this method has been continuously developed in the literature; see, for instance, [8,15]. However, in the quasi-convex case, it should be noted that a Fenchel subgradient may not exist in general. In this situation, we need a more general subgradient concept, the so-called star subgradient proposed in [16,17]. By utilizing the notion of a star subgradient, Kiwiel [1] proposed the star subgradient method to solve the constrained quasi-convex optimization problem (1): for an initial point
where is a nonzero star subgradient of f at . Later, Hu et al. [6] proposed an inexact version of the star subgradient method by considering approximate star subgradients in the presence of noise. By imposing the Hölder condition, they presented convergence results in various aspects for both constant and diminishing step sizes. Subsequently, several methods have been investigated for more general problem settings, for instance, a conditional star subgradient method [8], a stochastic star subgradient method [7], and an incremental star subgradient method [9].
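To make the iteration concrete, the following is a minimal Python sketch of the (non-delayed) star subgradient method, assuming the standard update x_{k+1} = P_X(x_k − v_k g_k) with g_k a unit star subgradient of f at x_k; the callables `proj_X` and `star_subgrad` are placeholders for problem-specific routines and are not part of the original text.

```python
import numpy as np

def star_subgradient_method(x0, proj_X, star_subgrad, steps, num_iters):
    """Classical (non-delayed) star subgradient method in the spirit of Kiwiel [1].

    proj_X       : callable returning the metric projection onto X
    star_subgrad : callable returning a nonzero star subgradient of f at x
    steps        : sequence of step sizes v_0, v_1, ...
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = star_subgrad(x)
        g = g / np.linalg.norm(g)        # normalize: use the unit star subgradient
        x = proj_X(x - steps[k] * g)     # projected step along the negative direction
    return x
```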
As noted above, these methods require computing a star subgradient in every single iteration; however, in some practical situations, such as large-scale problems, the exact star subgradients may be computationally expensive. This shortcoming can be overcome by using the idea of delayed subgradients, in which we retrieve stale subgradients instead of spending time computing a new subgradient in every iteration. This approach is useful not only for reducing excessive computation but also for handling communication delays in networked systems. Note that various convex methods with delayed gradient or subgradient updates have been developed and studied, for instance, stochastic gradient descent [18,19], incremental-type methods [20,21,22,23], and distributed-type methods [24,25,26]; see also the references therein.
To the best of our knowledge, there is no report on a star subgradient method with delayed updates for solving problem (1). To fill this gap, in this work we propose two delayed star subgradient methods (DSSMs, for short) based on the ideas of the star subgradient method (2) and delayed subgradient updates, called DSSM-I and DSSM-II, respectively. We investigate the convergence properties of the proposed methods in terms of both objective values and iterates, including a finite-convergence result.
This paper is organized as follows: Section 2 provides the essential notations, definitions, and facts used in this work. Section 3 contains the proposed methods and their convergence results; it is divided into two subsections. In Section 4, we apply DSSM-I to the Cobb–Douglas production efficiency problem. Finally, we give a summary in Section 5.
2. Preliminaries
This section contains important symbols, definitions, and tools used in this work. Throughout this paper, let be the set of all real numbers, the set of all non-negative integers, and the set of all positive integers. We denote by a Euclidean space with inner product and induced norm . We denote the unit sphere by the set and the open ball centered at with radius by the set .
Let be a nonempty set. The symbol is the closure of X. The distance (function) between and X is the function defined by
The metric projection of onto X is the point such that for all . Note that the metric projection exists and is unique for all whenever X is nonempty, closed, and convex [27] (Theorem 1.2.3). The metric projection onto a nonempty closed convex set X is nonexpansive, i.e., for all [27] (Theorem 2.2.21).
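As a small illustration, the snippet below implements the metric projection onto a closed ball, chosen here only as an example of a nonempty closed convex set, and numerically checks the nonexpansiveness property; all names and parameters are illustrative.

```python
import numpy as np

def proj_ball(x, center, radius):
    """Metric projection onto the closed ball with the given center and radius."""
    d = x - center
    n = np.linalg.norm(d)
    return x.copy() if n <= radius else center + radius * d / n

# Numerical check of nonexpansiveness: ||P(x) - P(y)|| <= ||x - y||.
rng = np.random.default_rng(0)
x, y, c = rng.standard_normal(5), rng.standard_normal(5), np.zeros(5)
assert np.linalg.norm(proj_ball(x, c, 1.0) - proj_ball(y, c, 1.0)) <= np.linalg.norm(x - y) + 1e-12
```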
Let be a function, and let . The strict sublevel set and the sublevel set of f corresponding to are defined by
respectively. The function f is called upper semicontinuous on if is an open set for all . The function f is called lower semicontinuous on if is a closed set for all . The function f is continuous on if it is both upper and lower semicontinuous on . The function f is called quasi-convex on if is convex for all .
For any the star subdifferential [16,17] of f at is the set
An element is called a star subgradient of f at , and it is denoted by . It is obvious that for all . The following fact provides basic properties of the star subgradient; it can be found in [16] (Proposition 30) and [6] (Lemma 2.1). A computational illustration is given after Fact 1.
Fact 1.
Let be quasi-convex, and let . Then the following statements hold:
- (i)
- is a nonempty set;
- (ii)
- is a closed, convex cone.
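As a computational illustration of Fact 1, recall the standard first-order characterization of quasi-convexity: if f is differentiable and quasi-convex and its gradient at x is nonzero, then the gradient belongs to the star subdifferential of f at x (under the usual definition of the star subdifferential; see, e.g., [6]). The Python sketch below uses this observation for a linear fractional function; the function and its parameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def unit_star_subgrad_linear_fractional(x, a, b, c, d):
    """Unit star subgradient of the quasi-convex linear fractional function
    f(x) = (a.x + b) / (c.x + d) on the region where c.x + d > 0.

    For a differentiable quasi-convex f with grad f(x) != 0, the gradient lies in
    the star subdifferential at x, so its normalization is a unit star subgradient.
    """
    denom = c @ x + d
    grad = (denom * a - (a @ x + b) * c) / denom ** 2
    return grad / np.linalg.norm(grad)

# Example with illustrative parameters: f(x) = (x1 + 2*x2 + 1) / (x1 + x2 + 3).
g = unit_star_subgrad_linear_fractional(np.array([1.0, 0.5]),
                                        np.array([1.0, 2.0]), 1.0,
                                        np.array([1.0, 1.0]), 3.0)
```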
Next, we recall the definition and a property of quasi-Fejér monotone sequences. Let X be a nonempty subset of . A sequence is said to be quasi-Fejér monotone with respect to X if, for any , there exists a sequence such that
Fact 2
([28], Theorem 5.33). Let be a nonempty set and let be a sequence in . If is quasi-Fejér monotone with respect to X, then is bounded and exists for all .
Finally, we provide a lemma that will be used in the next section.
Lemma 1
([29], Lemma 2.1). Let be a scalar sequence, and let be a sequence of positive real numbers such that . If , then .
3. Algorithms and Convergence Results
In this section, we introduce two delayed star subgradient methods (DSSM-I and DSSM-II) for solving problem (1) and subsequently investigate their convergence properties. To deal with delayed-type methods, we need the following assumption on the boundedness of the time-varying delays.
Assumption 1.
The sequence of time-varying delays is bounded; that is, there exists a non-negative integer τ such that for all .
3.1. Delayed Star Subgradient Method I
We are now ready to propose the delayed star subgradient method of the first kind, which is stated as Algorithm 1 below.
| Algorithm 1 Delayed Star Subgradient Method I (DSSM-I) |
| Initialization: Given a stepsize the delays , and initial points . Iterative Step: For a current point , we compute Update . |
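The following Python sketch shows one way DSSM-I could be implemented. Since the displayed formulas above are not fully visible here, the sketch assumes the update x_{k+1} = P_X(x_k − α_k g_{k−d_k}), where g_j denotes the unit star subgradient of f at the iterate x_j; it is a sketch under this assumption, not a verbatim transcription of Algorithm 1, and the callables are placeholders.

```python
import numpy as np

def dssm_i(x0, proj_X, star_subgrad, steps, delays, num_iters):
    """Sketch of DSSM-I under the assumed update
        x_{k+1} = P_X( x_k - alpha_k * g_{k - d_k} ),
    where g_j is the unit star subgradient of f at the iterate x_j.
    Star subgradients are cached, so only the indices actually referenced
    by the delay sequence are ever computed (stale values are reused).
    """
    x_hist = [np.asarray(x0, dtype=float)]
    g_cache = {}                                   # j -> unit star subgradient at x_j
    for k in range(num_iters):
        j = max(k - delays[k], 0)                  # index of the (possibly stale) subgradient
        if j not in g_cache:
            g = star_subgrad(x_hist[j])
            g_cache[j] = g / np.linalg.norm(g)
        x_hist.append(proj_X(x_hist[k] - steps[k] * g_cache[j]))
    return x_hist
```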
Throughout this work, to simplify the analysis, we denote
Remark 1.
- (i)
- Since the function f is quasi-convex, DSSM-I is well defined. Indeed, by Fact 1, for any , . Hence, we can select any point and set . Moreover, since , it is clear that for all .
- (ii)
- If for all , DSSM-I coincides with the star subgradient method (SSM), which was proposed by Kiwiel [1].
Remark 2.
Note that methods with time-varying delays allow us to use stale information, which is very helpful when star subgradients are not easily computed. Assumption 1 is typically imposed when analyzing the convergence of delayed-type methods; see [18,19,20,24,25,26] and the references therein. The delay bound τ ensures that the unit star subgradient of f at needs to be updated at least once every τ iterations. Some examples of delay sequences are as follows (see also the code sketch after this list):
- (i)
- Constant delay [18], that is, for all .
- (ii)
- Cyclic delay [19,20,24,26] and the references therein. The typical form is for all . In this case, the delays () are chosen in a deterministic order from the set . This means that we use stale information over a consistent period of length and update the star subgradient at every iteration.
- (iii)
- Random delay [25], that is, the delays () are randomly chosen in the set .
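For concreteness, here is one way the three delay strategies above could be generated in Python. The parameterizations d_k = τ (constant), d_k = k mod (τ+1) (cyclic), and d_k drawn uniformly from {0, …, τ} (random) are common choices and assumptions on our part; the exact forms used in the cited works may differ slightly.

```python
import numpy as np

def constant_delays(tau, num_iters):
    """Constant delay: d_k = tau for every k."""
    return [tau] * num_iters

def cyclic_delays(tau, num_iters):
    """Cyclic delay: d_k cycles deterministically through 0, 1, ..., tau."""
    return [k % (tau + 1) for k in range(num_iters)]

def random_delays(tau, num_iters, seed=0):
    """Random delay: d_k drawn uniformly from {0, 1, ..., tau}."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, tau + 1, size=num_iters).tolist()
```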
Assumption 2.
The function f satisfies the Hölder condition with order and modulus on X, i.e., for all .
The Hölder condition is typically assumed when investigating the convergence of methods for solving quasi-convex optimization problems [6,7,9,10,11,12]. Note that, in the special case , the Hölder condition is nothing other than the Lipschitz condition.
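As a small numerical illustration of Assumption 2 (assuming the two-sided form |f(x) − f(y)| ≤ L‖x − y‖^β, which is one common statement of the Hölder condition), the quasi-convex function f(x) = ‖x‖^{1/2} satisfies the condition with order β = 1/2 and modulus L = 1; the snippet below checks this on random points.

```python
import numpy as np

# f(x) = ||x||^(1/2) is quasi-convex and Hoelder continuous with order 1/2 and modulus 1,
# since |sqrt(s) - sqrt(t)| <= sqrt(|s - t|) and | ||x|| - ||y|| | <= ||x - y||.
f = lambda x: np.sqrt(np.linalg.norm(x))

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert abs(f(x) - f(y)) <= np.linalg.norm(x - y) ** 0.5 + 1e-12
```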
The following lemma provides a useful property of the star subgradient and will play an important role in the convergence analysis. The formal proof is due to Konnov [10] (Proposition 2.1).
Lemma 2.
Suppose that Assumption 2 holds. Let be a sequence generated by DSSM-I. For any and , if then
Next, we provide some lemmas, which are important tools for proving the convergence results of DSSM-I. We first give a basic inequality related to the sufficient decrease property of the generated sequence.
Lemma 3.
Suppose that Assumption 2 holds. Let be a sequence generated by DSSM-I. For any and if then
Proof.
Let and be fixed. Suppose that We note from the definition in DSSM-I that
where the first inequality follows from the nonexpansiveness of the metric projection, the second from the Cauchy–Schwarz inequality and the fact that for all , and the last from Lemma 2. This completes the proof. □
Lemma 4.
Suppose that Assumption 1 holds. Let be a sequence generated by DSSM-I, and let be a nonincreasing sequence with and . Then for all and
Proof.
Now, we are ready to prove the first convergence result, namely that the inferior limit of the function values of the generated sequence equals the optimal value .
Theorem 1.
Suppose that Assumptions 1 and 2 hold. Let be a sequence generated by DSSM-I and be a nonincreasing sequence with and . Then, we have .
Proof.
It is obvious that , since for all , so it suffices to prove that . Now, assume to the contrary that . Then there exist and such that for all . This yields that
On the other hand, since and (from Lemma 4), there exists such that
Let and let be such that . Then, we obtain from Lemma 3 together with the relations (6) and (7) that for any
which yields
Since , we have and it follows that there exists such that
Thus, for any we obtain from (8) and (9) that
which is a contradiction. Therefore, we can conclude that and hence , as desired. □
Corollary 1.
Suppose that Assumptions 1 and 2 hold. Let be a sequence generated by DSSM-I, and let be a nonincreasing sequence with and . If the sequence is bounded, then there exists a subsequence of that converges to an optimal solution in .
Proof.
We first note from the fact that in Theorem 1, together with the boundedness of the sequence and the continuity of f, that there exists a subsequence of such that
Since the sequence is also bounded, there exists a subsequence of such that Again, the continuity of f and using (10) yield
Note that since , the closedness of X yields which therefore implies that is an element in . □
By imposing either the coercivity of the objective function f or the compactness of the constraint set X, we can obtain stronger convergence results than the one obtained in Theorem 1.
Theorem 2.
Suppose that Assumptions 1 and 2 hold. Let be a sequence generated by DSSM-I, and let be a nonincreasing sequence with and . If the function f is coercive or the constraint set X is compact, then the following statements hold:
- (i)
- For any there exists such that where ;
- (ii)
- , and
Proof.
Let be arbitrary and introduce the notation
Since the function f is coercive or the constraint set X is compact, we ensure that , which together with yields that . Moreover, the continuity and quasi-convexity of f imply, respectively, that is closed and convex, so the set is also closed and convex. Again, by either the compactness of X or the coercivity of f, the intersection is bounded; hence is a compact set, which ensures that
Let be given. Since we also have Consequently, there exists such that and
This, together with the relation , which is obtained in Lemma 4, yields that
Now, we will show that
Let be fixed. We divide the proof into two cases according to the behavior of .
Case 1: Suppose that Then we have Applying Lemma 3 and the nonincreasing property of the stepsize , we get
Substituting the inequalities (11) and (12) in (14) gives
By putting we obtain
Case 2: Suppose that . Then we have and . We note from the definition of , the nonexpansiveness of , and the nonincreasing property of that
which together with the inequality (11) and (12) yield
By putting and invoking we obtain
Thus, we have proved relation (13). By the same argument as in Theorem 1, we obtain that . Consequently, there exists such that and . Therefore, we obtain from Case 2 that
Put . Suppose that for all Invoking (13), we obtain . Hence, we conclude by the strong induction that
For each we denote Observe that is nonincreasing and Subsequently, is also nonincreasing and . Hence, we obtain that .
Now, let be arbitrary. Since there exists such that
Since , there exists such that
Hence, we conclude that . Furthermore, by Assumption 2, we therefore obtain that □
Remark 3.
Under the setting of Theorem 2, we observe that if the optimal solution is unique, that is, , then we obtain , which means that the generated sequence converges to the unique solution .
3.2. Delayed Star Subgradient Method II
In order to obtain convergence of the whole generated sequence, we need a modified version of DSSM-I, which is stated as Algorithm 2 below.
| Algorithm 2 Delayed Star Subgradient Method II (DSSM-II) |
| Initialization: Given a stepsize the delays , and initial points . Iterative Step: For a current point , if we set Update: . |
Remark 4.
- (i)
- If the delays for all , then DSSM-II is related to the method proposed by Hu et al. [9] (Algorithm 1) with the special setting of .
- (ii)
- Note that DSSM-II involves evaluating the function value at the current iterate and deciding whether to update the next iterate . If the function value equals the optimal value , then no additional calculations are performed, and DSSM-II terminates, so an optimal solution is obtained at the point .
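The following Python sketch reflects only what Remark 4 states about Algorithm 2: a delayed step as in the DSSM-I sketch, combined with a termination test against the known optimal value f*. It is our reading, not a verbatim transcription; any detail of Algorithm 2 not visible in the text (for instance, how f* may enter the step itself) is not reproduced here.

```python
import numpy as np

def dssm_ii(x0, proj_X, star_subgrad, f, f_star, steps, delays, num_iters):
    """Rough sketch of DSSM-II as read from Remark 4 only: it behaves like the
    DSSM-I sketch but checks f(x_k) against the known optimal value f* and stops
    once the optimum is attained.  Details of Algorithm 2 that are not visible
    in the text are NOT reproduced here.
    """
    x_hist = [np.asarray(x0, dtype=float)]
    g_cache = {}
    for k in range(num_iters):
        xk = x_hist[k]
        if f(xk) <= f_star:                 # optimum attained: terminate, no extra work
            return xk
        j = max(k - delays[k], 0)
        if j not in g_cache:
            g = star_subgrad(x_hist[j])
            g_cache[j] = g / np.linalg.norm(g)
        x_hist.append(proj_X(xk - steps[k] * g_cache[j]))
    return x_hist[-1]
```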
Next, we will derive some fundamental properties for the convergence of DSSM-II.
Lemma 5.
Suppose that Assumption 2 holds. Let be a sequence generated by DSSM-II. For any and , if then
Proof.
Let and . Suppose that . We have from the definition of DSSM-II that . Moreover, since , we also have . Thus, by following the lines of the proof of Lemma 3, we obtain
as desired. □
Theorem 3.
Suppose that Assumptions 1 and 2 hold. Let be a sequence generated by DSSM-II, and let be a nonincreasing sequence with and . Then, we have for all , , and
Proof.
By invoking Lemma 5, we can argue as in Lemma 4 and Theorem 1 to obtain . □
Theorem 4.
Suppose that Assumptions 1 and 2 hold. Let be a sequence generated by DSSM-II, and let be a nonincreasing sequence with and Then the sequence converges to an optimal solution in and, moreover,
Proof.
If there exists such that , then is an optimal solution and for all . Consequently, we obtain and , as required.
Otherwise, suppose that for all . Let and let . We note from Lemma 5 that
Recall from inequalities (3) and (4) that and for all . Thus, for a fixed , we have
which together with the nonincreasing property of implies
By passing to the limit in (15) and using the assumption that , we obtain that , and so
Therefore, the sequence is quasi-Fejér monotone. By applying Fact 2, we obtain that the sequence is bounded and exists.
Since the sequence is bounded and (Theorem 3), there exists a subsequence of such that
Since is also bounded, there exists a subsequence of such that . Thus, the continuity of f yields
Moreover, the closedness of X implies , and hence we obtain that . Since exists and , we conclude that . Moreover, invoking the continuity of f, we also have that . The proof is complete. □
We end the convergence analysis of DSSM-II by establishing the finite convergence of the generated sequence in the following theorem.
Theorem 5.
Suppose that Assumptions 1 and 2 hold and for some Let be a sequence generated by DSSM-II. If one of the following statements holds:
- (i)
- for all
- (ii)
- is nonincreasing with and ,
then there exists such that .
Proof.
We suppose by contradiction that for all . This implies for all . Note that , which implies that Since , we have Thus, by the definition of the star subgradient of f at , we obtain that
which yields
On the other hand, since we have from the definition of DSSM-II that , and the nonexpansive property of implies that
which, combined with (16), leads to
Let N be a fixed non-negative integer. By summing up (17) from to , we have
which implies
Based on the conditions (i) and (ii), we will divide our consideration into two cases.
Case 1: Assume that for all
Case 2: Assume that is a nonincreasing sequence with and . We obtain from Theorem 3 that . Thus, from the facts that and , together with Lemma 1, we finally obtain that
which also contradicts inequality (18). □
4. Numerical Experiments
In this section, we present experimental results obtained by applying DSSM-I to the Cobb–Douglas production efficiency problem. All experiments were performed in MATLAB (R2023b) on a 13.3-inch MacBook Air with an Apple M1 processor and 8 GB of memory.
The Cobb–Douglas production efficiency problem [3,6,12] aims to maximize the ratio of the total profit, represented by the Cobb–Douglas production function, to the total cost, represented by a linear function of the product factors, subject to funding-level constraints. More precisely, let and let for all and ; then the Cobb–Douglas production efficiency problem is defined by
where . Here, is the j-th product factor , and for each , is the profit to be obtained from project i, and is the support of product factor j to project i to achieve . According to the literature [6,12], the Cobb–Douglas production efficiency problem (19) is a quasi-convex optimization problem. Note that the function f is continuous and quasi-concave [4] (Theorem 2.5.1) and satisfies the Hölder condition on [12] (Appendix B), and the constraint set is nonempty, closed, and convex, where is a vector in . Therefore, we can apply DSSM-I to solve problem (19). However, since the exact optimal value of problem (19) is not known, we are unable to apply DSSM-II to it. To ensure the boundedness of the constraint set, we replace it by the set , which is a subset of C. Thus, the convergence result established in Theorem 2 guarantees that . It is worth noting that computing the metric projection onto the constraint set D is not an easy task, since no explicit form of is available in general. To deal with this situation, we utilized the classical Halpern iteration as the following inner loop: for an arbitrary initial point and a sequence such that , we compute
where for all . Then the sequence converges to the unique point ; see [28] (Theorem 30.1) for further details. In our experiments, the inner initial point is the vector whose coordinates are all 1, and we set for all . We use the stopping criterion for the inner loop.
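Because the projection onto D has no closed form, the experiments approximate it with a Halpern-type inner loop. The sketch below shows one such implementation under the assumption that D is the intersection of two sets whose individual projections `proj_C` and `proj_B` are easy, that the nonexpansive operator is their composition, and that λ_k = 1/(k+2); the operator and parameters used in the paper's actual experiments are not reproduced exactly here.

```python
import numpy as np

def halpern_projection(x, proj_C, proj_B, tol=1e-10, max_iter=100000):
    """Approximate P_D(x) for D = C ∩ B via a Halpern-type inner loop
        u_{k+1} = lam_k * x + (1 - lam_k) * T(u_k),   with T = proj_B ∘ proj_C
    and the illustrative choice lam_k = 1 / (k + 2).  When C ∩ B is nonempty,
    Fix T = C ∩ B, so the iterates converge to the point of D nearest to x.
    """
    x = np.asarray(x, dtype=float)
    u = x.copy()
    for k in range(max_iter):
        lam = 1.0 / (k + 2)
        u_next = lam * x + (1.0 - lam) * proj_B(proj_C(u))
        if np.linalg.norm(u_next - u) <= tol:   # stopping criterion for the inner loop
            return u_next
        u = u_next
    return u
```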
To perform the numerical experiments, we randomly generated the parameters . We chose the stepsize and the time-varying delays as and for all , respectively. The initial point is the vector whose coordinates are all 1. To examine the influence of the delay bound , we ran DSSM-I with delay bounds and 10 for various problem sizes n and m. Note that DSSM-I with is nothing other than the star subgradient method (SSM) proposed by Kiwiel [1]. Figure 1 illustrates the number of subgradient calculations performed by SSM and DSSM-I with the delay strategies for all .
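To reproduce a comparison like Figure 1, one can wrap the star subgradient oracle with a counter and run the DSSM-I sketch from Section 3 under cyclic delays of different bounds. The helper below is hypothetical: it reuses the functions `dssm_i` and `cyclic_delays` from the earlier sketches, and the problem data (`x0`, `proj_X`, `star_subgrad`, `steps`) are assumed to be supplied by the user.

```python
def count_subgradient_calls(tau, num_iters, x0, proj_X, star_subgrad, steps):
    """Count how many star subgradients DSSM-I actually computes under a cyclic
    delay of bound tau (tau = 0 recovers SSM: one computation per iteration).
    Reuses dssm_i and cyclic_delays from the sketches in Section 3.
    """
    calls = [0]

    def counting_subgrad(x):
        calls[0] += 1
        return star_subgrad(x)

    dssm_i(x0, proj_X, counting_subgrad, steps, cyclic_delays(tau, num_iters), num_iters)
    return calls[0]
```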
Figure 1.
Behavior of the number of subgradient calculations for various delay bounds versus the number of iterations.
Figure 2.
Behavior of the objective function values for various delay bounds and problem sizes n and m.
Figure 3.
Behavior of relative errors for various delay bounds and problem sizes n and m.
It can be observed from Figure 2 that, overall, all the results show a similar pattern as they approach the (approximated) optimal value, as supported by Theorem 2. In each subfigure, we notice that the result for seems stable and decreases faster than the others.
Next, in Figure 3, we plot the relative errors of the objective function values in the same setting as above.
It can be seen from Figure 3 that all results decrease and eventually stabilize. More precisely, the graphs exhibit fluctuations, alternating between decreases and increases over short periods; these fluctuations gradually diminish until the curves stabilize. Again, this behavior agrees with the convergence result for DSSM-I, and we notice that the results with seem to decrease to 0 more rapidly than those for other values of .
5. Conclusions
In this work, we proposed the so-called projected star subgradient methods with delayed subgradient updates, namely DSSM-I and DSSM-II, for solving the constrained quasi-convex optimization problem. For DSSM-I, we proved that there exists a subsequence of the generated sequence that converges to an optimal solution, provided that the generated sequence is bounded. Furthermore, under a compactness or coercivity assumption and with diminishing step sizes, we proved that the distance between the sequence and the optimal solution set converges to zero, which implies the convergence of the objective function values to the optimal value. For the modified version, DSSM-II, we proved, without requiring the compactness or coercivity assumptions, that the generated sequence converges to an optimal solution. We also established finite convergence provided that the interior of the optimal solution set is nonempty. Finally, we applied DSSM-I to solve the Cobb–Douglas production efficiency problem. It is worth noting that the proposed methods can reduce the computational cost of star subgradients by allowing the method to retrieve stale information instead. Since we could not give a precise upper bound on the number of times a star subgradient can be reused, this remains an open problem. Moreover, our approach requires the metric projection onto the constraint set X, which may have no closed-form expression; we overcame this obstacle by solving a sub-problem to approximate the projection at each iteration. This motivates a future direction involving methods that do not require solving such a sub-problem.
Author Contributions
Conceptualization, O.P. and N.N.; methodology, O.P. and N.N.; software, O.P.; validation, O.P. and N.N.; formal analysis, O.P. and N.N.; investigation, O.P. and N.N.; writing—original draft preparation, O.P.; writing—review and editing, O.P. and N.N.; visualization, O.P.; supervision, N.N.; project administration, N.N.; funding acquisition, N.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the Fundamental Fund of Khon Kaen University. This research has received funding support from the National Science, Research, and Innovation Fund or NSRF. O. Pankoon was supported by the Development and Promotion of Science and Technology Talents Project (DPST).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Dataset available on request from the authors.
Acknowledgments
The authors are thankful to the editor and two anonymous referees for comments and remarks which improved the quality and presentation of this paper.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Kiwiel, K.C. Convergence and efficiency of subgradient methods for quasiconvex minimization. Math. Program. 2001, 90, 1–25.
- Cambini, A.; Martein, L. Generalized Convexity and Optimization; Springer: Berlin/Heidelberg, Germany, 2008.
- Bradley, S.P.; Frey, S.C. Fractional programming with homogeneous functions. Oper. Res. 1974, 22, 350–357.
- Stancu-Minasian, I.M. Fractional Programming; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997.
- Schaible, S.; Shi, J. Fractional programming: The sum-of-ratios case. Optim. Methods Softw. 2003, 18, 219–229.
- Hu, Y.; Yang, X.; Sim, C.-K. Inexact subgradient methods for quasi-convex optimization problems. Eur. J. Oper. Res. 2015, 240, 315–327.
- Hu, Y.; Yu, C.K.W.; Li, C. Stochastic subgradient method for quasi-convex optimization problems. J. Nonlinear Convex Anal. 2016, 17, 711–724.
- Hu, Y.; Yu, C.K.W.; Li, C.; Yang, X. Conditional subgradient methods for constrained quasi-convex optimization problems. J. Nonlinear Convex Anal. 2016, 17, 2143–2158.
- Hu, Y.; Yu, C.K.W.; Yang, X. Incremental quasi-subgradient methods for minimizing the sum of quasi-convex functions. J. Glob. Optim. 2019, 75, 1003–1028.
- Konnov, I.V. On convergence properties of a subgradient method. Optim. Methods Softw. 2003, 18, 53–62.
- Konnov, I.V. On properties of supporting and quasi-supporting vectors. J. Math. Sci. 1994, 71, 2760–2763.
- Hishinuma, K.; Iiduka, H. Fixed point quasiconvex subgradient method. Eur. J. Oper. Res. 2020, 282, 428–437.
- Choque, J.; Lara, F.; Marcavillaca, R.T. A subgradient projection method for quasiconvex minimization. Positivity 2024, 28, 64.
- Zhao, X.; Köbis, M.A.; Yao, Y. A projected subgradient method for nondifferentiable quasiconvex multiobjective optimization problems. J. Optim. Theory Appl. 2021, 190, 82–107.
- Ermol'ev, Y.M. Methods of solution of nonlinear extremal problems. Cybernetics 1966, 2, 1–14.
- Penot, J.-P. Are generalized derivatives useful for generalized convex functions? In Generalized Convexity, Generalized Monotonicity: Recent Results; Crouzeix, J.-P., Martinez-Legaz, J.-E., Volle, M., Eds.; Springer: Boston, MA, USA, 1998; pp. 3–59.
- Penot, J.-P.; Zălinescu, C. Elements of quasiconvex subdifferential calculus. J. Convex Anal. 2000, 7, 243–269.
- Arjevani, Y.; Shamir, O.; Srebro, N. A tight convergence analysis for stochastic gradient descent with delayed updates. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, San Diego, CA, USA, 8–11 February 2020; Kontorovich, A., Neu, G., Eds.; Proceedings of Machine Learning Research; PMLR: Cambridge, MA, USA, 2020; Volume 117, pp. 111–132.
- Stich, S.U.; Karimireddy, S.P. The error-feedback framework: Better rates for SGD with delayed gradients and compressed updates. J. Mach. Learn. Res. 2020, 21, 9613–9648.
- Gürbüzbalaban, M.; Ozdaglar, A.; Parrilo, P.A. On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 2017, 27, 1035–1048.
- Tseng, P.; Yun, S. Incrementally updated gradient methods for constrained and regularized optimization. J. Optim. Theory Appl. 2014, 160, 832–853.
- Vanli, N.D.; Gürbüzbalaban, M.; Ozdaglar, A. Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 2018, 28, 1282–1300.
- Butnariu, D.; Censor, Y.; Reich, S. Distributed asynchronous incremental subgradient methods. In Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications; Brezinski, C., Wuytack, L., Reich, S., Eds.; Elsevier Science B.V.: Amsterdam, The Netherlands, 2001; Volume 8, p. 381.
- Namsak, S.; Petrot, N.; Nimana, N. A distributed proximal gradient method with time-varying delays for solving additive convex optimizations. Results Appl. Math. 2023, 18, 100370.
- Deng, X.; Shen, L.; Li, S.; Sun, T.; Li, D.; Tao, D. Towards understanding the generalizability of delayed stochastic gradient descent. arXiv 2023, arXiv:2308.09430.
- Arunrat, T.; Namsak, S.; Nimana, N. An asynchronous subgradient-proximal method for solving additive convex optimization problems. J. Appl. Math. Comput. 2023, 69, 3911–3936.
- Cegielski, A. Iterative Methods for Fixed Point Problems in Hilbert Spaces; Springer: Berlin/Heidelberg, Germany, 2012.
- Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.; Springer: Cham, Switzerland, 2017.
- Kiwiel, K.C. Convergence of approximate and incremental subgradient methods for convex optimization. SIAM J. Optim. 2004, 14, 807–840.