Axioms
  • Article
  • Open Access

11 December 2025

On Quasi-Monotone Stochastic Variational Inequalities with Applications

1 Department of Mathematics, Faculty of Science, University of Tabuk, Tabuk 71491, Saudi Arabia
2 Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), P.O. Box 65892, Riyadh 11566, Saudi Arabia
3 Analysis, Control System and Optimization Research Group (ACoSORG), Department of Mathematics, Faculty of Science, Chukwuemeka Odumegwu Ojukwu University, Uli 431124, Anambra State, Nigeria
4 Department of Mathematics/Statistics, Faculty of Science, University of Port Harcourt, Choba 500241, Rivers State, Nigeria
This article belongs to the Special Issue Mathematical Optimization, Variational Inequalities and Equilibrium Problems: Theory and Applications

Abstract

This paper studies an efficient method for solving stochastic optimization problems formulated as stochastic variational inequalities with a quasi-monotone operator, a class of cost functions that extends the classical monotone and pseudomonotone operators. Our proposed method uses an adaptive stepsize that adjusts automatically without a linesearch and includes a momentum term to accelerate convergence. Each iteration requires only a single projection onto the feasible set, ensuring low computational complexity. Under standard assumptions, the algorithm achieves almost sure convergence with a proven convergence rate. Furthermore, numerical experiments demonstrate its superior performance, accuracy, stability, and efficiency compared with existing stochastic approximation schemes. We also apply the method to problems such as stochastic network bandwidth allocation, stochastic complementarity problems, and the networked stochastic Nash–Cournot game, showing its strength and practical usefulness. The obtained results extend existing works in the literature.

1. Introduction

We consider a stochastic optimization problem of the form
$\min_{x \in X} F(x) = \mathbb{E}[f(x, \xi(\varpi))]$, (1)
where $x$ is the decision variable, $X$ is a feasible subset of $\mathbb{R}^n$, $\xi : \Omega \to \Xi$ is a random variable representing uncertainty, $f(x, \xi)$ is a random cost function, and $\mathbb{E}[f(x, \xi(\varpi))]$ denotes its expectation. Problem (1) is known as a stochastic optimization (SO) problem.
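For intuition, the expectation in (1) is typically approximated empirically when it cannot be computed in closed form. The following minimal Python sketch (the quadratic cost `f` and the Gaussian distribution of $\xi$ are illustrative assumptions, not taken from the paper) estimates $F(x) = \mathbb{E}[f(x, \xi)]$ by Monte Carlo averaging:

```python
import numpy as np

def f(x, xi):
    # Illustrative random cost: a quadratic perturbed linearly by the sample xi.
    return 0.5 * np.dot(x, x) + np.dot(xi, x)

def monte_carlo_objective(x, draw, n=10_000, seed=0):
    # Estimate F(x) = E[f(x, xi)] by averaging f over n i.i.d. draws of xi.
    rng = np.random.default_rng(seed)
    return float(np.mean([f(x, xi) for xi in draw(rng, n)]))

# For xi ~ N(0, I), the linear term has zero mean, so F(x) = 0.5 * ||x||^2.
x = np.array([1.0, -2.0])
est = monte_carlo_objective(x, lambda rng, n: rng.normal(size=(n, 2)))
# est approaches 0.5 * ||x||^2 = 2.5 as n grows
```

By the law of large numbers, the estimate concentrates around the true expectation at the usual $O(1/\sqrt{n})$ rate.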
Stochastic optimization provides a powerful framework for modeling and solving problems where randomness influences objectives or constraints. Such uncertainty is noticeable in real-world applications, including machine learning, where data are inherently noisy; finance, where markets fluctuate; engineering design, where measurements are imprecise; and energy systems, where renewable sources vary unpredictably (see, [1,2]). Unlike deterministic optimization, SO explicitly incorporates statistical variability into the modeling process, yielding solutions that are not only optimal in expectation but also robust under uncertainty. Its effectiveness lies in balancing exploration of the feasible region with exploitation of informative samples, which mitigates the risk of convergence to poor local minima in nonconvex or high-dimensional settings. Consequently, SO underpins many modern algorithms, such as stochastic gradient descent for deep learning, Monte Carlo-based optimization in engineering, and stochastic portfolio selection in finance.
In general, four classical approaches are used to solve problem (1): the stochastic gradient descent (SGD) method (see, e.g., [3]), the sample average approximation (SAA) [4], the stochastic approximation (SA) method [5], and evolutionary or Monte Carlo-based algorithms [6]. These methods have been extensively analyzed and successfully applied in diverse scientific and engineering domains. In this work, we focus on the SA framework, which is particularly effective when the expected value E [ f ( x , ξ ) ] is difficult or impossible to compute exactly. Our study connects SA methods to the theory of variational inequalities, providing a unified framework for optimization and equilibrium modeling under uncertainty.
The variational inequality problem (VIP), first introduced in [7], is a fundamental model that generalizes many problems in optimization, game theory, and economics. For a nonempty, closed, and convex set X R n , the variational inequality problem (VIP) is expressed as follows:
Find $x^* \in X$ such that $\langle F(x^*), y - x^* \rangle \ge 0$, $\forall y \in X$, (2)
where $F$ is a nonlinear operator. Optimization problems correspond to the VIP with $F = \nabla g$, where $g$ is the objective function; Nash equilibrium conditions arise when $F$ represents the vector of marginal payoffs; and traffic or network equilibrium problems appear when $F$ models congestion effects [8]. Over the years, extensive studies (see, e.g., [9,10,11,12]) have focused on efficient numerical schemes and theoretical properties of the deterministic VIP. However, in many practical situations, the operator $F$ cannot be computed exactly due to noise or randomness. Examples include learning systems, where gradients are estimated from samples, financial models with stochastic returns, and network systems with uncertain demand. To address such cases, the notion of the stochastic variational inequality problem (SVIP) was developed in [5].
For clarity, assume that $X$ is a nonempty, closed, and convex subset of $\mathbb{R}^n$, and let $(\Omega, \mathcal{F}, P)$ denote a probability space. Consider a function $f : \mathbb{R}^n \times \Xi \to \mathbb{R}^n$ that is measurable with respect to the random variable $\xi : \Omega \to \Xi$. The SVIP is formulated as
find $x^* \in X$ such that $\langle F(x^*), y - x^* \rangle \ge 0$, $\forall y \in X$, where $F(x) := \mathbb{E}[f(x, \xi(\varpi))] = \int_\Omega f(x, \xi(\varpi)) \, dP(\varpi)$, (3)
and E denotes the expectation over ( Ω , F , P ) . The SVIP generalizes the deterministic VIP by integrating stochastic effects into the equilibrium conditions and provides a rigorous model for decision-making under uncertainty. It encompasses diverse problems in stochastic optimization, energy markets, game theory, and transportation systems (see, e.g., [13,14]).
Related Works: Problem (3) has inspired substantial research activity. When F ( x ) admits a closed-form expression, it reduces to the deterministic VIP (2), for which numerous efficient algorithms exist (see [11]). When F ( x ) is not explicitly computable, two primary strategies are adopted: the sample average approximation (SAA) and the stochastic approximation (SA) approaches. In SAA, the expectation in (3) is replaced with an empirical average based on N independent samples:
$F_N(x) = \frac{1}{N} \sum_{i=1}^{N} f(x, \xi_i)$, (4)
where $\{\xi_i\}$ are independent and identically distributed (i.i.d.) realizations of $\xi$; the law of large numbers then ensures that $F_N(x) \to F(x)$ almost surely as $N \to \infty$. Many recent studies analyze convergence properties of SAA and its applications to stochastic generalized equations [15], gap-function reformulations [16], and unconstrained settings [4,17,18,19]. In contrast, the SA framework solves (3) by updating iterates using sample-based gradients in an online fashion. The classical Robbins–Monro procedure [5] forms the basis for this approach. A seminal contribution by Jiang and Xu [14] proposed the single-projection SA algorithm:
$x_{k+1} = \Pi_X(x_k - \lambda_k f(x_k, \xi_k)), \quad k \in \mathbb{N}_0, \; x_0 \in X,$
where $\Pi_X$ denotes the Euclidean projection onto $X$, and the stepsize sequence $\{\lambda_k\}$ satisfies $\sum_{k=0}^{\infty} \lambda_k = \infty$ while $\sum_{k=0}^{\infty} \lambda_k^2 < \infty$. Under strong monotonicity and Lipschitz continuity, they proved almost sure (a.s.) convergence to the unique solution of (3). Subsequent improvements have relaxed these assumptions or enhanced convergence properties. Yousefian et al. [20] introduced adaptive step-sizes for Cartesian SVIPs; Koshal et al. [21] proposed parallel and proximal-based methods; and [22] derived asymptotic feasibility and solution convergence rates of $O(1/k)$ and $O(1/\sqrt{k})$, respectively, under the monotonicity assumption. To further weaken assumptions, Yang et al. [23] developed algorithms for pseudomonotone and Lipschitz continuous operators, achieving sublinear convergence and optimal oracle complexity.
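The single-projection SA scheme above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the box feasible set and the strongly monotone operator $F(x) = x - b$ (observed through zero-mean noise) are illustrative assumptions, and $\lambda_k = 1/(k+1)$ satisfies the Robbins–Monro conditions $\sum \lambda_k = \infty$, $\sum \lambda_k^2 < \infty$.

```python
import numpy as np

def project_box(x, lo, hi):
    # Euclidean projection onto the box X = [lo, hi]^n (a simple convex set).
    return np.clip(x, lo, hi)

def stochastic_approximation(sample_f, x0, lo, hi, iters=5000, seed=0):
    # Single-projection SA: x_{k+1} = Pi_X(x_k - lam_k * f(x_k, xi_k)),
    # with Robbins-Monro stepsizes lam_k = 1/(k+1).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        lam = 1.0 / (k + 1)
        x = project_box(x - lam * sample_f(x, rng), lo, hi)
    return x

# Strongly monotone illustration: F(x) = x - b, observed with zero-mean noise.
# The unique solution of the SVIP on X = [-1, 1]^2 is x* = b (an interior point).
b = np.array([0.3, -0.4])
noisy_f = lambda x, rng: (x - b) + 0.1 * rng.normal(size=x.shape)
x_approx = stochastic_approximation(noisy_f, x0=np.zeros(2), lo=-1.0, hi=1.0)
# x_approx is close to b
```

With $\lambda_k = 1/(k+1)$ the iterates reduce to a running average of noisy targets, so the noise is averaged out while the deterministic part contracts toward the solution.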
Recent research has focused on improving algorithmic efficiency through the extragradient and subgradient extragradient (SEM) methods. The extragradient method, originally due to Korpelevich [24], has been successfully extended to the stochastic setting (see, e.g., [25,26]). A typical extragradient-type update reads:
$y_k = \Pi_X(x_k - \gamma_k f(x_k, \xi_k)), \qquad x_{k+1} = \Pi_X(x_k - \alpha_k f(y_k, \eta_k)),$
where ξ k and η k are independent sample batches. Although effective, this approach requires two projections per iteration, which can be computationally demanding for large-scale or structured feasible sets. To reduce this cost, subgradient extragradient variants [26,27,28,29] were proposed and analyzed for both deterministic and stochastic VIP, yielding promising results under monotonicity and Lipschitz continuity assumptions.
Furthermore, recent works have introduced self-adaptive stepsize rules and inertial terms to accelerate convergence. In particular, Wang et al. [30] eliminated the need for linesearch-based parameter tuning while maintaining convergence for pseudomonotone operators, adopting a self-adaptive strategy to analyze approximate solutions within the SA framework. Liu and Qin [31] later incorporated Polyak's inertial extrapolation [32] (see also [33,34,35,36,37]) into stochastic extragradient frameworks, achieving almost sure convergence and improved complexity bounds, though still requiring linesearch conditions.
Motivation and Contribution: Despite these advances, most existing SA-based algorithms rely on strong or pseudomonotone assumptions, limiting their applicability to broader problem classes. Moreover, modern schemes often require multiple projections, sensitive parameter tuning, or complex linesearch procedures, which hinder scalability. Motivated by these challenges and the works in [23,27,30,31,32,38,39], we ask the following fundamental question:
Question: Can we design a robust iterative scheme for solving the SVIP (3) that combines self-adaptive step-sizes, stochastic subgradient extragradient techniques, and inertial acceleration for a quasi-monotone operator within the SA framework, while ensuring almost sure convergence and a guaranteed convergence rate?
The principal objective of this study is to provide an affirmative answer to this question.
Organization of the Paper: Section 2 presents preliminary definitions and essential lemmas. Section 3 introduces the proposed algorithm and underlying assumptions. Section 4 contains the main convergence analysis and proofs. Numerical results and practical applications are presented in Section 5, followed by concluding remarks in Section 6.

2. Preliminaries

In this section, we formally state some basic terminology essential to this work. For any vectors $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ is the standard inner product and $\|x\| = \sqrt{\langle x, x \rangle}$ is the Euclidean norm. Given a random variable $\xi$ and a $\sigma$-algebra $\mathcal{F}$, the notations $\mathbb{E}[\xi]$, $\mathbb{E}[\xi \mid \mathcal{F}]$, $\mathbb{V}(\xi)$, and $\mathbb{V}(\xi \mid \mathcal{F})$ denote the expectation of $\xi$, the conditional expectation of $\xi$ with respect to $\mathcal{F}$, the variance of $\xi$, and the conditional variance of $\xi$ with respect to $\mathcal{F}$, respectively. For $p \ge 1$, $|\xi|_p := \mathbb{E}[|\xi|^p]^{1/p}$ is the $L^p$ norm of $\xi$ and $|\xi \mid \mathcal{F}|_p := \mathbb{E}[|\xi|^p \mid \mathcal{F}]^{1/p}$ is the $L^p$ norm of $\xi$ conditional on $\mathcal{F}$. The $\sigma$-algebra generated by the random variables $\{\xi_i\}_{i=1}^{k}$ is denoted by $\sigma(\xi_1, \dots, \xi_k)$; also, $\mathbb{E}[\,\cdot \mid \xi_1, \dots, \xi_k] := \mathbb{E}[\,\cdot \mid \sigma(\xi_1, \dots, \xi_k)]$. We write $\xi \in \mathcal{F}$ to mean that a random variable $\xi$ is $\mathcal{F}$-measurable, and $\xi \perp \mathcal{F}$ to mean that $\xi$ is independent of the $\sigma$-algebra $\mathcal{F}$. The set of natural numbers is denoted by $\mathbb{N}$. For $x \in \mathbb{R}^n$, there exists a unique element $z \in X$, denoted by $\Pi_X(x)$, such that $\|z - x\| = \inf_{y \in X} \|y - x\|$; the mapping $\Pi_X : \mathbb{R}^n \to X$ is called the projection from $\mathbb{R}^n$ onto $X$. To quantify the inaccuracy in the stochastic evaluation of (3), we introduce the error term
$\varepsilon(x, \xi) := f(x, \xi) - F(x), \quad x \in \mathbb{R}^n, \; \xi \in \Xi.$ (5)
For any exponent $p \in [2, \infty)$, we associate with this error the $p$-moment function
$\sigma_p(x) = \mathbb{E}\big[\|\varepsilon(x, \xi)\|^p\big]^{1/p}$, (6)
which serves as an indicator of how accurately a stochastic approximation method captures the underlying operator.
We give below a fundamental definition regarding the cost function.
Definition 1.
The mapping $T$ on $X$ is called:
(i) 
strongly monotone if there exists a constant $\mu > 0$ such that for all $x, y \in X$, $\langle Tx - Ty, x - y \rangle \ge \mu \|x - y\|^2$;
(ii) 
monotone if, for all $x, y \in X$, $\langle Tx - Ty, x - y \rangle \ge 0$;
(iii) 
pseudomonotone if, for all $x, y \in X$, $\langle Ty, x - y \rangle \ge 0 \implies \langle Tx, x - y \rangle \ge 0$;
(iv) 
quasi-monotone if, for all $x, y \in X$, $\langle Ty, x - y \rangle > 0 \implies \langle Tx, x - y \rangle \ge 0$.
Remark 1.
We obtain from Definition 1 that (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ (iv), but the converses of these implications are not true in general.
The following Lemmas are very important in our work.
Lemma 1
(Lemma 2.1, [30]). Let $\Pi_X$ be the projection from $\mathbb{R}^n$ onto $X$. Then:
(i) 
$\|\Pi_X(x) - z\|^2 \le \|x - z\|^2 - \|\Pi_X(x) - x\|^2$, $\forall x \in \mathbb{R}^n$ and $z \in X$.
(ii) 
If $u = \Pi_X(y - v)$, then $2\langle v, u - z \rangle \le \|y - z\|^2 - \|u - z\|^2 - \|u - y\|^2$ for all $z \in X$.
(iii) 
$z = \Pi_X(x)$ if and only if $\langle x - z, z - y \rangle \ge 0$, $\forall y \in X$.
(iv) 
Let $\Gamma \ne \emptyset$. Then $x \in \Gamma$ if and only if $x = \Pi_X(x - \alpha F(x))$ for every strictly positive $\alpha$.
Let $\bar{x}, v \in \mathbb{R}^n$ with $v \ne 0$, and consider the half-space $T_k = \{x \in \mathbb{R}^n : \langle v, x - \bar{x} \rangle \le 0\}$. For any $y \in \mathbb{R}^n$, the projection $\Pi_{T_k}(y)$ is given by
$\Pi_{T_k}(y) = y - \max\Big\{0, \frac{\langle v, y - \bar{x} \rangle}{\|v\|^2}\Big\} v.$ (7)
Observe that (7) provides a direct formula for computing the projection of an arbitrary point onto a half-space.
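Formula (7) is cheap to implement, which is what makes half-space projections attractive computationally. The following Python sketch (the specific vectors are illustrative) evaluates (7) and shows that feasible points are left unchanged while infeasible points are mapped to the boundary:

```python
import numpy as np

def project_halfspace(y, v, x_bar):
    # Formula (7): Pi_T(y) = y - max{0, <v, y - x_bar> / ||v||^2} * v,
    # where T = {x : <v, x - x_bar> <= 0} and v != 0.
    step = max(0.0, np.dot(v, y - x_bar)) / np.dot(v, v)
    return y - step * v

v = np.array([1.0, 0.0])          # T is the half-plane {x : x_1 <= 0}
x_bar = np.zeros(2)
inside = project_halfspace(np.array([-2.0, 3.0]), v, x_bar)   # stays (-2, 3)
outside = project_halfspace(np.array([5.0, 3.0]), v, x_bar)   # maps to (0, 3)
```

The cost is a single inner product and a vector update, independent of the geometry of the original feasible set.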
Lemma 2
([40]). For any point $x \in \mathbb{R}^n$ and any parameter $\beta > 0$, the following bounds are satisfied:
$\min\{1, \beta\} \|r_1(x)\| \le \|r_\beta(x)\| \le \max\{1, \beta\} \|r_1(x)\|,$
where $r_\beta(x) = x - \Pi_X(x - \beta F(x))$.
Assumption 1
The following assumptions shall be considered:
(A) 
The solution set $\Gamma \ne \emptyset$.
(B) 
(i) For all $x, y \in \mathbb{R}^n$ and almost every $\varpi \in \Omega$,
$\|f(x, \xi(\varpi)) - f(y, \xi(\varpi))\| \le L(\xi(\varpi)) \|x - y\|,$
    where $L : \Xi \to \mathbb{R}_+$ is a measurable function such that $L(\xi(\varpi)) \ge 1$ for almost every $\varpi \in \Omega$.
(ii) There exist $a \in \mathbb{R}^n$ and $p \ge 2$ such that $\mathbb{E}[\|f(a, \xi)\|^p] < \infty$ and $\mathbb{E}[L(\xi)^p] < \infty$.
(C) 
The mapping $F$ is quasi-monotone.
(D) 
Let $\eta_k \in (0, 1)$ and let $\{\delta_k\}$ be a positive sequence such that $\lim_{k \to \infty} \eta_k = 0$ and $\sum_{k=0}^{\infty} \eta_k = +\infty$. Furthermore, $\delta_k / \eta_k \to 0$ as $k \to \infty$.
Lemma 3
([41]). Under Assumption 1, the operators $F$ and $\sigma_q(\cdot)$ are Lipschitz continuous on $\mathbb{R}^n$ with constants $L$ and $L_q$, respectively. This holds for every $q \in [p, 2p]$, where $p$ is the exponent specified in Assumption 1. Moreover, the constants satisfy $L = \mathbb{E}[L(\xi)]$ and $L_q = \mathbb{E}[L(\xi)^q]^{1/q} + L$. Let $\xi = \{\xi_j\}_{j=1}^{N}$ denote an i.i.d. collection drawn from $\Xi$, and define
$G(x, \xi) = \frac{1}{N} \sum_{j=1}^{N} f(x, \xi_j)$ and $\bar{\varepsilon}(x, \xi) = \frac{1}{N} \sum_{j=1}^{N} \varepsilon(x, \xi_j).$
Lemma 4
([41]). Suppose that Assumption 1 holds. Then, for any $q \in [p, 2p]$ with $p$ from Assumption 1, there exists a constant $C_q > 0$ such that for any $x \in \mathbb{R}^n$ and $x^* \in \Gamma$,
$|\bar{\varepsilon}(x, \xi)|_q \le C_q \frac{\sigma_q(x^*) + L_q \|x - x^*\|}{\sqrt{N}}.$
Lemma 5
([41]). Assume that Assumption 1 holds and $\Gamma \ne \emptyset$. Let $\lambda_N : \Xi \to [0, \lambda]$ be a random variable for some $0 < \lambda \le 1$. Define $z(x, \lambda_N, \xi) = \Pi_X(x - \lambda_N G(x, \xi))$. Then, for any $p \ge 2$, there exist positive constants $\{c_i\}_{i=1}^{4}$ (depending on $n$, $p$, and $\lambda$) such that
$|\bar{\varepsilon}(z(x, \lambda_N, \xi), \xi)|_p \le c_1 \frac{\sigma_{2p}(x^*) + \bar{L}_{2p} \|x - x^*\|}{\sqrt{N}}, \quad \forall x \in \mathbb{R}^n, \; x^* \in \Gamma,$
where $\bar{L}_{2p} = c_2 L_2 + c_3 L_p + c_4 L_{2p}$, $c_1 > C_p$, and $\sigma_{2p}(x^*) \ge \sigma_p(x^*)$.
Lemma 6
([42]). Let $\{V_k\}_{k \ge 1}$, $\{\delta_k\}_{k \ge 1}$, $\{\eta_k\}_{k \ge 1}$, and $\{\beta_k\}_{k \ge 1}$ denote sequences of nonnegative random variables adapted to the filtration $\{\Theta_k\}_{k \ge 1}$. Suppose that, almost surely, $\sum_{k=1}^{\infty} \delta_k < \infty$ and $\sum_{k=1}^{\infty} \beta_k < \infty$, and that
$\mathbb{E}[V_{k+1} \mid \Theta_k] \le (1 + \delta_k) V_k - \eta_k + \beta_k, \quad \forall k \in \mathbb{N}.$
Then, with probability one, the sequence $\{V_k\}$ converges and $\sum_{k=1}^{\infty} \eta_k < \infty$.

3. Proposed Algorithm

Remark 2.
We highlight the benefits of Algorithm 1 as follows:
1. 
In Step 1 of the proposed algorithm, we incorporate the inertial term $\theta_k (x_k - x_{k-1})$, a momentum technique inspired by Nesterov acceleration and Polyak's heavy-ball method. It promises faster convergence rates, variance reduction, better stability for ill-conditioned problems, and improved practical performance. Without inertia, SA can become stuck in flat regions or plateaus caused by noise, and in stochastic games inertial SA often requires fewer iterations to achieve a desired accuracy. These facts underscore the need for adopting it in the algorithm, and it is an improvement over [5,21,23,25,27,28,30,39,41].
2. 
The algorithm employs the subgradient extragradient (SEG) technique, which requires only one projection onto the feasible set per iteration. It handles non-smooth problems and maintains feasibility, ensuring that all iterates remain feasible, a key requirement in constrained stochastic optimization problems. It is, therefore, preferable to algorithms that involve two projections onto the feasible set per iteration. Hence, it contributes positively to the literature when compared with the works in [5,14,15,20,21,22,25,27,33,41,42].
3. 
Since SA-based algorithms are very sensitive to the stepsize (or step-length), we consider a self-adaptive stepsize that adjusts dynamically, ensuring robustness across problem scales and conditions. In fact, a self-adaptive stepsize in SA accelerates convergence, reduces sensitivity to noise, eliminates heavy manual tuning, ensures stability, and improves efficiency near the solution. This contrasts with Armijo linesearch methods, which consume a large amount of time and thereby degrade the performance of iterative algorithms (see, e.g., [4,13,14,15,17,21,31,33,40,41,42] and the references cited therein).
4. 
It is known that real-world stochastic systems often have non-symmetric or partially monotone structures. This motivates our consideration of a quasi-monotone operator, which is a weaker notion, so that the proposed scheme can handle nonlinear, asymmetric, or discontinuous mappings more realistically. It is important to note that quasi-monotone operators avoid the need for the projection corrections or strong-regularization techniques required for non-monotone problems. To this end, Algorithm 1 offers greater modeling flexibility, wider applicability, reduced assumptions for convergence, and lower computational cost compared with the strong monotonicity, monotonicity, and pseudomonotonicity assumptions commonly found in the literature. Therefore, our scheme improves many already announced results in this research direction.
Algorithm 1 Inertial Self-Adaptive Subgradient Extragradient Algorithm
Step 0: Select $x_0, x_1 \in \mathbb{R}^n$, $\alpha_0 > 0$. Take $\theta \in [0, 1)$, $\varepsilon_0 \in [0, 1)$, $k \in \mathbb{N}_0$, $\mu \in (0, \frac{1}{2})$. Take the sample rate $\{N_k\}_{k \ge 0}$ with $\sum_{k=0}^{\infty} \frac{1}{N_k} < \infty$. Set $k = 0$.
Step 1: Given the current iterates $x_k, x_{k-1}$ ($k \ge 0$), construct the inertial term as follows:
$w_k = x_k + \theta_k (x_k - x_{k-1}),$
where
$\theta_k = \begin{cases} \min\big\{\frac{\delta_k}{\|x_k - x_{k-1}\|}, \theta\big\}, & \text{if } x_k \ne x_{k-1}, \\ \theta, & \text{otherwise}. \end{cases}$

Step 2: Draw an i.i.d. sample $\xi_k = \{\xi_j^k\}_{j=1}^{N_k}$ from $\Xi$ and compute
$y_k = \Pi_X(w_k - \alpha_k G(w_k, \xi_k)),$
where $G(w_k, \xi_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} f(w_k, \xi_j^k)$. If $w_k = x_k = y_k$, then stop: $w_k \in \Gamma$; otherwise, go to the next step.
Step 3: Consider the constructible set $T_k = \{x \in \mathbb{R}^n : \langle w_k - \alpha_k G(w_k, \xi_k) - y_k, x - y_k \rangle \le 0\}$ and calculate
$x_{k+1} = \Pi_{T_k}(w_k - \alpha_k G(y_k, \xi_k)),$
where $G(y_k, \xi_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} f(y_k, \xi_j^k)$ and
$\alpha_{k+1} = \begin{cases} \min\big\{\alpha_k, \frac{\mu \|y_k - w_k\|}{\|G(w_k, \xi_k) - G(y_k, \xi_k)\|}\big\}, & \text{if } G(w_k, \xi_k) \ne G(y_k, \xi_k), \\ \alpha_k, & \text{otherwise}. \end{cases}$
Set $k := k + 1$ and go back to Step 1.
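The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the box feasible set, the monotone (hence quasi-monotone) affine operator $F(x) = x - b$, the additive-noise mini-batch oracle, and the choices $\delta_k = 1/(k+1)^2$ and the parameter values are all illustrative.

```python
import numpy as np

def project_box(x, lo, hi):
    # Projection onto the illustrative feasible set X = [lo, hi]^n.
    return np.clip(x, lo, hi)

def project_halfspace(y, v, x_bar):
    # Closed-form projection onto T = {x : <v, x - x_bar> <= 0}; see (7).
    step = max(0.0, np.dot(v, y - x_bar)) / max(np.dot(v, v), 1e-16)
    return y - step * v

def algorithm1(G, x0, lo, hi, alpha0=0.5, theta=0.3, mu=0.4,
               batch=16, iters=3000, seed=0):
    rng = np.random.default_rng(seed)
    x_prev = x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for k in range(iters):
        # Step 1: inertial term with theta_k = min{delta_k/||x_k - x_{k-1}||, theta}.
        delta = 1.0 / (k + 1) ** 2
        d = np.linalg.norm(x - x_prev)
        th = min(delta / d, theta) if d > 0 else theta
        w = x + th * (x - x_prev)
        # Step 2: one i.i.d. mini-batch xi_k, reused for both oracle evaluations.
        xi = rng.normal(size=(batch, x.size))
        Gw = G(w, xi)
        y = project_box(w - alpha * Gw, lo, hi)
        # Step 3: half-space projection and self-adaptive stepsize update.
        Gy = G(y, xi)
        v = w - alpha * Gw - y                  # normal vector defining T_k
        x_prev, x = x, project_halfspace(w - alpha * Gy, v, y)
        diff = np.linalg.norm(Gw - Gy)
        if diff > 0:
            alpha = min(alpha, mu * np.linalg.norm(y - w) / diff)
    return x

# Illustration: F(x) = x - b on X = [-1, 1]^2, so the solution is x* = b.
b = np.array([0.2, -0.5])
G = lambda x, xi: (x - b) + 0.05 * xi.mean(axis=0)   # sample-average oracle
sol = algorithm1(G, x0=np.ones(2), lo=-1.0, hi=1.0)
# sol is close to b
```

Note the single projection onto $X$ per iteration; the second projection, onto the half-space $T_k$, is available in closed form via (7), and the stepsize adapts without any linesearch.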

4. Convergence Analysis

In this section, we present the technical proofs for two convergence analyses of the proposed Algorithm 1: almost sure convergence and the rate of convergence. The former establishes pathwise convergence without quantifying the speed of approach, whereas the latter measures the convergence speed rather than probabilistic pathwise certainty. We begin with the proof of almost sure convergence.

4.1. Almost Sure Convergence

Remark 3.   1.  Our investigation of the proposed method will be based on the filtration $\mathcal{F}_k$:
$\mathcal{F}_0 = \sigma(x_0)$ and $\mathcal{F}_k = \sigma(x_0, \xi_0, \dots, \xi_{k-1}), \; k \in \mathbb{N}.$
So,
$\mathcal{F}_0 = \sigma(x_0), \quad \mathcal{F}_1 = \sigma(x_0, \xi_0), \quad \mathcal{F}_2 = \sigma(x_0, \xi_0, \xi_1), \dots$
In general, $\mathcal{F}_k = \sigma(x_0, \xi_0, \dots, \xi_{k-1})$ and $\mathcal{F}_{k+1} = \sigma(x_0, \xi_0, \dots, \xi_k)$. We see clearly that adding $\xi_k$ provides more information, so $\mathcal{F}_k \subseteq \mathcal{F}_{k+1}$. Since $x_k$ is a measurable function of $(x_0, \xi_0, \dots, \xi_{k-1})$, it follows that $x_k \in \mathcal{F}_k$.
2. 
From Algorithm 1 and (7), it follows that, with $x = w_k - \alpha_k G(y_k, \xi_k)$ denoting the point being projected,
$x_{k+1} = x - \max\Big\{0, \frac{\langle w_k - \alpha_k G(w_k, \xi_k) - y_k, x - y_k \rangle}{\|w_k - \alpha_k G(w_k, \xi_k) - y_k\|^2}\Big\} \big(w_k - \alpha_k G(w_k, \xi_k) - y_k\big),$
provided that $w_k - \alpha_k G(w_k, \xi_k) - y_k \ne 0$. This is one of the striking advantages of Algorithm 1: its execution requires merely a single projection onto $X$, creating efficiency and smooth running of the scheme.
3. 
Let $X \subseteq \mathbb{R}^n$. From Algorithm 1, Step 2, we know that $y_k = \Pi_X(w_k - \alpha_k G(w_k, \xi_k))$, i.e., $y_k$ is the projection of $w_k - \alpha_k G(w_k, \xi_k)$ onto $X$. Using the projection property, we understand that
$\langle z - y_k, x - y_k \rangle \le 0, \quad \forall x \in X,$
where $z$ is the point being projected, i.e., $z = w_k - \alpha_k G(w_k, \xi_k)$. Using the definition of $T_k$ from the algorithm, namely
$T_k = \{x \in \mathbb{R}^n : \langle w_k - \alpha_k G(w_k, \xi_k) - y_k, x - y_k \rangle \le 0\},$
we understand that every $x \in X$ belongs to $T_k$. So, $X \subseteq T_k$.
4. 
In view of (5), we define $\varepsilon_1^k = G(w_k, \xi_k) - F(w_k)$ and $\varepsilon_2^k = G(y_k, \xi_k) - F(y_k)$, the oracle errors for all $k \in \mathbb{N}$. If $w_k = x_k = y_k$ for some $k \in \mathbb{N}_0$, then $w_k \in \Gamma$.
Indeed, assume that $w_k = x_k = y_k$ for some $k$. We know from Lemma 1 (i) that
$\|y_k - x\|^2 = \|\Pi_X(w_k - \alpha_k G(w_k, \xi_k)) - x\|^2 \le \|w_k - \alpha_k G(w_k, \xi_k) - x\|^2 - \|w_k - \alpha_k G(w_k, \xi_k) - y_k\|^2 = \|w_k - \alpha_k G(w_k, \xi_k) - x\|^2 - \alpha_k^2 \|G(w_k, \xi_k)\|^2 = \|y_k - x\|^2 - 2\alpha_k \langle w_k - x, G(w_k, \xi_k) \rangle = \|y_k - x\|^2 - 2\alpha_k \langle w_k - x, F(w_k) \rangle - 2\alpha_k \langle w_k - x, \varepsilon_1^k \rangle.$
Noting that $\alpha_k > 0$, we quickly have that
$\langle w_k - x, F(w_k) \rangle + \langle w_k - x, \varepsilon_1^k \rangle \le 0, \quad \forall x \in X.$ (13)
Indeed, for all $x \in X$, $\{\alpha_k \langle \varepsilon_1^k, w_k - x \rangle, \mathcal{F}_k\}$ defines a martingale difference, i.e., $\mathbb{E}[\alpha_k \langle \varepsilon_1^k, w_k - x \rangle \mid \mathcal{F}_k] = 0$. This follows from the fact that, since $w_k \in \mathcal{F}_k$ and $\xi_k \perp \mathcal{F}_k$, we have
$\mathbb{E}[\varepsilon_1^k \mid \mathcal{F}_k] = \mathbb{E}[G(w_k, \xi_k) \mid \mathcal{F}_k] - F(w_k) = \mathbb{E}[G(x, \xi_k)]\big|_{x = w_k} - F(w_k) = 0.$
Taking $\mathbb{E}[\,\cdot \mid \mathcal{F}_k]$ in (13),
$\langle w_k - x, F(w_k) \rangle + \mathbb{E}[\langle w_k - x, \varepsilon_1^k \rangle \mid \mathcal{F}_k] = \langle w_k - x, F(w_k) \rangle \le 0, \quad \forall x \in X.$
Therefore, $w_k \in \Gamma$.
We shall break our main theorem into Lemmas.
Lemma 7.
The limit of $\{\alpha_k\}$ exists a.s. Let $\lim_{k \to \infty} \alpha_k = \alpha$. Then, $\alpha \ge \min\{\alpha_0, \frac{\mu}{L}\}$, where $L = \mathbb{E}[L(\xi)]$.
Proof. 
By the definition of $\alpha_{k+1}$ in the algorithm, we obtain
$\alpha_{k+1} = \min\Big\{\alpha_k, \frac{\mu \|y_k - w_k\|}{\|G(w_k, \xi_k) - G(y_k, \xi_k)\|}\Big\} \le \alpha_k,$
which shows that the sequence $\{\alpha_k\}$ is monotone nonincreasing, and moreover $\alpha_k > 0$ for every $k$.
Hence $\{\alpha_k\}$ is bounded below by 0 and thus converges almost surely to a finite limit $\alpha := \lim_{k \to \infty} \alpha_k \ge 0$.
Now, define
$A_k := \frac{1}{N_k} \sum_{j=1}^{N_k} L(\xi_j^k),$
where $L(\xi)$ is the random Lipschitz modulus from Assumption 1 and $\{\xi_j^k\}_{j=1}^{N_k}$ are the sampled random variables at iteration $k$.
The Lipschitz-type bound implies that
$\|G(w_k, \xi_k) - G(y_k, \xi_k)\| \le A_k \|w_k - y_k\|.$
If $G(w_k, \xi_k) \ne G(y_k, \xi_k)$, then
$\frac{\mu \|y_k - w_k\|}{\|G(w_k, \xi_k) - G(y_k, \xi_k)\|} \ge \frac{\mu}{A_k}.$
Thus,
$\alpha_{k+1} \ge \min\Big\{\alpha_k, \frac{\mu}{A_k}\Big\}.$
Iterating this inequality gives
$\alpha_{k+1} \ge \min\Big\{\alpha_0, \frac{\mu}{A_0}, \frac{\mu}{A_1}, \dots, \frac{\mu}{A_k}\Big\}.$
Assume, for contradiction, that
$\alpha < \min\Big\{\alpha_0, \frac{\mu}{L}\Big\}.$
Choose $\eta$ such that
$\alpha < \eta < \min\Big\{\alpha_0, \frac{\mu}{L}\Big\}.$
Since $\alpha_k \to \alpha$ almost surely as $k \to \infty$, there exists (a.s.) $k_0$ such that for all $k > k_0$, $\alpha_{k+1} < \eta$. By the definition of $\alpha_{k+1}$, whenever $\alpha_{k+1} < \eta$ we must have
$\frac{\mu}{A_k} < \eta \iff A_k > \frac{\mu}{\eta}.$
Hence, for all large $k$,
$\mathbb{E}[A_k] \ge \frac{\mu}{\eta}.$
But $\mathbb{E}[A_k] = \mathbb{E}\big[\frac{1}{N_k} \sum_{j=1}^{N_k} L(\xi_j^k)\big] = \mathbb{E}[L(\xi)] = L$, since the $\xi_j^k$ are i.i.d. Therefore $L \ge \frac{\mu}{\eta}$, implying $\eta \ge \frac{\mu}{L}$, which contradicts $\eta < \frac{\mu}{L}$.
Thus, the assumption is false, and we conclude that
$\alpha \ge \min\Big\{\alpha_0, \frac{\mu}{L}\Big\}.$ □
Remark 4.
The expectation step is justified because $A_k$ is the empirical mean of i.i.d. random variables $L(\xi_j^k)$, so that $\mathbb{E}[A_k] = \mathbb{E}[L(\xi)] = L$. The contradiction argument uses the fact that if $A_k > \mu/\eta$ for all large $k$, then $L = \mathbb{E}[A_k] \ge \mu/\eta$, contradicting $\eta < \mu/L$. This gives a transparent probabilistic justification for the bound.
The following Lemma will be needed in the sequel.
Lemma 8.
For any $x \in \Gamma$, the following estimate holds almost surely:
$\|x_{k+1} - x\|^2 \le \|x_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_0 \|\varepsilon_2^k\| \|y_k - x\| + 3M_2 \eta_k, \quad \forall k \in \mathbb{N}_0,$
where $\tau_k = \frac{2\mu \alpha_k}{\alpha_{k+1}}$.
Proof. 
Recall from the inertial step that $w_k = x_k + \theta_k (x_k - x_{k-1})$. For any $x \in \Gamma$, using the definitions of $\{w_k\}$ and $\{\theta_k\}$, we obtain
$\theta_k \|x_k - x_{k-1}\| \le \delta_k, \quad k \ge 1.$
Hence,
$\frac{\theta_k}{\eta_k} \|x_k - x_{k-1}\| \le \frac{\delta_k}{\eta_k} \to 0 \quad \text{as } k \to \infty.$
Therefore, there exists $M_1 > 0$ such that
$\frac{\theta_k}{\eta_k} \|x_k - x_{k-1}\| \le M_1, \quad \forall k \ge 1.$
Consequently, with $M_0 > 0$ such that $\theta_k \|x_k - x_{k-1}\| + 2\|x_k - x\| \le 3M_0$,
$\|w_k - x\|^2 = \|x_k - x + \theta_k (x_k - x_{k-1})\|^2 \le \|x_k - x\|^2 + \theta_k^2 \|x_k - x_{k-1}\|^2 + 2\theta_k \|x_k - x\| \|x_k - x_{k-1}\| = \|x_k - x\|^2 + \theta_k \|x_k - x_{k-1}\| \big(\theta_k \|x_k - x_{k-1}\| + 2\|x_k - x\|\big) \le \|x_k - x\|^2 + 3\theta_k \|x_k - x_{k-1}\| M_0 = \|x_k - x\|^2 + 3M_0 \eta_k \frac{\theta_k}{\eta_k} \|x_k - x_{k-1}\| \le \|x_k - x\|^2 + 3M_1 M_0 \eta_k = \|x_k - x\|^2 + 3M_2 \eta_k,$ (19)
for some $M_2 = M_0 M_1 > 0$.
In view of the fact that $\Gamma \subseteq X \subseteq T_k$ holds for all $k \in \mathbb{N}_0$, applying Lemma 1 (ii) on $T_k$ (with $u = x_{k+1} = \Pi_{T_k}(w_k - \alpha_k G(y_k, \xi_k))$, $y = w_k$, $v = \alpha_k G(y_k, \xi_k)$, and $z = x \in \Gamma$) together with (19), we obtain
$2\alpha_k \langle G(y_k, \xi_k), x_{k+1} - x \rangle \le \|w_k - x\|^2 - \|x_{k+1} - x\|^2 - \|x_{k+1} - w_k\|^2.$
That is,
$\|x_{k+1} - x\|^2 \le \|w_k - x\|^2 - \|x_{k+1} - w_k\|^2 + 2\alpha_k \langle G(y_k, \xi_k), x - x_{k+1} \rangle = \|w_k - x\|^2 - \|(x_{k+1} - y_k) + (y_k - w_k)\|^2 + 2\alpha_k \langle G(y_k, \xi_k), x - x_{k+1} \rangle = \|w_k - x\|^2 - \|x_{k+1} - y_k\|^2 - \|y_k - w_k\|^2 - 2\langle x_{k+1} - y_k, y_k - w_k \rangle + 2\alpha_k \langle G(y_k, \xi_k), x - x_{k+1} \rangle \le \|x_k - x\|^2 - \big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle G(y_k, \xi_k), x - y_k \rangle + 2\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(y_k, \xi_k) \rangle + 3M_2 \eta_k.$ (20)
Since $x_{k+1}$ lies in $T_k$, it follows that
$\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(w_k, \xi_k) \rangle \le 0,$
which further implies that
$\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(y_k, \xi_k) \rangle \le \alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle, \quad \forall k \in \mathbb{N}_0.$ (21)
Indeed, we know that
$w_k - y_k - \alpha_k G(y_k, \xi_k) = \big(w_k - y_k - \alpha_k G(w_k, \xi_k)\big) + \alpha_k \big(G(w_k, \xi_k) - G(y_k, \xi_k)\big).$
Taking the inner product with $x_{k+1} - y_k$ and using linearity gives
$\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(y_k, \xi_k) \rangle = \langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(w_k, \xi_k) \rangle + \alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle \le \alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle.$
Rearranging the above inequality, we obtain (21). Furthermore, applying the Cauchy–Schwarz inequality and utilizing the definition of $\alpha_{k+1}$ given in Algorithm 1, one obtains
$2\alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle \le 2\alpha_k \|x_{k+1} - y_k\| \|G(w_k, \xi_k) - G(y_k, \xi_k)\| \le \frac{2\mu \alpha_k}{\alpha_{k+1}} \|x_{k+1} - y_k\| \|w_k - y_k\| \le \tau_k \big(\|x_{k+1} - y_k\|^2 + \|w_k - y_k\|^2\big).$ (22)
Since $\alpha_{k+1} \le \alpha_0$, substituting (21) into (20) and utilizing (22), we obtain
$\|x_{k+1} - x\|^2 \le \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle G(y_k, \xi_k), x - y_k \rangle = \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle \varepsilon_2^k + F(y_k), x - y_k \rangle = \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle \varepsilon_2^k, x - y_k \rangle - 2\alpha_k \langle F(y_k), y_k - x \rangle \le \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_0 \|\varepsilon_2^k\| \|x - y_k\| \le \|x_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_0 \|\varepsilon_2^k\| \|x - y_k\| + 3M_2 \eta_k,$ (23)
where the second-to-last inequality holds since, by the quasi-monotonicity assumption on $F(\cdot)$, we know that $\langle F(y_k), y_k - x \rangle \ge 0$ for all $y_k \in X$ and $\alpha_k > 0$ for all $k \in \mathbb{N}_0$. This completes the proof of Lemma 8. □
The next Lemma controls the error bound arising from our computations.
Lemma 9.
Assume that Assumption 1 holds. Then, for any $x \in \Gamma$, a.s. we have
$\big|\, \|y_k - x\| \|\varepsilon_2^k\| \,\big|\, \mathcal{F}_k \big|_{p/2} \le \frac{m_3}{N_k} + \frac{m_5}{N_k} + \frac{m_4}{N_k} \|x_k - x\|^2, \quad \forall k \in \mathbb{N}_0,$
where $m_3$, $m_4$, and $m_5$ are the positive constants specified in the proof.
Proof. 
Noting that $x = \Pi_X(x - \alpha_k F(x))$, utilizing the nonexpansivity of $\Pi_X$, the Lipschitz continuity of $F$, and $\alpha_k \le \alpha_0$, we obtain
$\|y_k - x\| = \|\Pi_X(w_k - \alpha_k G(w_k, \xi_k)) - \Pi_X(x - \alpha_k F(x))\| \le \|w_k - x - \alpha_k (F(w_k) + \varepsilon_1^k - F(x))\| \le \|w_k - x\| + \alpha_k \|F(w_k) - F(x)\| + \alpha_k \|\varepsilon_1^k\| \le \|w_k - x\| + \alpha_0 L \|w_k - x\| + \alpha_0 \|\varepsilon_1^k\| = (1 + \alpha_0 L) \|w_k - x\| + \alpha_0 \|\varepsilon_1^k\|, \quad \forall k \in \mathbb{N}_0.$ (25)
Now, applying Lemmas 4 and 5 together with (25), one obtains
$\big|\, \|y_k - x\| \|\varepsilon_2^k\| \,\big|\, \mathcal{F}_k \big|_{p/2} \le (1 + L\alpha_0) \|w_k - x\| \, \big|\varepsilon_2^k \,\big|\, \mathcal{F}_k\big|_p + \alpha_0 \big|\varepsilon_1^k \,\big|\, \mathcal{F}_k\big|_p \, \big|\varepsilon_2^k \,\big|\, \mathcal{F}_k\big|_p \le (1 + L\alpha_0) \|w_k - x\| \frac{c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|}{\sqrt{N_k}} + \alpha_0 \frac{\big(c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|\big)^2}{N_k} \le (1 + L\alpha_0) \frac{\|w_k - x\|^2 + \big(c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|\big)^2}{N_k} + \alpha_0 \frac{\big(c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|\big)^2}{N_k} \le (1 + L\alpha_0) \frac{\|w_k - x\|^2 + 2(c_1 \sigma_{2p}(x))^2 + 2\bar{L}_{2p}^2 \|w_k - x\|^2}{N_k} + \alpha_0 \frac{2(c_1 \sigma_{2p}(x))^2 + 2\bar{L}_{2p}^2 \|w_k - x\|^2}{N_k} = \frac{2(1 + \alpha_0(1 + L))(c_1 \sigma_{2p}(x))^2}{N_k} + \frac{(1 + L\alpha_0)(1 + 2\bar{L}_{2p}^2) + 2\alpha_0 \bar{L}_{2p}^2}{N_k} \|w_k - x\|^2 = \frac{m_3}{N_k} + \frac{m_4}{N_k} \|w_k - x\|^2,$ (26)
where $m_3 = 2(1 + \alpha_0(1 + L))(c_1 \sigma_{2p}(x))^2$ and $m_4 = (1 + L\alpha_0)(1 + 2\bar{L}_{2p}^2) + 2\alpha_0 \bar{L}_{2p}^2$, with $c_1$ and $\bar{L}_{2p}$ as defined in Lemma 5.
Utilizing (19) and combining with (26), we understand that
$\big|\, \|y_k - x\| \|\varepsilon_2^k\| \,\big|\, \mathcal{F}_k \big|_{p/2} \le \frac{m_3}{N_k} + \frac{m_4}{N_k} \|w_k - x\|^2 \le \frac{m_3}{N_k} + \frac{m_4}{N_k} \big(\|x_k - x\|^2 + 3M_2 \eta_k\big) = \frac{m_3}{N_k} + \frac{m_5}{N_k} + \frac{m_4}{N_k} \|x_k - x\|^2,$ (27)
where $m_5 = 3m_4 M_2 \eta_k$, and this completes the proof of Lemma 9. □
We are now ready to provide the main theorem of this paper.
Theorem 1.
Suppose that Assumption 1 holds. Then, the sequence { x k } generated by Algorithm 1 a.s. converges to a point x Γ .
Proof. 
We know from Lemma 7 that the limit of $\{\alpha_k\}$ exists; since $\{\alpha_k\}$ is nonincreasing, $\alpha_k \ge \alpha$ for all $k \in \mathbb{N}_0$, where $\lim_{k \to \infty} \alpha_k = \alpha$. Utilizing the definition of $\{y_k\}$ given in Algorithm 1, the oracle error $\varepsilon_1^k = G(w_k, \xi_k) - F(w_k)$, and applying Lemmas 2 and 3, we understand that
$\big(\min\{1, \alpha\} \|r_1(w_k)\|\big)^2 \le \big(\min\{1, \alpha_k\} \|r_1(w_k)\|\big)^2 \le \|r_{\alpha_k}(w_k)\|^2 = \|w_k - \Pi_X(w_k - \alpha_k F(w_k))\|^2 \le 2\|w_k - y_k\|^2 + 2\|y_k - \Pi_X(w_k - \alpha_k F(w_k))\|^2 = 2\|w_k - y_k\|^2 + 2\|\Pi_X(w_k - \alpha_k (F(w_k) + \varepsilon_1^k)) - \Pi_X(w_k - \alpha_k F(w_k))\|^2 \le 2\|w_k - y_k\|^2 + 2\alpha_0^2 \|\varepsilon_1^k\|^2.$
It follows from the above estimate that
$\|w_k - y_k\|^2 \ge \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} - \alpha_0^2 \|\varepsilon_1^k\|^2,$ (28)
where $\bar{\alpha} = \min\{1, \alpha\}$. Utilizing Lemma 4 with $p = 2$, we obtain
$\mathbb{E}[\|\varepsilon_1^k\|^2 \mid \mathcal{F}_k] = \big(\big|\varepsilon_1^k \,\big|\, \mathcal{F}_k\big|_2\big)^2 \le \Big(\frac{c_1 \sigma_4(x) + \bar{L}_4 \|w_k - x\|}{\sqrt{N_k}}\Big)^2 \le \frac{2(c_1 \sigma_4(x))^2 + 2(\bar{L}_4 \|w_k - x\|)^2}{N_k}.$ (29)
Recall from Lemma 8 that $\tau_k = \frac{2\mu \alpha_k}{\alpha_{k+1}}$. It follows from Lemma 7 that $\lim_{k \to \infty} \tau_k = 2\mu \in (0, 1)$. On the other hand, since $\tau_k \ge 2\mu$ for all $k \in \mathbb{N}_0$, we can find some $\tau \in (0, 1)$ and an index $K_0 \in \mathbb{N}$ such that $\tau_k \in [2\mu, \tau)$ for every $k \ge K_0$. Taking these observations into account, and recalling that $1 - \tau > 0$, we deduce from Lemma 8, (28), and (29) that
$\|x_{k+1} - x\|^2 \le \|w_k - x\|^2 - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + (1 - \tau) \alpha_0^2 \|\varepsilon_1^k\|^2 + 2\alpha_0 \|\varepsilon_2^k\| \|y_k - x\|, \quad \forall k > K_0.$ (30)
Taking $\mathbb{E}[\,\cdot \mid \mathcal{F}_k]$ in (30), setting $p = 2$, and applying (19) and Lemma 9, we obtain
$\mathbb{E}[\|x_{k+1} - x\|^2 \mid \mathcal{F}_k] \le \|w_k - x\|^2 - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + (1 - \tau) \alpha_0^2 \mathbb{E}[\|\varepsilon_1^k\|^2 \mid \mathcal{F}_k] + 2\alpha_0 \mathbb{E}[\|\varepsilon_2^k\| \|y_k - x\| \mid \mathcal{F}_k] \le \|w_k - x\|^2 - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + (1 - \tau) \alpha_0^2 \frac{2(c_1 \sigma_4(x))^2 + 2\bar{L}_4^2 \|w_k - x\|^2}{N_k} + 2\alpha_0 \Big(\frac{m_3}{N_k} + \frac{m_5}{N_k} + \frac{m_4}{N_k} \|w_k - x\|^2\Big) \le \Big(1 + \frac{2\alpha_0 m_4 + 2\alpha_0^2 \bar{L}_4^2 (1 - \tau)}{N_k}\Big) \big(\|x_k - x\|^2 + 3M_2 \eta_k\big) - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + \frac{2(1 - \tau) \alpha_0^2 (c_1 \sigma_4(x))^2 + 2\alpha_0 (m_3 + m_5)}{N_k}, \quad \forall k \ge K_0.$ (31)
Setting $V_k = \|x_{K_0 + k} - x\|^2$ and $\mathcal{G}_k = \mathcal{F}_{K_0 + k}$ for all $k \in \mathbb{N}$, and writing
$m_6 = 2\alpha_0 m_4 + 2\alpha_0^2 \bar{L}_4^2 (1 - \tau), \qquad m_7 = 2(1 - \tau) \alpha_0^2 (c_1 \sigma_4(x))^2 + 2\alpha_0 (m_3 + m_5),$
it follows from (31) that
$\mathbb{E}[V_{k+1} \mid \mathcal{G}_k] \le \Big(1 + \frac{m_6}{N_k}\Big) V_k - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + 3M_2 \eta_k \Big(1 + \frac{m_6}{N_k}\Big) + \frac{m_7}{N_k}, \quad \forall k \in \mathbb{N}.$ (32)
Since k = 0 1 N k < , we conclude from Lemma 6 and (32) that a.s. the sequence { V k } is convergent and k = 1 ( r 1 ( w k ) 2 ) < .
Thus, a.s. the sequence { x k } is bounded and
lim k ( r 1 ( w k ) 2 ) = lim k w k Π X ( w k F ( w k ) ) 2 = 0 .
By virtue of the a.s. boundedness of { x_k }, we can find a subsequence { x_{k_j} } of { x_k } that a.s. converges to a point x*. It follows that
lim_{j→∞} (r_1(w_{k_j}))² = lim_{j→∞} ‖w_{k_j} − Π_X(w_{k_j} − F(w_{k_j}))‖² = ‖x* − Π_X(x* − F(x*))‖² = 0,
which shows that x Γ .
Note further that, since the limit of x k x exists almost surely for every x Γ , it follows that
lim k x k x 2 = lim j x k j x 2 = 0 .
This establishes the claim and thus completes the proof of Theorem 1. □
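The almost-sure convergence argument above rests on a Robbins–Siegmund-type recursion (Lemma 6): whenever E[V_{k+1} | G_k] ≤ (1 + a_k)V_k − b_k + c_k with ∑ a_k < ∞ and ∑ c_k < ∞, the sequence { V_k } converges and ∑ b_k < ∞. The following deterministic toy iteration (our own illustration, not the algorithm itself) shows this mechanism numerically, with a_k = c_k = 1/k² mimicking the summable 1/N_k perturbations:

```python
# Deterministic toy version of the supermartingale recursion behind Theorem 1:
#   V_{k+1} <= (1 + a_k) V_k - b_k + c_k,  with sum a_k < inf and sum c_k < inf.
# Here a_k = c_k = 1/k^2 mimic the summable 1/N_k terms, and b_k = V_k / 2 plays
# the role of the extracted residual term (1 - tau) * alpha^2 * (r_1(w_k))^2 / 2.
V = 1.0
residual_sum = 0.0
for k in range(1, 10_000):
    a_k = c_k = 1.0 / k**2
    b_k = 0.5 * V
    V = (1.0 + a_k) * V - b_k + c_k
    residual_sum += b_k

print(V)              # V_k settles down (here it decays toward 0)
print(residual_sum)   # the accumulated residuals remain finite
```

In the toy run, { V_k } converges and the accumulated residuals stay bounded, which is exactly the conclusion the lemma extracts from (32).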

4.2. Rate of Convergence

We now present the most insightful part of the analysis of the proposed algorithm, namely how quickly the recursive sequence { x_k } approaches its limit x*. We begin with the following lemma, which is needed in the sequel.
Lemma 10.
Under Assumption 1, we have
∑_{k=0}^{∞} E[ ‖x_{k+1} − y_k‖² + ‖y_k − w_k‖² ] < ∞.
Proof. 
Using the uniform bound on τ_k together with the previous estimates, we obtain
x k + 1 x 2 w k x 2 ( 1 τ ) x k + 1 y k 2 + y k w k 2 + 2 α 0 ε 2 k y k x .
The above information can be recast as
( 1 τ ) x k + 1 y k 2 + y k w k 2 w k x 2 x k + 1 x 2 + 2 α 0 ε 2 k y k x x k x 2 x k + 1 x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x , k N 0 .
Having established the a.s. boundedness of { ‖x_k − x*‖² } in Theorem 1, we set P₀ = sup_{k∈ℕ₀} ‖x_k − x*‖². Now, setting p = 2, utilizing Lemma 4 and (27), and taking E[·] in (33), we obtain
( 1 τ ) E [ x k + 1 y k 2 + y k w k 2 ] E [ x k x 2 ] + E 3 M 2 [ η k ] + 2 α 0 E [ ε 2 k y k x ] = E [ x k x 2 ] E [ y k + 1 x 2 ] + 2 α 0 E [ E [ ε 2 k y k x | F k ] ] + 3 M 2 E [ η k ] E [ x k x 2 ] E [ x k + 1 x 2 ] + 3 M 2 E [ η k ] + 2 α 0 E m 3 N k + m 5 N k + m 4 N k x k x 2 = E [ x k x 2 ] E [ x k + 1 x 2 ] + 2 α 0 m 3 N k + m 5 N k + m 4 N k P 0 + P ,
where P := sup_{k∈ℕ₀} 3M²η_k. Summing over k in (34), we obtain
( 1 τ ) k = K = 1 E [ x k + 1 y k 2 + y k w k 2 ] E [ x K + 1 x 2 ] + 2 α 0 k = K + 1 m 3 N k + m 5 N k + m 4 N k P 0 < P 0 + 2 α 0 k = 0 m 3 N k + m 5 N k + m 4 N k P 0 + p < .
That is,
∑_{k=0}^{∞} E[ ‖x_{k+1} − y_k‖² + ‖y_k − w_k‖² ] < ∞,
which completes the proof of Lemma 10. □
We now state the following theorem for the rate of convergence. In this setting, the cost function satisfies a strong pseudo-monotonicity property, meaning that
⟨F(x), y − x⟩ ≥ 0 ⟹ ⟨F(y), y − x⟩ ≥ λ‖x − y‖², ∀ x, y ∈ X.
Theorem 2.
Assume that the Assumption 1 is satisfied. Then, the following condition holds
min_{k=0,1,…,m} E[ ‖w_k − y_k‖² ] ≤ (1/(m+1)) ( ‖x₀ − x*‖² + P₁ ),
where P 1 is a well-defined positive constant.
Proof. 
Consider τ̄ = 2μα₀/α, where α is the limit of { α_k }. Noting that a.s. α_k ∈ [α, α₀], we obtain a.s. that τ_k = 2μα_k/α_{k+1} ≤ τ̄ for all k ∈ ℕ₀. It is known from Remark 3 (iv) that w_k ∈ Γ if w_k = x_k = y_k for some k ∈ ℕ.
From Lemma 3, we can observe that if
∑_{k=1}^{∞} ‖y_k − w_k‖² < ∞,
then necessarily,
lim_{k→∞} ‖w_k − y_k‖² = 0 a.s., and hence lim_{k→∞} ‖w_k − y_k‖ = 0 a.s.
Therefore, to establish the convergence rate of Algorithm 1, it is sufficient to analyze the rate at which the sequence { w k y k } converges. Building on this observation, and applying Lemma 7 together with the previously defined quantity τ ¯ , we obtain
x k + 1 x 2 w k x 2 ( 1 τ k ) x k + 1 y k 2 + y k w k 2 + 2 α 0 ε 2 k y k x w k x 2 ( 1 τ ¯ ) x k + 1 y k 2 + y k w k 2 + 2 α 0 ε 2 k y k x .
We extract from (35) that
‖x_{k+1} − y_k‖² + ‖y_k − w_k‖² ≤ ‖w_k − x*‖² − ‖x_{k+1} − x*‖² + τ̄ ( ‖y_k − w_k‖² + ‖x_{k+1} − y_k‖² ) + 2α₀ε₂ₖ‖y_k − x*‖, ∀ k ∈ ℕ.
Applying Lemma 9, taking E [ . ] in (36) and setting p = 2 , one obtains
E [ y k w k 2 ] E [ x k + 1 y k 2 + y k w k 2 ] E [ w k x 2 ] E [ x k + 1 x 2 ] + τ ¯ E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 E [ ε 2 k y k x ] = E [ w k x 2 ] E [ x k + 1 x 2 ] + τ ¯ E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 E [ E [ ε 2 k y k x | F k ] ] E [ x k x + 3 M 2 η k ] E [ x k + 1 x 2 ] + τ ¯ E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 m 3 N k + m 5 N k + m 4 N k x k x 2 .
Now, taking sum from k = 0 to m in (37), we obtain
k = 0 m E [ y k w k 2 ] x 0 x 2 E [ x m + 1 x 2 ] + τ ¯ k = 0 m E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 k = 0 m m 3 N k + m 5 N k + m 4 N k P 0 + 3 M 2 k = 0 m E [ η k ] x 0 x 2 + Q ,
Q = 2 α 0 k = 0 m m 3 N k + m 5 N k + m 4 N k P 0 + τ ¯ k = 0 m E [ x k + 1 y k 2 + y k w k 2 ] + 3 M 0 k = 0 m E [ η k ] .
Now, using the bound
min_{k=0,…,m} E[ ‖y_k − w_k‖² ] ≤ (1/(m+1)) ∑_{k=0}^{m} E[ ‖y_k − w_k‖² ],
and recalling that Q is finite (see Lemma 10), the asserted estimate follows from (37). □
The next lemma will be instrumental in establishing the convergence rate of Algorithm 1.
Lemma 11.
Let S_k = ∑_{i=1}^{k} c_{k−i+1}/bⁱ, where b > 1 is a fixed constant, k ∈ ℕ, and { c_k } is a positive sequence satisfying ∑_{k=1}^{∞} c_k < ∞. Then lim_{k→∞} S_k = 0.
Proof. 
To establish this result, it suffices to show that R_m := ∑_{k=1}^{m} S_k is bounded uniformly in m ∈ ℕ. Indeed,
R m = k = 1 m S k = c 1 1 b + + 1 b m + c 2 1 b + + 1 b m 1 + + c m 1 1 b + 1 b 2 + c m b < c 1 i = 1 1 b i + c 2 i = 1 1 b i + + c m 1 i = 1 1 b i + c m i = 1 1 b i = 1 b 1 k = 1 m c k < 1 b 1 k = 1 c k m N .
It then follows that ∑_{k=1}^{∞} S_k < ∞. Consequently, we obtain lim_{k→∞} S_k = 0. □
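Lemma 11 can be checked numerically. Reading the (typeset-damaged) sum as S_k = ∑_{i=1}^{k} c_{k−i+1}/bⁱ, the sketch below uses b = 1.5 and the summable choice c_k = 1/k² (both values are ours, for illustration only):

```python
# Numerical check of Lemma 11, reading the sum as S_k = sum_{i=1}^k c_{k-i+1} / b^i.
# Both b and {c_k} below are illustrative choices satisfying the hypotheses.
b = 1.5
c = [1.0 / k**2 for k in range(1, 2001)]   # c_k = 1/k^2 is positive and summable

def S(k):
    total, inv = 0.0, 1.0
    for i in range(1, k + 1):
        inv /= b                           # inv = b**(-i), computed stably
        total += c[k - i] * inv            # c_{k-i+1}, stored 0-indexed
    return total

print(S(10), S(100), S(2000))              # decreases toward 0
```

The decay of S_k combines the geometric damping of old terms c_1, c_2, … with the smallness of the recent terms c_k, which is exactly the convolution structure exploited in Theorem 3.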
We now present the theorem that characterizes the convergence rate of Algorithm 1.
Theorem 3.
Assume that Assumption 1 is satisfied, the cost function is strongly pseudomonotone on C, and let x Γ . Then there exists a positive integer K such that
E[ ‖x_{k+1} − x*‖² ] ≤ t_K/(1 + φ)^{k−K+1} + S_k, ∀ k ≥ K,
where t K > 0 and φ > 0 are appropriately chosen constants, and S k is defined in Lemma 11.
Proof. 
Using the definition of strong pseudomonotonicity and our estimate in (23), we obtain
x k + 1 x 2 w k x 2 ( 1 τ k ) y k x k + 1 2 + w k y k 2 2 λ α k y k x 2 + 2 α 0 ε 2 k y k x 2 x k x 2 ( 1 τ k ) x k + 1 y k 2 + y k w k 2 2 λ α k y k x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x , k N 0 .
Noting that 2pq ≤ (1/√2)p² + √2q², τ_k ≤ τ, α_k ≥ α = lim_{k→∞} α_k, and using (39), we obtain for all k ≥ K
x k + 1 x 2 x k x 2 + 3 M 2 η k ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 + y k 1 w k 2 2 λ α y k x 2 + 2 α 0 ε 2 k y k x x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 2 + y k 1 w k 2 + 2 y k y k 1 , y k 1 w k λ α y k x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 2 + y k 1 w k 2 + 3 M 2 η k + 2 ( 1 τ ) y k y k 1 y k 1 w k 2 λ α y k x 2 α 0 ε 2 k y k x x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 2 + y k 1 w k 2 2 λ α y k x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x + 1 τ 2 y k y k 1 2 + 2 ( 1 τ ) y k 1 w k 2 = x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) 1 1 2 y k y k 1 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 3 M 2 η k λ α ε 2 k y k x .
From (40) we obtain
x k + 1 x 2 + ( 1 τ ) ( 2 1 ) x k + 1 y k 2 x k x 2 ( 1 τ ) ( 2 2 ) x k + 1 y k 2 λ α y k x 2 + 3 M 2 η k ( 1 τ ) 1 1 2 y k y k 1 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 2 λ 0 ε 2 k x y k , k K .
Let j = 2 λ α + ( 1 τ ) ( 2 2 ) b , where b = 2 λ α ( 2 1 ) ( 1 τ ) . Noting that τ ∈ ( 0 , 1 ) , we then conclude that b ∈ ( 0 , 2 λ α ) . Thus, j > 0 . Setting f ( t ) = 2 λ α t 2 j t b , t ∈ ℝ , it follows that f ( t ) = 0 has one positive and one negative root. Take γ = j + j 2 + 16 λ 2 α 2 ( 2 1 ( 1 τ ) ) 4 λ α . Noting that f ( γ ) = 0 and f ( 1 ) ≤ 0 , we conclude that γ > 1 .
For all k N 0 , we observe that
y k x 2 = y k x k + 1 + x k + 1 x 2 = y k x k + 1 2 x k + 1 x 2 2 y k x k + 1 , x k + 1 x x k + 1 y k 2 x k + 1 x 2 + γ x k + 1 y k 2 + 1 γ x k + 1 x 2 = ( γ 1 ) x k + 1 y k 2 + 1 γ 1 x k + 1 x 2 .
Substituting (42) into (41), k K , we have
1 2 λ α 1 γ 1 x k + 1 x 2 + ( 1 τ ) ( 2 1 ) y k x k + 1 2 x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ( 1 τ ) 1 1 2 y k y k 1 2 + 2 λ α ( γ 1 ) 2 λ α 1 γ 1 ( 1 τ ) ( 2 1 ) ( 1 τ ) ( 2 2 ) x k + 1 y k 2 + 2 α 0 ε 2 k y k x + 3 M 2 η k = x k x 2 + ( 1 τ ) ( 2 1 ) y k + 1 w k 2 ( 1 τ ) ( 2 1 ) 2 y k y k + 1 2 + f ( γ ) γ x k + 1 y k 2 + 2 α 0 ε 2 k y k x + 3 M 2 η k = x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ( 1 τ ) ( 2 1 ) 2 y k y k 1 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x .
Setting δ = 2 λ α ( 1 1 γ ) . Clearly, δ > 0 , and by taking E [ . ] in (43) and applying Lemma 9 with p = 2 , we obtain
( 1 + δ ) E [ x k + 1 x 2 + ( 1 τ ) ( 2 1 ) x k + 1 y k 2 ] E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + 2 α 0 E [ ε 2 k y k x ] + 3 M 2 E [ η k ] = E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + 2 α 0 E [ E [ ε 2 k y k x | F k ] ] + 3 M 2 E [ η k ] = E [ x k x 2 ] + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 2 α 0 m 3 + m 5 η k + m 4 x k x 2 N k + 3 M 2 E [ η k ] E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + 2 α 0 ( m 3 + m 5 + m 4 P 0 ) N k + 3 M 2 E [ η k ] = E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + p + M 8 N k , k K ,
where M 8 = 2 α 0 ( m 3 + m 5 + m 4 P 0 ) , P 0 = sup k N x k x 2 . Define
β k = E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] .
From the definition of β k and (44), one obtains
E [ x k + 1 x 2 ] β k 1 + δ + 1 1 + δ p + M 8 N k β K ( 1 + δ ) k K + 1 + i = K k p ( 1 + δ ) i K + 1 + M 8 ( 1 + δ ) i K + 1 N k 1 + K β K ( 1 + δ ) k K + 1 + i = 1 k p ( 1 + δ ) i + M 8 ( 1 + δ ) i N k i + 1 = β K ( 1 + δ ) k K + 1 + S k , k K ,
where S_k = ∑_{i=1}^{k} [ p(1 + δ)^{−i} + M₈(1 + δ)^{−i}/N_{k−i+1} ]. Taking b = 1 + δ and identifying c_{k−i+1} with 1/N_{k−i+1} as in Lemma 11, we obtain S_k → 0 as k → ∞. This completes the proof of Theorem 3. □

5. Applications and Numerical Illustrations

System setup: All experiments were executed on a 64-bit Windows machine powered by an Intel(R) Core(TM) i7-6600U CPU @ 2.60 GHz (2 cores, 4 threads) with 8 GB RAM. A Python 3.9 environment was used for numerical computation, data analysis, and visualization, together with the NumPy, SciPy, Pandas, and Matplotlib libraries.
We consider four experiments in this research and compare the performance of our proposed algorithm with existing ones (see Table 1). In particular, Algorithm 1 of Nwawuru et al. is compared with Wang et al., 2022 (Algorithm 3.1, [30]), Liu and Qin, 2024 (Algorithm 1, [31]), Li et al., 2023 (Algorithm 3, [43]), and Long and He, 2023 (Algorithm 1, [38]). CPU times are averaged across 20 sample paths.
Table 1. Comparison of the algorithms considered against the proposed scheme and their main features.
We use numpy.random.rand() and numpy.random.randn() in Python to generate i.i.d. samples from Ξ and terminate the algorithms when the total number of iterations reaches 250. Furthermore, we set N_k = ⌈N(k + λ)(ln(k + λ))^{1.05}⌉ with λ = 5 and N = 250 for all the selected algorithms. Moreover, we take γ = 0.05, θ = 0.5, μ = 0.5.
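As a sketch of this sample-size rule, and under our reading of the garbled formula as N_k = ⌈N(k + λ)(ln(k + λ))^{1.05}⌉, the snippet below also verifies the summability condition ∑ 1/N_k < ∞ that the convergence proofs rely on (the exponent 1.05 > 1 is what makes this Bertrand-type series converge):

```python
import math

# Hedged reconstruction of the batch-size rule: we read the garbled expression as
# N_k = ceil(N * (k + lam) * (ln(k + lam))**1.05). The exponent 1.05 > 1 ensures
# sum_k 1/N_k < infinity, the summability condition used throughout the analysis.
def batch_size(k, lam=5, N=250):
    return math.ceil(N * (k + lam) * math.log(k + lam) ** 1.05)

sizes = [batch_size(k) for k in range(5)]
print(sizes)                                   # strictly increasing batch sizes

partial = sum(1.0 / batch_size(k) for k in range(1, 100_000))
print(partial)                                 # partial sums of 1/N_k stay small
```
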
We consider the following examples.
Example 1.
For x ≥ 0, E[G(x, ξ)] ≥ 0, ⟨x, E[G(x, ξ)]⟩ = 0, where G(x, ξ) := D(x, ξ) + M(ξ)x + q(ξ) with Dᵢ(x, ξ) := dᵢ(ξ) arctan(aᵢ(ξ)xᵢ), i = 1, …, n; d(ξ) and a(ξ) are randomly generated from the uniform distribution on [0, 1], M(ξ) := B + Y(ξ), where B ∈ ℝⁿˣⁿ is a deterministic skew-symmetric matrix generated from the uniform distribution on [0, 3], Y(ξ) ∈ ℝⁿˣⁿ is a diagonal matrix generated from the uniform distribution on [0, 2], and q(ξ) ∈ ℝⁿ is randomly generated from the uniform distribution on [−2, 2].
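A minimal NumPy sketch of one stochastic evaluation of this operator (all variable names are ours; forming the skew-symmetric matrix as A − Aᵀ from a uniform [0, 3] sample is one common reading of the construction described above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10

A = rng.uniform(0, 3, size=(n, n))
B = A - A.T                                  # skew-symmetric; one reading of "generated from U[0, 3]"

def G(x):
    # One stochastic evaluation G(x, xi) = D(x, xi) + M(xi) x + q(xi), where
    # D_i(x, xi) = d_i(xi) * arctan(a_i(xi) * x_i) and M(xi) = B + Y(xi).
    d = rng.uniform(0, 1, size=n)
    a = rng.uniform(0, 1, size=n)
    Y = np.diag(rng.uniform(0, 2, size=n))
    q = rng.uniform(-2, 2, size=n)
    return d * np.arctan(a * x) + (B + Y) @ x + q

# Monte Carlo estimate of E[G(x, xi)] at x = 0 (only the q-term survives there)
sample_avg = np.mean([G(np.zeros(n)) for _ in range(1000)], axis=0)
print(sample_avg.shape)   # (10,)
```

At x = 0 the arctan and matrix terms vanish, so the sample average estimates E[q(ξ)] ≈ 0, which is a quick sanity check on the implementation.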
Figure 1 below demonstrates the performance of Algorithm 1 and the methods of Wang et al. [30], Liu and Qin [31], Li et al. [43], and Long and He [38].
Figure 1. Comparison for N = 50, 100, 150, 200, 250 (Algorithm 1 and the algorithms of [30,31,38,43]).
Table 2 below shows that all algorithms converge across sample sizes. Algorithm 1 achieves balanced performance, requiring fewer iterations than Wang et al. [30] and Liu and Qin [31] while maintaining a speed competitive with Li et al. [43]. Its strength lies in combining stability with efficiency, offering reliable convergence under varying sample sizes. However, it should be noted that Long and He [38] achieve the fastest runtime: their algorithm is non-monotone and Lipschitz continuous with one oracle call per iteration, whereas Algorithm 1 has two oracle calls per iteration (see [38], Remark 1(i), Table 1).
Table 2. Convergence summary for the five algorithms across different sample sizes N.
Example 2. (Network bandwidth allocation) We consider a communication network in which individual users, acting selfishly, compete for shared bandwidth resources. The set of all users in the network is indexed by S := {1, 2, …, m}. Since each user can access multiple routes, we let Φ(s) denote the set of routes governed by user s ∈ S. For s ∈ S, let ϑ(s) denote the number of elements in Φ(s) and Θ := ∑_{s∈S} ϑ(s). Let x_s(r) (r ∈ Φ(s)) be the flow rate of user s through route r. The set of all links is denoted by L := {1, 2, …, κ} and the set of routes by R := {1, 2, …, t}. For d ∈ L, let L_s(t) denote the set of links through which route t ∈ Φ(s) passes. Suppose that
X_{s,d}^{t} = 1 if d ∈ L_s(t), and 0 otherwise.
When a flow rate is allocated to a user participating in the network, the user derives a utility modeled as the value of a concave function. The utility function of each user is parametrized by the uncertainty and is defined by
f_s(x_s, ξ_s) := ∑_{r∈ϑ(s)} ξ_s(r) ln(1 + x_s(r)), s ∈ S,
where x_s := (x_s(r))_{r∈Φ(s)} is the flow-rate decision vector of user s, ξ_s(r) is the random weight parameter of route r ∈ Φ(s), and ξ_s := (ξ_s(r))_{r∈Φ(s)}. The flow rate allocated to each user is regulated by a control mechanism to prevent network congestion in the bandwidth allocation. Such a mechanism ensures that the sum of the transmission rates of the users sharing link ν ∈ L is less than or equal to the limited capacity of that link, that is,
X_ν := { ((x_s(r))_{r∈ϑ(s)})_{s∈S} ∈ ℝ₊^Θ : ∑_{s∈S} x_s(r) X_{s,d}^{t} ≤ c_ν },
where c t > 0 is the capacity of link t L . Set c : = ( c t ) t L .
We now formulate this model as a stochastic optimization problem given by
maximize E[f_s(x_s, ξ_s)] subject to x := ((x_s(r))_{r∈Φ(s)})_{s∈S} ∈ ∩_{t∈L} X_t,
Let M denote the adjacency matrix that describes the correlation between the set of links L and the set of routes R := ∑_{s∈S} ϑ(s). We assume that M_{t,r} := 1 if route r ∈ R goes through link ν ∈ L, and 0 otherwise. We observe that problem (46) can be captured by an SVI in compact form:
Find x X : = ν L X t , such that
F ( x ) , x x 0 ,
for any x ∈ X, where
F(x) := −∇ ∑_{s∈S} E[f_s(x_s, ξ_s)] = ( −E[ξ_s(r)]/(1 + x_s(r)) )_{s∈S},
and
X := { ((x_s(r))_{r∈ϑ(s)})_{s∈S} ∈ ℝ₊^Θ : Mx ≤ c }.
Now, we consider the bandwidth allocation problem on a network that consists of 10 nodes, 10 links, and 2 users. It is observed that S := {1, 2}, L := {1, 2, …, 10}, R := {1, 2, …, 5}, Φ(1) := {1, 2} and Φ(2) := {1, 2, 3}.
In this experiment, the weight parameters ξ_s(r) (s ∈ S, r ∈ ϑ(s)) are i.i.d. and drawn randomly from the uniform distributions in Table 3. Besides, the links have limited capacities c := (c_ν)_{ν∈L}, with the values c₁, …, c₅ given in Table 4. One can check that the operator F(·) in (47) is quasi-monotone and Lipschitz continuous on X with constant L(ξ) := max_{s∈S, r∈ϑ(s)} ξ_s(r) + 2‖MMᵀ‖.
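Stacking the five per-route rates of the two users into one vector, a minimal sketch of evaluating the expected operator in (47) reads as follows (the mean weights E[ξ_s(r)] below are hypothetical placeholders, not the values of Table 3):

```python
import numpy as np

# Mean route weights E[xi_{s,r}] for the five user-route pairs; these numbers are
# hypothetical placeholders, not the actual distributions used in the experiment.
xi_mean = np.array([0.55, 0.60, 0.50, 0.65, 0.45])

def F(x):
    # Negative expected utility gradient: F(x)_(s,r) = -E[xi_{s,r}] / (1 + x_{s,r}).
    # Feasibility (x >= 0, M x <= c) is enforced separately by the projection step.
    return -xi_mean / (1.0 + x)

print(F(np.ones(5)))   # componentwise -E[xi_{s,r}] / 2 at x = (1, ..., 1)
```

Each component of F depends only on its own coordinate here, so quasi-monotonicity reduces to the monotonicity of t ↦ −c/(1 + t) on the feasible orthant; the coupling between users enters only through the capacity constraints Mx ≤ c.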
Figure 2 below shows a network of 10 nodes, 10 links, and 5 routes connecting two users. Core links carry higher capacities to support aggregate flows, while edge links act as potential bottlenecks. Shared routes between users highlight congestion risks, emphasizing the importance of efficient bandwidth allocation and resource management.
Figure 2. The bandwidth allocation network structure.
Table 3 presents uniform distributions of random parameters for each user’s routes. These ranges capture uncertainty in flow performance, modeling variability in network conditions, and ensuring fairness during bandwidth allocation.
Table 3. Uniform distribution of randomly generated parameters ξ s r for each user-route pair.
Table 4 below reveals how different links are utilized in the optimized flow allocation. Heavily loaded links such as 4 and 8 should be carefully monitored or upgraded in real networks, while lightly loaded links such as 6 and 7 might indicate insufficient routing or a redundant path. NB: Mbps means megabits per second and Gbps means gigabits per second.
Table 4. Link capacities c ν (sample values). Units: flow units (replace with Mbps/Gbps as needed).
Example 3. (Networked stochastic Nash–Cournot game) In this experiment, we consider the networked Nash–Cournot game adopted in [44] under data uncertainty, in which cost-minimizing agents compete in quantity levels when facing a price function associated with aggregate output. Suppose that there are Ω firms that compete over a network of Λ nodes in supplying a homogeneous product in a non-cooperative sense. Let x_{i,j} denote the level of sales of firm i ∈ [Ω] at node j ∈ [Λ]. Assume that each firm is characterized by a random linear cost function c_i(X_i, ξ_i) = (a_i + ξ_i) ∑_{j∈[Λ]} X_{i,j} for some parameters a_i > 0, where ξ_i is a mean-zero random variable. We assume that the price at node j, represented by P_j(∑_{i∈[Ω]} X_{i,j}, η_j), is a stochastic linear function corrupted by noise: P_j(∑_{i∈[Ω]} X_{i,j}, η_j) = d_j + η_j − b_j ∑_{i∈[Ω]} X_{i,j}, where d_j indicates the price when production is zero, b_j is the slope of the inverse demand function, and η_j is a zero-mean random disturbance. Assume that the transport cost is zero. Apart from the non-negativity constraints on x_{i,j}, we suppose that firm i's production at node j is capacitated by cap_{i,j}. We now transform firm i's problem into the stochastic optimization problem given below:
min E [ f i ( x , ξ , η ) ] = E c i ( X i , ξ i ) j [ Λ ] P j i [ Ω ] X i , j , η j X i , j s . t . x i X i = { x i R Λ : x i > 0 , x i , j c a p i , j }
Under some dominance conditions allowing the interchange of expectation and differentiation, the above stochastic Nash–Cournot game may be transformed into problem (3) with X = ∏_{i=1}^{Ω} X_i and F(x) = (F_1(x); …; F_Ω(x)), where F_i(x) = E[∇_{x_i} f_i(x, ξ, η)]. As in [44], we consider a network with Ω = 20 firms, Λ = 10 markets, and capacity cap_{i,j} = 2 for each i ∈ [Ω] and each j ∈ [Λ]. In the experiment, the parameters in the payoffs were set as d_j ∼ U(40, 50), b_j ∼ U(1, 2), and a_i ∼ U(3, 5) for all i ∈ [Ω] and j ∈ [Λ], where U(u₁, u₂) denotes the uniform distribution over the interval (u₁, u₂) with u₁ < u₂. For the random data in the model, we assume ξ_i ∼ U(−a_i/5, a_i/5) and η_j ∼ U(−b_j/5, b_j/5), and x₀ = x₁ = (1, …, 1)ᵀ for all the algorithms.
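Since ξ_i and η_j are zero-mean, differentiating the payoff and taking expectations gives the closed form F_{i,j}(x) = a_i − d_j + b_j(∑_i x_{i,j} + x_{i,j}). A minimal NumPy sketch under the stated parameter distributions (the array layout and broadcasting are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
Omega, Lam = 20, 10                      # 20 firms, 10 markets, as in the experiment
a = rng.uniform(3, 5, size=Omega)        # marginal costs a_i ~ U(3, 5)
d = rng.uniform(40, 50, size=Lam)        # demand intercepts d_j ~ U(40, 50)
b = rng.uniform(1, 2, size=Lam)          # demand slopes b_j ~ U(1, 2)

def F(x):
    # Expected game mapping: F_{i,j}(x) = a_i - d_j + b_j * (s_j + x_{i,j}),
    # where s_j = sum_i x_{i,j} is the aggregate sales in market j.
    s = x.sum(axis=0)
    return a[:, None] - d[None, :] + b[None, :] * (s[None, :] + x)

x0 = np.ones((Omega, Lam))               # starting point x_0 = (1, ..., 1)
print(F(x0).shape)                       # (20, 10)
```

At the all-ones starting point, s_j = 20 in every market, so F_{i,j}(x₀) = a_i − d_j + 21b_j, which is negative for these parameter ranges: every firm has an incentive to increase output until the capacity bounds or the aggregate-price term bind.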
Figure 3 shows performance comparison of algorithms across convergence behavior, computational efficiency, and underlying network structure.
Figure 3. Networked stochastic Nash–Cournot game. (a) Empirical gap-function error, the error decreases across all algorithms, with Algorithm 1 converging fastest, Li et al. [43] closely following, while Liu and Qin [31], Long and He [38], and Wang et al. [30] converge slower. (b) Logarithm of empirical error, the plot shows faster convergence for Algorithm 1, while Wang et al. [30] lags behind, confirming differences in algorithmic efficiency. (c) CPU time, Algorithm 1 requires the least computation, reflecting efficiency, while Wang et al. [30] consumes the most, indicating higher computational overhead. (d) Network structure, twenty firms are interconnected with ten markets, highlighting competitive supply interactions and capacity-constrained distribution across nodes.
Table 5 below highlights uniform link capacities of two across all firm–market connections, while market prices vary stochastically through demand parameters. This structure reflects balanced competition, ensuring equal production opportunities for all firms, while random price variations across markets capture uncertainty and heterogeneity in the Nash–Cournot game environment.
Table 5. Capacity with respect to price, market, and firm.

6. Conclusions

This work introduced an adaptive inertial stochastic projection framework for solving stochastic variational inequalities whose cost function is quasi-monotone, and the developed scheme was applied to stochastic complementarity problems, the networked stochastic Nash–Cournot game, and the bandwidth allocation problem. By combining inertia with adaptive stepsize selection, the method accelerates convergence while preserving robustness under uncertainty. Stochastic projections ensure feasibility with respect to link capacity constraints, preventing congestion and promoting fairness. Numerical experiments demonstrate superior efficiency, scalability, and stability compared with conventional techniques. Beyond bandwidth allocation, the approach offers a versatile tool for stochastic optimization problems in uncertain environments. In a nutshell, the algorithm delivers a resilient and computationally efficient solution, advancing both theory and practice in modern network resource management.

Author Contributions

F.O.N., J.N.E., and M.D. conceptualized the research idea, with all three contributing significantly to the manuscript's writing and revision. F.O.N. and I.A.-D. carried out the computations and established the convergence analysis, proving almost sure convergence and the rate of convergence. F.O.N. and J.N.E. performed the numerical experiments, created figures and tables, and analyzed performance data. All authors discussed the results and provided critical feedback, which improved the quality of the manuscript. F.O.N., M.D., and I.A.-D. coordinated the research activities and ensured integration of all contributions. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2502).

Data Availability Statement

There were no data analyzed in this project.

Acknowledgments

The authors are grateful to the three anonymous reviewers for their constructive comments and valuable suggestions, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no competing interest.

References

  1. Royset, J.O. Risk-adaptive approaches to stochastic optimization: A survey. SIAM Rev. 2025, 67, 3–70. [Google Scholar] [CrossRef]
  2. Liang, H.; Zhuang, W. Stochastic modeling and optimization in a microgrid: A survey. Energies 2014, 7, 2027–2050. [Google Scholar] [CrossRef]
  3. Sclocchi, A.; Wyart, M. On the different regimes of stochastic gradient descent. Proc. Natl. Acad. Sci. USA 2024, 121, e2316301121. [Google Scholar] [CrossRef]
  4. Dong, D.; Liu, J.; Tang, G. Sample average approximation for stochastic vector variational inequalities. Appl. Anal. 2023, 103, 1649–1668. [Google Scholar] [CrossRef]
  5. Robbins, H.; Monro, S. A stochastic approximation method. Ann. Math. Stat. 1951, 22, 400–407. [Google Scholar] [CrossRef]
  6. Farh, H.M.H.; Al-Shamma’a, A.A.; Alaql, F.; Omotoso, H.O.; Alfraidi, W.; Mohamed, M.A. Optimization and uncertainty analysis of hybrid energy systems using Monte Carlo simulation integrated with genetic algorithm. Comput. Electr. Eng. 2024, 120, 109833. [Google Scholar] [CrossRef]
  7. Stampacchia, G. Formes bilinéaires coercitives sur les ensembles convexes. Comptes Rendus l’Académie Sci. Série A 1964, 258, 4413–4416. [Google Scholar]
  8. Nagurney, A. Network Economics: A Variational Inequality Approach; Kluwer Academic: Dordrecht, The Netherlands, 1999. [Google Scholar]
  9. Shehu, Y.; Iyiola, O.S.; Reich, S. A modified inertial subgradient extragradient method for solving variational inequalities. Optim. Eng. 2022, 23, 421–449. [Google Scholar] [CrossRef]
  10. Liu, L.; Cho, S.Y.; Yao, J.C. Convergence analysis of an inertial Tseng’s extragradient algorithm for solving pseudomonotone variational inequalities and applications. J. Nonlinear Var. Anal. 2021, 2, 47–63. [Google Scholar]
  11. Nwawuru, F.O.; Echezona, G.N.; Okeke, C.C. Finding a common solution of variational inequality and fixed point problems using subgradient extragradient techniques. Rend. Circ. Mat. Palermo II Ser. 2024, 73, 1255–1275. [Google Scholar] [CrossRef]
  12. Dilshad, M.; Alamrani, F.M.; Alamer, A.; Alshaban, E.; Alshehri, M.G. Viscosity-type inertial iterative methods for variational inclusion and fixed point problems. AIMS Math. 2024, 9, 18553–18573. [Google Scholar] [CrossRef]
  13. Facchinei, F.; Pang, J.S. Finite-Dimensional Variational Inequalities and Complementarity Problems; Springer: New York, NY, USA, 2003. [Google Scholar]
  14. Jiang, H.; Xu, H. Stochastic approximation approaches to stochastic variational inequality problems. IEEE Trans. Autom. Control 2008, 53, 1462–1475. [Google Scholar] [CrossRef]
  15. Shapiro, A. Monte Carlo sampling methods. In Handbooks in Operations Research and Management Science: Stochastic Programming; Ruszczyński, A., Shapiro, A., Eds.; Elsevier: Amsterdam, The Netherlands, 2003; pp. 353–425. [Google Scholar]
  16. Wang, M.Z.; Lin, G.H.; Gao, Y.L.; Ali, M.M. Sample average approximation method for a class of stochastic variational inequality problems. J. Syst. Sci. Complex. 2011, 24, 1143–1153. [Google Scholar] [CrossRef]
  17. He, S.X.; Zhang, P.; Hu, X.; Hu, R. A sample average approximation method based on a D-gap function for stochastic variational inequality problems. J. Ind. Manag. Optim. 2014, 10, 977–987. [Google Scholar] [CrossRef]
  18. Cherukuri, A. Sample average approximation of conditional value-at-risk based variational inequalities. Optim. Lett. 2024, 18, 471–496. [Google Scholar] [CrossRef]
  19. Zhou, Z.; Honnappa, H.; Pasupathy, R. Drift optimization of regulated stochastic models using sample average approximation. arXiv 2025, arXiv:2506.06723. [Google Scholar] [CrossRef]
  20. Yousefian, F.; Nedić, A.; Shanbhag, U.V. Distributed adaptive steplength stochastic approximation schemes for Cartesian stochastic variational inequality problems. arXiv 2013, arXiv:1301.1711. [Google Scholar] [CrossRef]
  21. Koshal, J.; Nedić, A.; Shanbhag, U.V. Regularized iterative stochastic approximation methods for stochastic variational inequality problems. IEEE Trans. Autom. Control 2013, 58, 594–609. [Google Scholar] [CrossRef]
  22. Iusem, A.N.; Jofré, A.; Thompson, P. Incremental constraint projection methods for monotone stochastic variational inequalities. Math. Oper. Res. 2019, 44, 236–263. [Google Scholar] [CrossRef]
  23. Yang, Z.P.; Zhang, J.; Wang, Y.; Lin, G.H. Variance-based subgradient extragradient method for stochastic variational inequality problems. J. Sci. Comput. 2021, 89, 4. [Google Scholar] [CrossRef]
  24. Korpelevich, G.M. The extragradient method for finding saddle points and other problems. Matekon 1976, 12, 747–756. [Google Scholar]
  25. Iusem, A.N.; Jofré, A.; Oliveira, R.I.; Thompson, P. Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 2017, 27, 686–724. [Google Scholar] [CrossRef]
  26. Nwawuru, F.O. Approximation of solutions of split monotone variational inclusion problems and fixed point problems. Pan-Am. J. Math. 2023, 2, 1. [Google Scholar] [CrossRef]
  27. Iusem, A.N.; Jofré, A.; Oliveira, R.I.; Thompson, P. Variance-based extragradient methods with line search for stochastic variational inequalities. SIAM J. Optim. 2019, 29, 175–206. [Google Scholar] [CrossRef]
  28. Censor, Y.; Gibali, A.; Reich, S. The subgradient extragradient method for solving variational inequalities in Hilbert space. J. Optim. Theory Appl. 2011, 148, 318–335. [Google Scholar] [CrossRef]
  29. Nwawuru, F.O.; Ezeora, J.N.; ur Rehman, H.; Yao, J.-C. Self-adaptive subgradient extragradient algorithm for solving equilibrium and fixed point problems in Hilbert spaces. Numer. Algorithms 2025. [Google Scholar] [CrossRef]
  30. Wang, S.; Tao, H.; Lin, R.; Cho, Y.J. A self-adaptive stochastic subgradient extragradient algorithm for the stochastic pseudomonotone variational inequality problem with application. Z. Angew. Math. Phys. 2022, 73, 164. [Google Scholar] [CrossRef]
  31. Liu, L.; Qin, X. An accelerated stochastic extragradient-like algorithm with new stepsize rules for stochastic variational inequalities. Comput. Math. Appl. 2024, 163, 117–135. [Google Scholar] [CrossRef]
  32. Polyak, B.T. Some methods of speeding up the convergence of iterative methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  33. Nwawuru, F.O.; Ezeora, J.N. Inertial-based extragradient algorithm for approximating a common solution of split-equilibrium problems and fixed-point problems of nonexpansive semigroups. J. Inequalities Appl. 2023, 2023, 22. [Google Scholar] [CrossRef]
  34. Ezeora, J.N.; Enyi, C.D.; Nwawuru, F.O.; Ogbonna, R.C. An algorithm for split equilibrium and fixed-point problems using inertial extragradient techniques. Comput. Appl. Math. 2023, 42, 103. [Google Scholar] [CrossRef]
  35. Nwawuru, F.O.; Narian, O.; Dilshad, M.; Ezeora, J.N. Splitting method involving two-step inertial iterations for solving inclusion and fixed point problems with applications. Fixed Point Theory Algorithms Sci. Eng. 2025, 2025, 8. [Google Scholar] [CrossRef]
  36. Enyi, C.D.; Ezeora, J.N.; Ugwunnadi, G.C.; Nwawuru, F.O.; Mukiawa, S.E. Generalized split feasibility problem: Solution by iteration. Carpathian J. Math. 2024, 40, 655–679. [Google Scholar] [CrossRef]
  37. Nesterov, Y.E. A method for solving a convex programming problem with convergence rate O(1/k2). Dokl. Akad. Nauk SSSR 1983, 269, 543–547. [Google Scholar]
  38. Long, X.J.; He, Y.H. A fast stochastic approximation-based subgradient extragradient algorithm with variance reduction for solving stochastic variational inequality problems. J. Comput. Appl. Math. 2023, 420, 114786. [Google Scholar] [CrossRef]
  39. Zhang, X.; Du, X.; Yang, Z.; Lin, G. An infeasible stochastic approximation and projection algorithm for stochastic variational inequalities. J. Optim. Theory Appl. 2019, 183, 1053–1076. [Google Scholar] [CrossRef]
  40. Fang, C.; Chen, S. Some extragradient algorithms for variational inequalities. In Advances in Variational and Hemivariational Inequalities; Han, W., Migórski, S., Sofonea, M., Eds.; Advances in Mechanics and Mathematics; Springer: Cham, Switzerland, 2015; Volume 33, pp. 145–171. [Google Scholar]
  41. Ezeora, J.N.; Nwawuru, F.O. An inertial-based hybrid and shrinking projection methods for solving split common fixed point problems in real reflexive spaces. Int. J. Nonlinear Anal. Appl. 2023, 14, 2541–2556. [Google Scholar]
  42. Robbins, H.; Siegmund, D. A convergence theorem for nonnegative almost supermartingales and some applications. In Optimizing Methods in Statistics; Rustagi, J.S., Ed.; Academic Press: New York, NY, USA, 1971; pp. 233–257. [Google Scholar]
  43. Li, T.; Cai, X.; Song, Y.; Ma, Y. Improved variance reduction extragradient method with line search for stochastic variational inequalities. J. Glob. Optim. 2023, 87, 423–446. [Google Scholar] [CrossRef]
  44. Yang, Z.P.; Lin, G.H. Variance-based single-call proximal extragradient algorithms for stochastic mixed variational inequalities. J. Optim. Theory Appl. 2021, 190, 393–427. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
