Axioms
  • Article
  • Open Access

11 December 2025

On Quasi-Monotone Stochastic Variational Inequalities with Applications

1 Department of Mathematics, Faculty of Science, University of Tabuk, Tabuk 71491, Saudi Arabia
2 Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), P.O. Box 65892, Riyadh 11566, Saudi Arabia
3 Analysis, Control System and Optimization Research Group (ACoSORG), Department of Mathematics, Faculty of Science, Chukwuemeka Odumegwu Ojukwu University, Uli 431124, Anambra State, Nigeria
4 Department of Mathematics/Statistics, Faculty of Science, University of Port Harcourt, Choba 500241, Rivers State, Nigeria
This article belongs to the Special Issue Mathematical Optimization, Variational Inequalities and Equilibrium Problems: Theory and Applications

Abstract

This paper studies an efficient method for solving stochastic optimization problems formulated as stochastic variational inequalities with a quasi-monotone operator, a class of cost functions that extends the classical monotone and pseudomonotone operators. Our proposed method uses an adaptive stepsize that adjusts automatically without a linesearch and includes a momentum term to accelerate convergence. Each iteration requires only a single projection onto the feasible set, ensuring low computational complexity. Under standard assumptions, the algorithm achieves almost sure convergence with a proven convergence rate. Furthermore, numerical experiments demonstrate its superior performance, accuracy, stability, and efficiency compared with existing stochastic approximation schemes. We also apply the method to problems such as stochastic network bandwidth allocation, stochastic complementarity problems, and the networked stochastic Nash–Cournot game, showing its strength and practical usefulness. The obtained results extend existing works in the literature.

1. Introduction

We consider a stochastic optimization problem of the form
$\min_{x \in X} F(x) = \mathbb{E}[f(x, \xi(\varpi))]$, (1)
where $x$ is the decision variable, $X$ is a feasible subset of $\mathbb{R}^n$, $\xi : \Omega \to \Xi$ is a random variable representing uncertainty, $f(x, \xi)$ is a random cost function, and $\mathbb{E}[f(x, \xi(\varpi))]$ denotes its expectation. Problem (1) is known as a stochastic optimization (SO) problem.
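For intuition, the expectation in (1) is typically approximated empirically when it cannot be computed in closed form. The following minimal Python sketch (the quadratic cost `f` and the Gaussian distribution of $\xi$ are illustrative assumptions, not taken from the paper) estimates $F(x) = \mathbb{E}[f(x, \xi)]$ by Monte Carlo averaging:

```python
import numpy as np

def f(x, xi):
    # Illustrative random cost: a quadratic perturbed linearly by the sample xi.
    return 0.5 * np.dot(x, x) + np.dot(xi, x)

def monte_carlo_objective(x, draw, n=10_000, seed=0):
    # Estimate F(x) = E[f(x, xi)] by averaging f over n i.i.d. draws of xi.
    rng = np.random.default_rng(seed)
    return float(np.mean([f(x, xi) for xi in draw(rng, n)]))

# For xi ~ N(0, I), the linear term has zero mean, so F(x) = 0.5 * ||x||^2.
x = np.array([1.0, -2.0])
est = monte_carlo_objective(x, lambda rng, n: rng.normal(size=(n, 2)))
# est approaches 0.5 * ||x||^2 = 2.5 as n grows
```

By the law of large numbers, the estimate concentrates around the true expectation at the usual $O(1/\sqrt{n})$ rate.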
Stochastic optimization provides a powerful framework for modeling and solving problems where randomness influences objectives or constraints. Such uncertainty is noticeable in real-world applications, including machine learning, where data are inherently noisy; finance, where markets fluctuate; engineering design, where measurements are imprecise; and energy systems, where renewable sources vary unpredictably (see, [1,2]). Unlike deterministic optimization, SO explicitly incorporates statistical variability into the modeling process, yielding solutions that are not only optimal in expectation but also robust under uncertainty. Its effectiveness lies in balancing exploration of the feasible region with exploitation of informative samples, which mitigates the risk of convergence to poor local minima in nonconvex or high-dimensional settings. Consequently, SO underpins many modern algorithms, such as stochastic gradient descent for deep learning, Monte Carlo-based optimization in engineering, and stochastic portfolio selection in finance.
In general, four classical approaches are used to solve problem (1): the stochastic gradient descent (SGD) method (see, e.g., [3]), the sample average approximation (SAA) [4], the stochastic approximation (SA) method [5], and evolutionary or Monte Carlo-based algorithms [6]. These methods have been extensively analyzed and successfully applied in diverse scientific and engineering domains. In this work, we focus on the SA framework, which is particularly effective when the expected value E [ f ( x , ξ ) ] is difficult or impossible to compute exactly. Our study connects SA methods to the theory of variational inequalities, providing a unified framework for optimization and equilibrium modeling under uncertainty.
The variational inequality problem (VIP), first introduced in [7], is a fundamental model that generalizes many problems in optimization, game theory, and economics. For a nonempty, closed, and convex set X R n , the variational inequality problem (VIP) is expressed as follows:
Find $x^* \in X$ such that $\langle F(x^*), y - x^* \rangle \ge 0$, $\forall y \in X$, (2)
where $F$ is a nonlinear operator. Optimization problems correspond to the VIP with $F = \nabla g$, where $g$ is the objective function; Nash equilibrium conditions arise when $F$ represents the vector of marginal payoffs; and traffic or network equilibrium problems appear when $F$ models congestion effects [8]. Over the years, extensive studies (see, e.g., [9,10,11,12]) have focused on efficient numerical schemes and theoretical properties of the deterministic VIP. However, in many practical situations, the operator $F$ cannot be computed exactly due to noise or randomness. Examples include learning systems, where gradients are estimated from samples, financial models with stochastic returns, and network systems with uncertain demand. To address such cases, the notion of the stochastic variational inequality problem (SVIP) was developed in [5].
For clarity, assume that $X$ is a nonempty, closed, and convex subset of $\mathbb{R}^n$, and let $(\Omega, \mathcal{F}, P)$ denote a probability space. Consider a function $f : \mathbb{R}^n \times \Xi \to \mathbb{R}^n$ that is measurable with respect to the random variable $\xi : \Omega \to \Xi$. The SVIP is formulated as
find $x^* \in X$ such that $\langle F(x^*), y - x^* \rangle \ge 0$, $\forall y \in X$, where $F(x) := \mathbb{E}[f(x, \xi(\varpi))] = \int_\Omega f(x, \xi(\varpi)) \, dP(\varpi)$, (3)
and E denotes the expectation over ( Ω , F , P ) . The SVIP generalizes the deterministic VIP by integrating stochastic effects into the equilibrium conditions and provides a rigorous model for decision-making under uncertainty. It encompasses diverse problems in stochastic optimization, energy markets, game theory, and transportation systems (see, e.g., [13,14]).
Related Works: Problem (3) has inspired substantial research activity. When F ( x ) admits a closed-form expression, it reduces to the deterministic VIP (2), for which numerous efficient algorithms exist (see [11]). When F ( x ) is not explicitly computable, two primary strategies are adopted: the sample average approximation (SAA) and the stochastic approximation (SA) approaches. In SAA, the expectation in (3) is replaced with an empirical average based on N independent samples:
$F_N(x) = \frac{1}{N} \sum_{i=1}^{N} f(x, \xi_i)$, (4)
where $\{\xi_i\}$ are independent and identically distributed (i.i.d.) realizations of $\xi$; the law of large numbers then ensures that $F_N(x) \to F(x)$ almost surely as $N \to \infty$. Many recent studies analyze convergence properties of SAA and its applications to stochastic generalized equations [15], gap-function reformulations [16], and unconstrained settings [4,17,18,19]. In contrast, the SA framework solves (3) by updating iterates using sample-based gradients in an online fashion. The classical Robbins–Monro procedure [5] forms the basis for this approach. A seminal contribution by Jiang and Xu [14] proposed the single-projection SA algorithm:
$x_{k+1} = \Pi_X(x_k - \lambda_k f(x_k, \xi_k)), \quad k \in \mathbb{N}_0, \; x_0 \in X,$
where $\Pi_X$ denotes the Euclidean projection onto $X$, and the stepsize sequence $\{\lambda_k\}$ satisfies $\sum_{k=0}^{\infty} \lambda_k = \infty$ while $\sum_{k=0}^{\infty} \lambda_k^2 < \infty$. Under strong monotonicity and Lipschitz continuity, they proved almost sure (a.s.) convergence to the unique solution of (3). Subsequent improvements have relaxed these assumptions or enhanced convergence properties. Yousefian et al. [20] introduced adaptive step-sizes for Cartesian SVIPs; Koshal et al. [21] proposed parallel and proximal-based methods; and [22] derived asymptotic feasibility and solution convergence rates of $O(1/k)$ and $O(1/\sqrt{k})$, respectively, under the monotonicity assumption. To further weaken assumptions, Yang et al. [23] developed algorithms for pseudomonotone and Lipschitz continuous operators, achieving sublinear convergence and optimal oracle complexity.
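The single-projection SA scheme above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the box feasible set and the strongly monotone operator $F(x) = x - b$ (observed through zero-mean noise) are illustrative assumptions, and $\lambda_k = 1/(k+1)$ satisfies the Robbins–Monro conditions $\sum \lambda_k = \infty$, $\sum \lambda_k^2 < \infty$.

```python
import numpy as np

def project_box(x, lo, hi):
    # Euclidean projection onto the box X = [lo, hi]^n (a simple convex set).
    return np.clip(x, lo, hi)

def stochastic_approximation(sample_f, x0, lo, hi, iters=5000, seed=0):
    # Single-projection SA: x_{k+1} = Pi_X(x_k - lam_k * f(x_k, xi_k)),
    # with Robbins-Monro stepsizes lam_k = 1/(k+1).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        lam = 1.0 / (k + 1)
        x = project_box(x - lam * sample_f(x, rng), lo, hi)
    return x

# Strongly monotone illustration: F(x) = x - b, observed with zero-mean noise.
# The unique solution of the SVIP on X = [-1, 1]^2 is x* = b (an interior point).
b = np.array([0.3, -0.4])
noisy_f = lambda x, rng: (x - b) + 0.1 * rng.normal(size=x.shape)
x_approx = stochastic_approximation(noisy_f, x0=np.zeros(2), lo=-1.0, hi=1.0)
# x_approx is close to b
```

With $\lambda_k = 1/(k+1)$ the iterates reduce to a running average of noisy targets, so the noise is averaged out while the deterministic part contracts toward the solution.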
Recent research has focused on improving algorithmic efficiency through the extragradient and subgradient extragradient (SEM) methods. The extragradient method, originally due to Korpelevich [24], has been successfully extended to the stochastic setting (see, e.g., [25,26]). A typical extragradient-type update reads:
$y_k = \Pi_X(x_k - \gamma_k f(x_k, \xi_k)), \qquad x_{k+1} = \Pi_X(x_k - \alpha_k f(y_k, \eta_k)),$
where ξ k and η k are independent sample batches. Although effective, this approach requires two projections per iteration, which can be computationally demanding for large-scale or structured feasible sets. To reduce this cost, subgradient extragradient variants [26,27,28,29] were proposed and analyzed for both deterministic and stochastic VIP, yielding promising results under monotonicity and Lipschitz continuity assumptions.
Furthermore, recent works have introduced self-adaptive stepsize rules and inertial terms to accelerate convergence. In particular, Wang et al. [30] eliminated the need for linesearch-based parameter tuning while maintaining convergence for pseudomonotone operators, adopting a self-adaptive strategy to analyze approximate solutions within the SA framework. Liu and Qin [31] later incorporated Polyak's inertial extrapolation [32] (see also [33,34,35,36,37]) into stochastic extragradient frameworks, achieving almost sure convergence and improved complexity bounds, though still requiring linesearch conditions.
Motivation and Contribution: Despite these advances, most existing SA-based algorithms rely on strong or pseudomonotone assumptions, limiting their applicability to broader problem classes. Moreover, modern schemes often require multiple projections, sensitive parameter tuning, or complex linesearch procedures, which hinder scalability. Motivated by these challenges and the works in [23,27,30,31,32,38,39], we ask the following fundamental question:
Question: Can we design a robust iterative scheme for solving the SVIP (3) that combines self-adaptive step-sizes, stochastic subgradient extragradient techniques, and inertial acceleration for a quasi-monotone operator within the SA framework, while ensuring almost sure convergence and a guaranteed convergence rate?
The principal objective of this study is to provide an affirmative answer to this question.
Organization of the Paper: Section 2 presents preliminary definitions and essential lemmas. Section 3 introduces the proposed algorithm and underlying assumptions. Section 4 contains the main convergence analysis and proofs. Numerical results and practical applications are presented in Section 5, followed by concluding remarks in Section 6.

2. Preliminaries

In this section, we formally state some basic terminology essential to this work. For any vectors $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ is the standard inner product and $\|x\| = \sqrt{\langle x, x \rangle}$ is the Euclidean norm. Given a random variable $\xi$ and a $\sigma$-algebra $\mathcal{F}$, the notations $\mathbb{E}[\xi]$, $\mathbb{E}[\xi \mid \mathcal{F}]$, $\mathbb{V}(\xi)$, and $\mathbb{V}(\xi \mid \mathcal{F})$ denote the expectation of $\xi$, the conditional expectation of $\xi$ with respect to $\mathcal{F}$, the variance of $\xi$, and the conditional variance of $\xi$ with respect to $\mathcal{F}$, respectively. For $p \ge 1$, $|\xi|_p := \mathbb{E}[|\xi|^p]^{1/p}$ is the $L^p$ norm of $\xi$ and $|\xi \mid \mathcal{F}|_p := \mathbb{E}[|\xi|^p \mid \mathcal{F}]^{1/p}$ is the $L^p$ norm of $\xi$ conditional on $\mathcal{F}$. The $\sigma$-algebra generated by the random variables $\{\xi_i\}_{i=1}^{k}$ is denoted by $\sigma(\xi_1, \dots, \xi_k)$; also, $\mathbb{E}[\,\cdot \mid \xi_1, \dots, \xi_k] := \mathbb{E}[\,\cdot \mid \sigma(\xi_1, \dots, \xi_k)]$. We write $\xi \in \mathcal{F}$ to mean that a random variable $\xi$ is $\mathcal{F}$-measurable, and $\xi \perp \mathcal{F}$ to mean that $\xi$ is independent of the $\sigma$-algebra $\mathcal{F}$. The set of natural numbers is denoted by $\mathbb{N}$. For $x \in \mathbb{R}^n$, there exists a unique element $z \in X$, denoted by $\Pi_X(x)$, such that $\|z - x\| = \inf_{y \in X} \|y - x\|$; the mapping $\Pi_X : \mathbb{R}^n \to X$ is called the projection from $\mathbb{R}^n$ onto $X$. To quantify the inaccuracy in the stochastic evaluation of (3), we introduce the error term
$\varepsilon(x, \xi) := f(x, \xi) - F(x), \quad x \in \mathbb{R}^n, \; \xi \in \Xi.$ (5)
For any exponent $p \in [2, \infty)$, we associate with this error the $p$-moment function
$\sigma_p(x) = \mathbb{E}\big[\|\varepsilon(x, \xi)\|^p\big]^{1/p}$, (6)
which serves as an indicator of how accurately a stochastic approximation method captures the underlying operator.
We give below a fundamental definition regarding the cost function.
Definition 1.
The mapping $T$ on $X$ is called:
(i) 
strongly monotone if there exists a constant $\mu > 0$ such that for all $x, y \in X$, $\langle Tx - Ty, x - y \rangle \ge \mu \|x - y\|^2$;
(ii) 
monotone if, for all $x, y \in X$, $\langle Tx - Ty, x - y \rangle \ge 0$;
(iii) 
pseudomonotone if, for all $x, y \in X$, $\langle Ty, x - y \rangle \ge 0 \implies \langle Tx, x - y \rangle \ge 0$;
(iv) 
quasi-monotone if, for all $x, y \in X$, $\langle Ty, x - y \rangle > 0 \implies \langle Tx, x - y \rangle \ge 0$.
Remark 1.
We obtain from Definition 1 that (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ (iv), but the converses of these implications are not true in general.
The following Lemmas are very important in our work.
Lemma 1
(Lemma 2.1, [30]). Let $\Pi_X$ be the projection from $\mathbb{R}^n$ onto $X$. Then:
(i) 
$\|\Pi_X(x) - z\|^2 \le \|x - z\|^2 - \|\Pi_X(x) - x\|^2$, $\forall x \in \mathbb{R}^n$ and $z \in X$.
(ii) 
If $u = \Pi_X(y - v)$, then $2\langle v, u - z \rangle \le \|y - z\|^2 - \|u - z\|^2 - \|u - y\|^2$ for all $z \in X$.
(iii) 
$z = \Pi_X(x)$ if and only if $\langle x - z, z - y \rangle \ge 0$, $\forall y \in X$.
(iv) 
Let $\Gamma \ne \emptyset$. Then $x \in \Gamma$ if and only if $x = \Pi_X(x - \alpha F(x))$ for every strictly positive $\alpha$.
Let $\bar{x}, v \in \mathbb{R}^n$ with $v \ne 0$, and consider the half-space $T_k = \{x \in \mathbb{R}^n : \langle v, x - \bar{x} \rangle \le 0\}$. For any $y \in \mathbb{R}^n$, the projection $\Pi_{T_k}(y)$ is given by
$\Pi_{T_k}(y) = y - \max\Big\{0, \frac{\langle v, y - \bar{x} \rangle}{\|v\|^2}\Big\} v.$ (7)
Observe that (7) provides a direct formula for computing the projection of an arbitrary point onto a half-space.
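Formula (7) is cheap to implement, which is what makes half-space projections attractive computationally. The following Python sketch (the specific vectors are illustrative) evaluates (7) and shows that feasible points are left unchanged while infeasible points are mapped to the boundary:

```python
import numpy as np

def project_halfspace(y, v, x_bar):
    # Formula (7): Pi_T(y) = y - max{0, <v, y - x_bar> / ||v||^2} * v,
    # where T = {x : <v, x - x_bar> <= 0} and v != 0.
    step = max(0.0, np.dot(v, y - x_bar)) / np.dot(v, v)
    return y - step * v

v = np.array([1.0, 0.0])          # T is the half-plane {x : x_1 <= 0}
x_bar = np.zeros(2)
inside = project_halfspace(np.array([-2.0, 3.0]), v, x_bar)   # stays (-2, 3)
outside = project_halfspace(np.array([5.0, 3.0]), v, x_bar)   # maps to (0, 3)
```

The cost is a single inner product and a vector update, independent of the geometry of the original feasible set.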
Lemma 2
([40]). For any point $x \in \mathbb{R}^n$ and any parameter $\beta > 0$, the following bounds are satisfied:
$\min\{1, \beta\} \|r_1(x)\| \le \|r_\beta(x)\| \le \max\{1, \beta\} \|r_1(x)\|,$
where $r_\beta(x) = x - \Pi_X(x - \beta F(x))$.
Assumption 1
The following assumptions shall be considered:
(A) 
The solution set $\Gamma \ne \emptyset$.
(B) 
(i) For all $x, y \in \mathbb{R}^n$ and almost every $\varpi \in \Omega$,
$\|f(x, \xi(\varpi)) - f(y, \xi(\varpi))\| \le L(\xi(\varpi)) \|x - y\|,$
    where $L : \Xi \to \mathbb{R}_+$ is a measurable function such that $L(\xi(\varpi)) \ge 1$ for almost every $\varpi \in \Omega$.
(ii) There exist $a \in \mathbb{R}^n$ and $p \ge 2$ such that $\mathbb{E}[\|f(a, \xi)\|^p] < \infty$ and $\mathbb{E}[L(\xi)^p] < \infty$.
(C) 
The mapping $F$ is quasi-monotone.
(D) 
Let $\eta_k \in (0, 1)$ and let $\{\delta_k\}$ be a positive sequence such that $\lim_{k \to \infty} \eta_k = 0$ and $\sum_{k=0}^{\infty} \eta_k = +\infty$. Furthermore, $\delta_k / \eta_k \to 0$ as $k \to \infty$.
Lemma 3
([41]). Under Assumption 1, the operators $F$ and $\sigma_q(\cdot)$ are Lipschitz continuous on $\mathbb{R}^n$ with constants $L$ and $L_q$, respectively. This holds for every $q \in [p, 2p]$, where $p$ is the exponent specified in Assumption 1. Moreover, the constants satisfy $L = \mathbb{E}[L(\xi)]$ and $L_q = \mathbb{E}[L(\xi)^q]^{1/q} + L$. Let $\xi = \{\xi_j\}_{j=1}^{N}$ denote an i.i.d. collection drawn from $\Xi$, and define
$G(x, \xi) = \frac{1}{N} \sum_{j=1}^{N} f(x, \xi_j)$ and $\bar{\varepsilon}(x, \xi) = \frac{1}{N} \sum_{j=1}^{N} \varepsilon(x, \xi_j).$
Lemma 4
([41]). Suppose that Assumption 1 holds. Then, for any $q \in [p, 2p]$ with $p$ from Assumption 1, there exists a constant $C_q > 0$ such that for any $x \in \mathbb{R}^n$ and $x^* \in \Gamma$,
$|\bar{\varepsilon}(x, \xi)|_q \le C_q \frac{\sigma_q(x^*) + L_q \|x - x^*\|}{\sqrt{N}}.$
Lemma 5
([41]). Assume that Assumption 1 holds and $\Gamma \ne \emptyset$. Let $\lambda_N : \Xi \to [0, \lambda]$ be a random variable for some $0 < \lambda \le 1$. Define $z(x, \lambda_N, \xi) = \Pi_X(x - \lambda_N G(x, \xi))$. Then, for any $p \ge 2$, there exist positive constants $\{c_i\}_{i=1}^{4}$ (depending on $n$, $p$, and $\lambda$) such that
$|\bar{\varepsilon}(z(x, \lambda_N, \xi), \xi)|_p \le c_1 \frac{\sigma_{2p}(x^*) + \bar{L}_{2p} \|x - x^*\|}{\sqrt{N}}, \quad \forall x \in \mathbb{R}^n, \; x^* \in \Gamma,$
where $\bar{L}_{2p} = c_2 L_2 + c_3 L_p + c_4 L_{2p}$, $c_1 > C_p$, and $\sigma_{2p}(x^*) \ge \sigma_p(x^*)$.
Lemma 6
([42]). Let $\{V_k\}_{k \ge 1}$, $\{\delta_k\}_{k \ge 1}$, $\{\eta_k\}_{k \ge 1}$, and $\{\beta_k\}_{k \ge 1}$ denote sequences of nonnegative random variables adapted to the filtration $\{\Theta_k\}_{k \ge 1}$. Suppose that, almost surely, $\sum_{k=1}^{\infty} \delta_k < \infty$ and $\sum_{k=1}^{\infty} \beta_k < \infty$, and that
$\mathbb{E}[V_{k+1} \mid \Theta_k] \le (1 + \delta_k) V_k - \eta_k + \beta_k, \quad \forall k \in \mathbb{N}.$
Then, with probability one, the sequence $\{V_k\}$ converges and $\sum_{k=1}^{\infty} \eta_k < \infty$.

3. Proposed Algorithm

Remark 2.
We highlight the benefits of Algorithm 1 as follows:
1. 
In Step 1 of the proposed algorithm, we incorporate the inertial term $\theta_k (x_k - x_{k-1})$, a momentum technique inspired by Nesterov acceleration and Polyak's heavy-ball method. It promises faster convergence rates, variance reduction, better stability for ill-conditioned problems, and improved practical performance. Without inertia, SA can become stuck in flat regions or plateaus caused by noise, and in stochastic games inertial SA often requires fewer iterations to achieve a desired accuracy. These facts underscore the need for adopting it in the algorithm, and it is an improvement over [5,21,23,25,27,28,30,39,41].
2. 
The algorithm employs the subgradient extragradient (SEG) technique, which requires only one projection onto the feasible set per iteration. It handles non-smooth problems and maintains feasibility, ensuring that all iterates remain feasible, a key requirement in constrained stochastic optimization problems. It is, therefore, preferable to algorithms that involve two projections onto the feasible set per iteration. Hence, it contributes positively to the literature when compared with the works in [5,14,15,20,21,22,25,27,33,41,42].
3. 
Since SA-based algorithms are very sensitive to the stepsize (or step-length), we consider a self-adaptive stepsize that adjusts dynamically, ensuring robustness across problem scales and conditions. In fact, a self-adaptive stepsize in SA accelerates convergence, reduces sensitivity to noise, eliminates heavy manual tuning, ensures stability, and improves efficiency near the solution. This contrasts with Armijo linesearch methods, which consume a large amount of time and thereby degrade the performance of iterative algorithms (see, e.g., [4,13,14,15,17,21,31,33,40,41,42] and the references cited therein).
4. 
It is known that real-world stochastic systems often have non-symmetric or partially monotone structures. This motivates our consideration of a quasi-monotone operator, which is a weaker notion, so that the proposed scheme can handle nonlinear, asymmetric, or discontinuous mappings more realistically. It is important to note that quasi-monotone operators avoid the need for the projection corrections or strong-regularization techniques required for non-monotone problems. To this end, Algorithm 1 offers greater modeling flexibility, wider applicability, reduced assumptions for convergence, and lower computational cost compared with the strong monotonicity, monotonicity, and pseudomonotonicity assumptions commonly found in the literature. Therefore, our scheme improves many already announced results in this research direction.
Algorithm 1 Inertial Self-Adaptive Subgradient Extragradient Algorithm
Step 0: Select $x_0, x_1 \in \mathbb{R}^n$, $\alpha_0 > 0$. Take $\theta \in [0, 1)$, $\varepsilon_0 \in [0, 1)$, $k \in \mathbb{N}_0$, $\mu \in (0, \frac{1}{2})$. Take the sample rate $\{N_k\}_{k \ge 0}$ with $\sum_{k=0}^{\infty} \frac{1}{N_k} < \infty$. Set $k = 0$.
Step 1: Given the current iterates $x_k, x_{k-1}$ ($k \ge 0$), construct the inertial term as follows:
$w_k = x_k + \theta_k (x_k - x_{k-1}),$
where
$\theta_k = \begin{cases} \min\big\{\frac{\delta_k}{\|x_k - x_{k-1}\|}, \theta\big\}, & \text{if } x_k \ne x_{k-1}, \\ \theta, & \text{otherwise}. \end{cases}$

Step 2: Draw an i.i.d. sample $\xi_k = \{\xi_j^k\}_{j=1}^{N_k}$ from $\Xi$ and compute
$y_k = \Pi_X(w_k - \alpha_k G(w_k, \xi_k)),$
where $G(w_k, \xi_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} f(w_k, \xi_j^k)$. If $w_k = x_k = y_k$, then stop: $w_k \in \Gamma$; otherwise, go to the next step.
Step 3: Consider the constructible set $T_k = \{x \in \mathbb{R}^n : \langle w_k - \alpha_k G(w_k, \xi_k) - y_k, x - y_k \rangle \le 0\}$ and calculate
$x_{k+1} = \Pi_{T_k}(w_k - \alpha_k G(y_k, \xi_k)),$
where $G(y_k, \xi_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} f(y_k, \xi_j^k)$ and
$\alpha_{k+1} = \begin{cases} \min\big\{\alpha_k, \frac{\mu \|y_k - w_k\|}{\|G(w_k, \xi_k) - G(y_k, \xi_k)\|}\big\}, & \text{if } G(w_k, \xi_k) \ne G(y_k, \xi_k), \\ \alpha_k, & \text{otherwise}. \end{cases}$
Set $k := k + 1$ and go back to Step 1.
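The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the box feasible set, the monotone (hence quasi-monotone) affine operator $F(x) = x - b$, the additive-noise mini-batch oracle, and the choices $\delta_k = 1/(k+1)^2$ and the parameter values are all illustrative.

```python
import numpy as np

def project_box(x, lo, hi):
    # Projection onto the illustrative feasible set X = [lo, hi]^n.
    return np.clip(x, lo, hi)

def project_halfspace(y, v, x_bar):
    # Closed-form projection onto T = {x : <v, x - x_bar> <= 0}; see (7).
    step = max(0.0, np.dot(v, y - x_bar)) / max(np.dot(v, v), 1e-16)
    return y - step * v

def algorithm1(G, x0, lo, hi, alpha0=0.5, theta=0.3, mu=0.4,
               batch=16, iters=3000, seed=0):
    rng = np.random.default_rng(seed)
    x_prev = x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for k in range(iters):
        # Step 1: inertial term with theta_k = min{delta_k/||x_k - x_{k-1}||, theta}.
        delta = 1.0 / (k + 1) ** 2
        d = np.linalg.norm(x - x_prev)
        th = min(delta / d, theta) if d > 0 else theta
        w = x + th * (x - x_prev)
        # Step 2: one i.i.d. mini-batch xi_k, reused for both oracle evaluations.
        xi = rng.normal(size=(batch, x.size))
        Gw = G(w, xi)
        y = project_box(w - alpha * Gw, lo, hi)
        # Step 3: half-space projection and self-adaptive stepsize update.
        Gy = G(y, xi)
        v = w - alpha * Gw - y                  # normal vector defining T_k
        x_prev, x = x, project_halfspace(w - alpha * Gy, v, y)
        diff = np.linalg.norm(Gw - Gy)
        if diff > 0:
            alpha = min(alpha, mu * np.linalg.norm(y - w) / diff)
    return x

# Illustration: F(x) = x - b on X = [-1, 1]^2, so the solution is x* = b.
b = np.array([0.2, -0.5])
G = lambda x, xi: (x - b) + 0.05 * xi.mean(axis=0)   # sample-average oracle
sol = algorithm1(G, x0=np.ones(2), lo=-1.0, hi=1.0)
# sol is close to b
```

Note the single projection onto $X$ per iteration; the second projection, onto the half-space $T_k$, is available in closed form via (7), and the stepsize adapts without any linesearch.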

4. Convergence Analysis

In this section, we present the technical proofs for two convergence analyses of the proposed Algorithm 1: almost sure convergence and the rate of convergence. The former establishes pathwise convergence without quantifying the speed of approach, whereas the latter measures the convergence speed rather than probabilistic pathwise certainty. We begin with the proof of almost sure convergence.

4.1. Almost Sure Convergence

Remark 3.   1.  Our investigation of the proposed method will be based on the filtration $\mathcal{F}_k$:
$\mathcal{F}_0 = \sigma(x_0)$ and $\mathcal{F}_k = \sigma(x_0, \xi_0, \dots, \xi_{k-1}), \; k \in \mathbb{N}.$
So,
$\mathcal{F}_0 = \sigma(x_0), \quad \mathcal{F}_1 = \sigma(x_0, \xi_0), \quad \mathcal{F}_2 = \sigma(x_0, \xi_0, \xi_1), \dots$
In general, $\mathcal{F}_k = \sigma(x_0, \xi_0, \dots, \xi_{k-1})$ and $\mathcal{F}_{k+1} = \sigma(x_0, \xi_0, \dots, \xi_k)$. We see clearly that adding $\xi_k$ provides more information, so $\mathcal{F}_k \subseteq \mathcal{F}_{k+1}$. Since $x_k$ is a measurable function of $(x_0, \xi_0, \dots, \xi_{k-1})$, it follows that $x_k \in \mathcal{F}_k$.
2. 
From Algorithm 1 and (7), it follows that, with $x = w_k - \alpha_k G(y_k, \xi_k)$ denoting the point being projected,
$x_{k+1} = x - \max\Big\{0, \frac{\langle w_k - \alpha_k G(w_k, \xi_k) - y_k, x - y_k \rangle}{\|w_k - \alpha_k G(w_k, \xi_k) - y_k\|^2}\Big\} \big(w_k - \alpha_k G(w_k, \xi_k) - y_k\big),$
provided that $w_k - \alpha_k G(w_k, \xi_k) - y_k \ne 0$. This is one of the striking advantages of Algorithm 1: its execution requires merely a single projection onto $X$, creating efficiency and smooth running of the scheme.
3. 
Let $X \subseteq \mathbb{R}^n$. From Algorithm 1, Step 2, we know that $y_k = \Pi_X(w_k - \alpha_k G(w_k, \xi_k))$, i.e., $y_k$ is the projection of $w_k - \alpha_k G(w_k, \xi_k)$ onto $X$. Using the projection property, we understand that
$\langle z - y_k, x - y_k \rangle \le 0, \quad \forall x \in X,$
where $z$ is the point being projected, i.e., $z = w_k - \alpha_k G(w_k, \xi_k)$. Using the definition of $T_k$ from the algorithm, namely
$T_k = \{x \in \mathbb{R}^n : \langle w_k - \alpha_k G(w_k, \xi_k) - y_k, x - y_k \rangle \le 0\},$
we understand that every $x \in X$ belongs to $T_k$. So, $X \subseteq T_k$.
4. 
In view of (5), we define $\varepsilon_1^k = G(w_k, \xi_k) - F(w_k)$ and $\varepsilon_2^k = G(y_k, \xi_k) - F(y_k)$, the oracle errors for all $k \in \mathbb{N}$. If $w_k = x_k = y_k$ for some $k \in \mathbb{N}_0$, then $w_k \in \Gamma$.
Indeed, assume that $w_k = x_k = y_k$ for some $k$. We know from Lemma 1 (i) that
$\|y_k - x\|^2 = \|\Pi_X(w_k - \alpha_k G(w_k, \xi_k)) - x\|^2 \le \|w_k - \alpha_k G(w_k, \xi_k) - x\|^2 - \|w_k - \alpha_k G(w_k, \xi_k) - y_k\|^2 = \|w_k - \alpha_k G(w_k, \xi_k) - x\|^2 - \alpha_k^2 \|G(w_k, \xi_k)\|^2 = \|y_k - x\|^2 - 2\alpha_k \langle w_k - x, G(w_k, \xi_k) \rangle = \|y_k - x\|^2 - 2\alpha_k \langle w_k - x, F(w_k) \rangle - 2\alpha_k \langle w_k - x, \varepsilon_1^k \rangle.$
Noting that $\alpha_k > 0$, we quickly have that
$\langle w_k - x, F(w_k) \rangle + \langle w_k - x, \varepsilon_1^k \rangle \le 0, \quad \forall x \in X.$ (13)
Indeed, for all $x \in X$, $\{\alpha_k \langle \varepsilon_1^k, w_k - x \rangle, \mathcal{F}_k\}$ defines a martingale difference, i.e., $\mathbb{E}[\alpha_k \langle \varepsilon_1^k, w_k - x \rangle \mid \mathcal{F}_k] = 0$. This follows from the fact that, since $w_k \in \mathcal{F}_k$ and $\xi_k \perp \mathcal{F}_k$, we have
$\mathbb{E}[\varepsilon_1^k \mid \mathcal{F}_k] = \mathbb{E}[G(w_k, \xi_k) \mid \mathcal{F}_k] - F(w_k) = \mathbb{E}[G(x, \xi_k)]\big|_{x = w_k} - F(w_k) = 0.$
Taking $\mathbb{E}[\,\cdot \mid \mathcal{F}_k]$ in (13),
$\langle w_k - x, F(w_k) \rangle + \mathbb{E}[\langle w_k - x, \varepsilon_1^k \rangle \mid \mathcal{F}_k] = \langle w_k - x, F(w_k) \rangle \le 0, \quad \forall x \in X.$
Therefore, $w_k \in \Gamma$.
We shall break our main theorem into Lemmas.
Lemma 7.
The limit of $\{\alpha_k\}$ exists a.s. Let $\lim_{k \to \infty} \alpha_k = \alpha$. Then, $\alpha \ge \min\{\alpha_0, \frac{\mu}{L}\}$, where $L = \mathbb{E}[L(\xi)]$.
Proof. 
By the definition of $\alpha_{k+1}$ in the algorithm, we obtain
$\alpha_{k+1} = \min\Big\{\alpha_k, \frac{\mu \|y_k - w_k\|}{\|G(w_k, \xi_k) - G(y_k, \xi_k)\|}\Big\} \le \alpha_k,$
which shows that the sequence $\{\alpha_k\}$ is monotone nonincreasing, and moreover $\alpha_k > 0$ for every $k$.
Hence $\{\alpha_k\}$ is bounded below by 0 and thus converges almost surely to a finite limit $\alpha := \lim_{k \to \infty} \alpha_k \ge 0$.
Now, define
$A_k := \frac{1}{N_k} \sum_{j=1}^{N_k} L(\xi_j^k),$
where $L(\xi)$ is the random Lipschitz modulus from Assumption 1 and $\{\xi_j^k\}_{j=1}^{N_k}$ are the sampled random variables at iteration $k$.
The Lipschitz-type bound implies that
$\|G(w_k, \xi_k) - G(y_k, \xi_k)\| \le A_k \|w_k - y_k\|.$
If $G(w_k, \xi_k) \ne G(y_k, \xi_k)$, then
$\frac{\mu \|y_k - w_k\|}{\|G(w_k, \xi_k) - G(y_k, \xi_k)\|} \ge \frac{\mu}{A_k}.$
Thus,
$\alpha_{k+1} \ge \min\Big\{\alpha_k, \frac{\mu}{A_k}\Big\}.$
Iterating this inequality gives
$\alpha_{k+1} \ge \min\Big\{\alpha_0, \frac{\mu}{A_0}, \frac{\mu}{A_1}, \dots, \frac{\mu}{A_k}\Big\}.$
Assume, for contradiction, that
$\alpha < \min\Big\{\alpha_0, \frac{\mu}{L}\Big\}.$
Choose $\eta$ such that
$\alpha < \eta < \min\Big\{\alpha_0, \frac{\mu}{L}\Big\}.$
Since $\alpha_k \to \alpha$ almost surely as $k \to \infty$, there exists (a.s.) $k_0$ such that for all $k > k_0$, $\alpha_{k+1} < \eta$. By the definition of $\alpha_{k+1}$, whenever $\alpha_{k+1} < \eta$ we must have
$\frac{\mu}{A_k} < \eta \iff A_k > \frac{\mu}{\eta}.$
Hence, for all large $k$,
$\mathbb{E}[A_k] \ge \frac{\mu}{\eta}.$
But $\mathbb{E}[A_k] = \mathbb{E}\big[\frac{1}{N_k} \sum_{j=1}^{N_k} L(\xi_j^k)\big] = \mathbb{E}[L(\xi)] = L$, since the $\xi_j^k$ are i.i.d. Therefore $L \ge \frac{\mu}{\eta}$, implying $\eta \ge \frac{\mu}{L}$, which contradicts $\eta < \frac{\mu}{L}$.
Thus, the assumption is false, and we conclude that
$\alpha \ge \min\Big\{\alpha_0, \frac{\mu}{L}\Big\}.$ □
Remark 4.
The expectation step is justified because $A_k$ is the empirical mean of i.i.d. random variables $L(\xi_j^k)$, so that $\mathbb{E}[A_k] = \mathbb{E}[L(\xi)] = L$. The contradiction argument uses the fact that if $A_k > \mu/\eta$ for all large $k$, then $L = \mathbb{E}[A_k] \ge \mu/\eta$, contradicting $\eta < \mu/L$. This gives a transparent probabilistic justification for the bound.
The following Lemma will be needed in the sequel.
Lemma 8.
For any $x \in \Gamma$, the following estimate holds almost surely:
$\|x_{k+1} - x\|^2 \le \|x_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_0 \|\varepsilon_2^k\| \|y_k - x\| + 3M_2 \eta_k, \quad \forall k \in \mathbb{N}_0,$
where $\tau_k = \frac{2\mu \alpha_k}{\alpha_{k+1}}$.
Proof. 
Recall from the inertial step that $w_k = x_k + \theta_k (x_k - x_{k-1})$. For any $x \in \Gamma$, using the definitions of $\{w_k\}$ and $\{\theta_k\}$, we obtain
$\theta_k \|x_k - x_{k-1}\| \le \delta_k, \quad k \ge 1.$
Hence,
$\frac{\theta_k}{\eta_k} \|x_k - x_{k-1}\| \le \frac{\delta_k}{\eta_k} \to 0 \quad \text{as } k \to \infty.$
Therefore, there exists $M_1 > 0$ such that
$\frac{\theta_k}{\eta_k} \|x_k - x_{k-1}\| \le M_1, \quad \forall k \ge 1.$
Consequently, with $M_0 > 0$ such that $\theta_k \|x_k - x_{k-1}\| + 2\|x_k - x\| \le 3M_0$,
$\|w_k - x\|^2 = \|x_k - x + \theta_k (x_k - x_{k-1})\|^2 \le \|x_k - x\|^2 + \theta_k^2 \|x_k - x_{k-1}\|^2 + 2\theta_k \|x_k - x\| \|x_k - x_{k-1}\| = \|x_k - x\|^2 + \theta_k \|x_k - x_{k-1}\| \big(\theta_k \|x_k - x_{k-1}\| + 2\|x_k - x\|\big) \le \|x_k - x\|^2 + 3\theta_k \|x_k - x_{k-1}\| M_0 = \|x_k - x\|^2 + 3M_0 \eta_k \frac{\theta_k}{\eta_k} \|x_k - x_{k-1}\| \le \|x_k - x\|^2 + 3M_1 M_0 \eta_k = \|x_k - x\|^2 + 3M_2 \eta_k,$ (19)
for some $M_2 = M_0 M_1 > 0$.
In view of the fact that $\Gamma \subseteq X \subseteq T_k$ holds for all $k \in \mathbb{N}_0$, applying Lemma 1 (ii) on $T_k$ (with $u = x_{k+1} = \Pi_{T_k}(w_k - \alpha_k G(y_k, \xi_k))$, $y = w_k$, $v = \alpha_k G(y_k, \xi_k)$, and $z = x \in \Gamma$) together with (19), we obtain
$2\alpha_k \langle G(y_k, \xi_k), x_{k+1} - x \rangle \le \|w_k - x\|^2 - \|x_{k+1} - x\|^2 - \|x_{k+1} - w_k\|^2.$
That is,
$\|x_{k+1} - x\|^2 \le \|w_k - x\|^2 - \|x_{k+1} - w_k\|^2 + 2\alpha_k \langle G(y_k, \xi_k), x - x_{k+1} \rangle = \|w_k - x\|^2 - \|(x_{k+1} - y_k) + (y_k - w_k)\|^2 + 2\alpha_k \langle G(y_k, \xi_k), x - x_{k+1} \rangle = \|w_k - x\|^2 - \|x_{k+1} - y_k\|^2 - \|y_k - w_k\|^2 - 2\langle x_{k+1} - y_k, y_k - w_k \rangle + 2\alpha_k \langle G(y_k, \xi_k), x - x_{k+1} \rangle \le \|x_k - x\|^2 - \big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle G(y_k, \xi_k), x - y_k \rangle + 2\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(y_k, \xi_k) \rangle + 3M_2 \eta_k.$ (20)
Since $x_{k+1}$ lies in $T_k$, it follows that
$\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(w_k, \xi_k) \rangle \le 0,$
which further implies that
$\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(y_k, \xi_k) \rangle \le \alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle, \quad \forall k \in \mathbb{N}_0.$ (21)
Indeed, we know that
$w_k - y_k - \alpha_k G(y_k, \xi_k) = \big(w_k - y_k - \alpha_k G(w_k, \xi_k)\big) + \alpha_k \big(G(w_k, \xi_k) - G(y_k, \xi_k)\big).$
Taking the inner product with $x_{k+1} - y_k$ and using linearity gives
$\langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(y_k, \xi_k) \rangle = \langle x_{k+1} - y_k, w_k - y_k - \alpha_k G(w_k, \xi_k) \rangle + \alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle \le \alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle.$
Rearranging the above inequality, we obtain (21). Furthermore, applying the Cauchy–Schwarz inequality and utilizing the definition of $\alpha_{k+1}$ given in Algorithm 1, one obtains
$2\alpha_k \langle x_{k+1} - y_k, G(w_k, \xi_k) - G(y_k, \xi_k) \rangle \le 2\alpha_k \|x_{k+1} - y_k\| \|G(w_k, \xi_k) - G(y_k, \xi_k)\| \le \frac{2\mu \alpha_k}{\alpha_{k+1}} \|x_{k+1} - y_k\| \|w_k - y_k\| \le \tau_k \big(\|x_{k+1} - y_k\|^2 + \|w_k - y_k\|^2\big).$ (22)
Since $\alpha_{k+1} \le \alpha_0$, substituting (21) into (20) and utilizing (22), we obtain
$\|x_{k+1} - x\|^2 \le \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle G(y_k, \xi_k), x - y_k \rangle = \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle \varepsilon_2^k + F(y_k), x - y_k \rangle = \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_k \langle \varepsilon_2^k, x - y_k \rangle - 2\alpha_k \langle F(y_k), y_k - x \rangle \le \|w_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_0 \|\varepsilon_2^k\| \|x - y_k\| \le \|x_k - x\|^2 - (1 - \tau_k)\big(\|x_{k+1} - y_k\|^2 + \|y_k - w_k\|^2\big) + 2\alpha_0 \|\varepsilon_2^k\| \|x - y_k\| + 3M_2 \eta_k,$ (23)
where the second-to-last inequality holds since, by the quasi-monotonicity assumption on $F(\cdot)$, we know that $\langle F(y_k), y_k - x \rangle \ge 0$ for all $y_k \in X$ and $\alpha_k > 0$ for all $k \in \mathbb{N}_0$. This completes the proof of Lemma 8. □
The next Lemma controls the error bound arising from our computations.
Lemma 9.
Assume that Assumption 1 holds. Then, for any $x \in \Gamma$, a.s. we have
$\big|\, \|y_k - x\| \|\varepsilon_2^k\| \,\big|\, \mathcal{F}_k \big|_{p/2} \le \frac{m_3}{N_k} + \frac{m_5}{N_k} + \frac{m_4}{N_k} \|x_k - x\|^2, \quad \forall k \in \mathbb{N}_0,$
where $m_3$, $m_4$, and $m_5$ are the positive constants specified in the proof.
Proof. 
Noting that $x = \Pi_X(x - \alpha_k F(x))$, utilizing the nonexpansivity of $\Pi_X$, the Lipschitz continuity of $F$, and $\alpha_k \le \alpha_0$, we obtain
$\|y_k - x\| = \|\Pi_X(w_k - \alpha_k G(w_k, \xi_k)) - \Pi_X(x - \alpha_k F(x))\| \le \|w_k - x - \alpha_k (F(w_k) + \varepsilon_1^k - F(x))\| \le \|w_k - x\| + \alpha_k \|F(w_k) - F(x)\| + \alpha_k \|\varepsilon_1^k\| \le \|w_k - x\| + \alpha_0 L \|w_k - x\| + \alpha_0 \|\varepsilon_1^k\| = (1 + \alpha_0 L) \|w_k - x\| + \alpha_0 \|\varepsilon_1^k\|, \quad \forall k \in \mathbb{N}_0.$ (25)
Now, applying Lemmas 4 and 5 together with (25), one obtains
$\big|\, \|y_k - x\| \|\varepsilon_2^k\| \,\big|\, \mathcal{F}_k \big|_{p/2} \le (1 + L\alpha_0) \|w_k - x\| \, \big|\varepsilon_2^k \,\big|\, \mathcal{F}_k\big|_p + \alpha_0 \big|\varepsilon_1^k \,\big|\, \mathcal{F}_k\big|_p \, \big|\varepsilon_2^k \,\big|\, \mathcal{F}_k\big|_p \le (1 + L\alpha_0) \|w_k - x\| \frac{c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|}{\sqrt{N_k}} + \alpha_0 \frac{\big(c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|\big)^2}{N_k} \le (1 + L\alpha_0) \frac{\|w_k - x\|^2 + \big(c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|\big)^2}{N_k} + \alpha_0 \frac{\big(c_1 \sigma_{2p}(x) + \bar{L}_{2p} \|w_k - x\|\big)^2}{N_k} \le (1 + L\alpha_0) \frac{\|w_k - x\|^2 + 2(c_1 \sigma_{2p}(x))^2 + 2\bar{L}_{2p}^2 \|w_k - x\|^2}{N_k} + \alpha_0 \frac{2(c_1 \sigma_{2p}(x))^2 + 2\bar{L}_{2p}^2 \|w_k - x\|^2}{N_k} = \frac{2(1 + \alpha_0(1 + L))(c_1 \sigma_{2p}(x))^2}{N_k} + \frac{(1 + L\alpha_0)(1 + 2\bar{L}_{2p}^2) + 2\alpha_0 \bar{L}_{2p}^2}{N_k} \|w_k - x\|^2 = \frac{m_3}{N_k} + \frac{m_4}{N_k} \|w_k - x\|^2,$ (26)
where $m_3 = 2(1 + \alpha_0(1 + L))(c_1 \sigma_{2p}(x))^2$ and $m_4 = (1 + L\alpha_0)(1 + 2\bar{L}_{2p}^2) + 2\alpha_0 \bar{L}_{2p}^2$, with $c_1$ and $\bar{L}_{2p}$ as defined in Lemma 5.
Utilizing (19) and combining with (26), we understand that
$\big|\, \|y_k - x\| \|\varepsilon_2^k\| \,\big|\, \mathcal{F}_k \big|_{p/2} \le \frac{m_3}{N_k} + \frac{m_4}{N_k} \|w_k - x\|^2 \le \frac{m_3}{N_k} + \frac{m_4}{N_k} \big(\|x_k - x\|^2 + 3M_2 \eta_k\big) = \frac{m_3}{N_k} + \frac{m_5}{N_k} + \frac{m_4}{N_k} \|x_k - x\|^2,$ (27)
where $m_5 = 3m_4 M_2 \eta_k$, and this completes the proof of Lemma 9. □
We are now ready to provide the main theorem of this paper.
Theorem 1.
Suppose that Assumption 1 holds. Then, the sequence { x k } generated by Algorithm 1 a.s. converges to a point x Γ .
Proof. 
We know from Lemma 7 that the limit of $\{\alpha_k\}$ exists; since $\{\alpha_k\}$ is nonincreasing, $\alpha_k \ge \alpha$ for all $k \in \mathbb{N}_0$, where $\lim_{k \to \infty} \alpha_k = \alpha$. Utilizing the definition of $\{y_k\}$ given in Algorithm 1, the oracle error $\varepsilon_1^k = G(w_k, \xi_k) - F(w_k)$, and applying Lemmas 2 and 3, we understand that
$\big(\min\{1, \alpha\} \|r_1(w_k)\|\big)^2 \le \big(\min\{1, \alpha_k\} \|r_1(w_k)\|\big)^2 \le \|r_{\alpha_k}(w_k)\|^2 = \|w_k - \Pi_X(w_k - \alpha_k F(w_k))\|^2 \le 2\|w_k - y_k\|^2 + 2\|y_k - \Pi_X(w_k - \alpha_k F(w_k))\|^2 = 2\|w_k - y_k\|^2 + 2\|\Pi_X(w_k - \alpha_k (F(w_k) + \varepsilon_1^k)) - \Pi_X(w_k - \alpha_k F(w_k))\|^2 \le 2\|w_k - y_k\|^2 + 2\alpha_0^2 \|\varepsilon_1^k\|^2.$
It follows from the above estimate that
$\|w_k - y_k\|^2 \ge \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} - \alpha_0^2 \|\varepsilon_1^k\|^2,$ (28)
where $\bar{\alpha} = \min\{1, \alpha\}$. Utilizing Lemma 4 with $p = 2$, we obtain
$\mathbb{E}[\|\varepsilon_1^k\|^2 \mid \mathcal{F}_k] = \big(\big|\varepsilon_1^k \,\big|\, \mathcal{F}_k\big|_2\big)^2 \le \Big(\frac{c_1 \sigma_4(x) + \bar{L}_4 \|w_k - x\|}{\sqrt{N_k}}\Big)^2 \le \frac{2(c_1 \sigma_4(x))^2 + 2(\bar{L}_4 \|w_k - x\|)^2}{N_k}.$ (29)
Recall from Lemma 8 that $\tau_k = \frac{2\mu \alpha_k}{\alpha_{k+1}}$. It follows from Lemma 7 that $\lim_{k \to \infty} \tau_k = 2\mu \in (0, 1)$. On the other hand, since $\tau_k \ge 2\mu$ for all $k \in \mathbb{N}_0$, we can find some $\tau \in (0, 1)$ and an index $K_0 \in \mathbb{N}$ such that $\tau_k \in [2\mu, \tau)$ for every $k \ge K_0$. Taking these observations into account, and recalling that $1 - \tau > 0$, we deduce from Lemma 8, (28), and (29) that
$\|x_{k+1} - x\|^2 \le \|w_k - x\|^2 - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + (1 - \tau) \alpha_0^2 \|\varepsilon_1^k\|^2 + 2\alpha_0 \|\varepsilon_2^k\| \|y_k - x\|, \quad \forall k > K_0.$ (30)
Taking $\mathbb{E}[\,\cdot \mid \mathcal{F}_k]$ in (30), setting $p = 2$, and applying (19) and Lemma 9, we obtain
$\mathbb{E}[\|x_{k+1} - x\|^2 \mid \mathcal{F}_k] \le \|w_k - x\|^2 - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + (1 - \tau) \alpha_0^2 \mathbb{E}[\|\varepsilon_1^k\|^2 \mid \mathcal{F}_k] + 2\alpha_0 \mathbb{E}[\|\varepsilon_2^k\| \|y_k - x\| \mid \mathcal{F}_k] \le \|w_k - x\|^2 - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + (1 - \tau) \alpha_0^2 \frac{2(c_1 \sigma_4(x))^2 + 2\bar{L}_4^2 \|w_k - x\|^2}{N_k} + 2\alpha_0 \Big(\frac{m_3}{N_k} + \frac{m_5}{N_k} + \frac{m_4}{N_k} \|w_k - x\|^2\Big) \le \Big(1 + \frac{2\alpha_0 m_4 + 2\alpha_0^2 \bar{L}_4^2 (1 - \tau)}{N_k}\Big) \big(\|x_k - x\|^2 + 3M_2 \eta_k\big) - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + \frac{2(1 - \tau) \alpha_0^2 (c_1 \sigma_4(x))^2 + 2\alpha_0 (m_3 + m_5)}{N_k}, \quad \forall k \ge K_0.$ (31)
Setting $V_k = \|x_{K_0 + k} - x\|^2$ and $\mathcal{G}_k = \mathcal{F}_{K_0 + k}$ for all $k \in \mathbb{N}$, and writing
$m_6 = 2\alpha_0 m_4 + 2\alpha_0^2 \bar{L}_4^2 (1 - \tau), \qquad m_7 = 2(1 - \tau) \alpha_0^2 (c_1 \sigma_4(x))^2 + 2\alpha_0 (m_3 + m_5),$
it follows from (31) that
$\mathbb{E}[V_{k+1} \mid \mathcal{G}_k] \le \Big(1 + \frac{m_6}{N_k}\Big) V_k - (1 - \tau) \frac{\bar{\alpha}^2 \|r_1(w_k)\|^2}{2} + 3M_2 \eta_k \Big(1 + \frac{m_6}{N_k}\Big) + \frac{m_7}{N_k}, \quad \forall k \in \mathbb{N}.$ (32)
Since k = 0 1 N k < , we conclude from Lemma 6 and (32) that a.s. the sequence { V k } is convergent and k = 1 ( r 1 ( w k ) 2 ) < .
Thus, a.s. the sequence { x k } is bounded and
lim k ( r 1 ( w k ) 2 ) = lim k w k Π X ( w k F ( w k ) ) 2 = 0 .
By virtue of the a.s. boundedness of { x_k }, we can find a subsequence { x_{k_j} } of { x_k } that a.s. converges to a point x*. It follows that
lim_{j→∞} (r_1(w_{k_j}))² = lim_{j→∞} ‖w_{k_j} − Π_X(w_{k_j} − F(w_{k_j}))‖² = ‖x* − Π_X(x* − F(x*))‖² = 0,
which shows that x Γ .
Note further that, since the limit of x k x exists almost surely for every x Γ , it follows that
lim k x k x 2 = lim j x k j x 2 = 0 .
This establishes the claim and thus completes the proof of Theorem 1. □
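The almost-sure convergence argument above rests on a Robbins–Siegmund-type recursion (Lemma 6): whenever E[V_{k+1} | G_k] ≤ (1 + a_k)V_k − b_k + c_k with ∑ a_k < ∞ and ∑ c_k < ∞, the sequence { V_k } converges and ∑ b_k < ∞. The following deterministic toy iteration (our own illustration, not the algorithm itself) shows this mechanism numerically, with a_k = c_k = 1/k² mimicking the summable 1/N_k perturbations:

```python
# Deterministic toy version of the supermartingale recursion behind Theorem 1:
#   V_{k+1} <= (1 + a_k) V_k - b_k + c_k,  with sum a_k < inf and sum c_k < inf.
# Here a_k = c_k = 1/k^2 mimic the summable 1/N_k terms, and b_k = V_k / 2 plays
# the role of the extracted residual term (1 - tau) * alpha^2 * (r_1(w_k))^2 / 2.
V = 1.0
residual_sum = 0.0
for k in range(1, 10_000):
    a_k = c_k = 1.0 / k**2
    b_k = 0.5 * V
    V = (1.0 + a_k) * V - b_k + c_k
    residual_sum += b_k

print(V)              # V_k settles down (here it decays toward 0)
print(residual_sum)   # the accumulated residuals remain finite
```

In the toy run, { V_k } converges and the accumulated residuals stay bounded, which is exactly the conclusion the lemma extracts from (32).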

4.2. Rate of Convergence

We now present the most insightful part of the analysis of the proposed algorithm, namely how quickly the recursive sequence { x_k } approaches its limit x*. We begin with the following lemma, which is needed in the sequel.
Lemma 10.
Under Assumption 1, we have
∑_{k=0}^{∞} E[ ‖x_{k+1} − y_k‖² + ‖y_k − w_k‖² ] < ∞.
Proof. 
Using the uniform bound on τ_k together with the previous estimates, we obtain
x k + 1 x 2 w k x 2 ( 1 τ ) x k + 1 y k 2 + y k w k 2 + 2 α 0 ε 2 k y k x .
The above information can be recast as
( 1 τ ) x k + 1 y k 2 + y k w k 2 w k x 2 x k + 1 x 2 + 2 α 0 ε 2 k y k x x k x 2 x k + 1 x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x , k N 0 .
Having established the a.s. boundedness of { ‖x_k − x*‖² } in Theorem 1, we set P₀ = sup_{k∈ℕ₀} ‖x_k − x*‖². Now, setting p = 2, utilizing Lemma 4 and (27), and taking E[·] in (33), we obtain
( 1 τ ) E [ x k + 1 y k 2 + y k w k 2 ] E [ x k x 2 ] + E 3 M 2 [ η k ] + 2 α 0 E [ ε 2 k y k x ] = E [ x k x 2 ] E [ y k + 1 x 2 ] + 2 α 0 E [ E [ ε 2 k y k x | F k ] ] + 3 M 2 E [ η k ] E [ x k x 2 ] E [ x k + 1 x 2 ] + 3 M 2 E [ η k ] + 2 α 0 E m 3 N k + m 5 N k + m 4 N k x k x 2 = E [ x k x 2 ] E [ x k + 1 x 2 ] + 2 α 0 m 3 N k + m 5 N k + m 4 N k P 0 + P ,
where P := sup_{k∈ℕ₀} 3M²η_k. Summing over k in (34), we obtain
( 1 τ ) k = K = 1 E [ x k + 1 y k 2 + y k w k 2 ] E [ x K + 1 x 2 ] + 2 α 0 k = K + 1 m 3 N k + m 5 N k + m 4 N k P 0 < P 0 + 2 α 0 k = 0 m 3 N k + m 5 N k + m 4 N k P 0 + p < .
That is,
∑_{k=0}^{∞} E[ ‖x_{k+1} − y_k‖² + ‖y_k − w_k‖² ] < ∞,
which completes the proof of Lemma 10. □
We now state the following theorem for the rate of convergence. In this setting, the cost function satisfies a strong pseudo-monotonicity property, meaning that
⟨F(x), y − x⟩ ≥ 0 ⟹ ⟨F(y), y − x⟩ ≥ λ‖x − y‖², ∀ x, y ∈ X.
Theorem 2.
Assume that the Assumption 1 is satisfied. Then, the following condition holds
min_{k=0,1,…,m} E[ ‖w_k − y_k‖² ] ≤ (1/(m+1)) ( ‖x₀ − x*‖² + P₁ ),
where P 1 is a well-defined positive constant.
Proof. 
Consider τ̄ = 2μα₀/α, where α is the limit of { α_k }. Noting that a.s. α_k ∈ [α, α₀], we obtain a.s. that τ_k = 2μα_k/α_{k+1} ≤ τ̄ for all k ∈ ℕ₀. It is known from Remark 3 (iv) that w_k ∈ Γ if w_k = x_k = y_k for some k ∈ ℕ.
From Lemma 3, we can observe that if
∑_{k=1}^{∞} ‖y_k − w_k‖² < ∞,
then necessarily,
lim_{k→∞} ‖w_k − y_k‖² = 0 a.s., and hence lim_{k→∞} ‖w_k − y_k‖ = 0 a.s.
Therefore, to establish the convergence rate of Algorithm 1, it is sufficient to analyze the rate at which the sequence { w k y k } converges. Building on this observation, and applying Lemma 7 together with the previously defined quantity τ ¯ , we obtain
x k + 1 x 2 w k x 2 ( 1 τ k ) x k + 1 y k 2 + y k w k 2 + 2 α 0 ε 2 k y k x w k x 2 ( 1 τ ¯ ) x k + 1 y k 2 + y k w k 2 + 2 α 0 ε 2 k y k x .
We extract from (35) that
‖x_{k+1} − y_k‖² + ‖y_k − w_k‖² ≤ ‖w_k − x*‖² − ‖x_{k+1} − x*‖² + τ̄ ( ‖y_k − w_k‖² + ‖x_{k+1} − y_k‖² ) + 2α₀ε₂ₖ‖y_k − x*‖, ∀ k ∈ ℕ.
Applying Lemma 9, taking E [ . ] in (36) and setting p = 2 , one obtains
E [ y k w k 2 ] E [ x k + 1 y k 2 + y k w k 2 ] E [ w k x 2 ] E [ x k + 1 x 2 ] + τ ¯ E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 E [ ε 2 k y k x ] = E [ w k x 2 ] E [ x k + 1 x 2 ] + τ ¯ E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 E [ E [ ε 2 k y k x | F k ] ] E [ x k x + 3 M 2 η k ] E [ x k + 1 x 2 ] + τ ¯ E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 m 3 N k + m 5 N k + m 4 N k x k x 2 .
Now, taking sum from k = 0 to m in (37), we obtain
k = 0 m E [ y k w k 2 ] x 0 x 2 E [ x m + 1 x 2 ] + τ ¯ k = 0 m E [ x k + 1 y k 2 + y k w k 2 ] + 2 α 0 k = 0 m m 3 N k + m 5 N k + m 4 N k P 0 + 3 M 2 k = 0 m E [ η k ] x 0 x 2 + Q ,
Q = 2 α 0 k = 0 m m 3 N k + m 5 N k + m 4 N k P 0 + τ ¯ k = 0 m E [ x k + 1 y k 2 + y k w k 2 ] + 3 M 0 k = 0 m E [ η k ] .
Now, using the bound
min_{k=0,…,m} E[ ‖y_k − w_k‖² ] ≤ (1/(m+1)) ∑_{k=0}^{m} E[ ‖y_k − w_k‖² ],
and recalling that Q is finite (see Lemma 10), the asserted estimate follows from (37). □
The next lemma will be instrumental in establishing the convergence rate of Algorithm 1.
Lemma 11.
Let S_k = ∑_{i=1}^{k} c_{k−i+1}/bⁱ, where b > 1 is a fixed constant, k ∈ ℕ, and { c_k } is a positive sequence satisfying ∑_{k=1}^{∞} c_k < ∞. Then lim_{k→∞} S_k = 0.
Proof. 
To establish this result, it suffices to show that R_m := ∑_{k=1}^{m} S_k is bounded uniformly in m ∈ ℕ. Indeed,
R m = k = 1 m S k = c 1 1 b + + 1 b m + c 2 1 b + + 1 b m 1 + + c m 1 1 b + 1 b 2 + c m b < c 1 i = 1 1 b i + c 2 i = 1 1 b i + + c m 1 i = 1 1 b i + c m i = 1 1 b i = 1 b 1 k = 1 m c k < 1 b 1 k = 1 c k m N .
It then follows that ∑_{k=1}^{∞} S_k < ∞. Consequently, we obtain lim_{k→∞} S_k = 0. □
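Lemma 11 can be checked numerically. Reading the (typeset-damaged) sum as S_k = ∑_{i=1}^{k} c_{k−i+1}/bⁱ, the sketch below uses b = 1.5 and the summable choice c_k = 1/k² (both values are ours, for illustration only):

```python
# Numerical check of Lemma 11, reading the sum as S_k = sum_{i=1}^k c_{k-i+1} / b^i.
# Both b and {c_k} below are illustrative choices satisfying the hypotheses.
b = 1.5
c = [1.0 / k**2 for k in range(1, 2001)]   # c_k = 1/k^2 is positive and summable

def S(k):
    total, inv = 0.0, 1.0
    for i in range(1, k + 1):
        inv /= b                           # inv = b**(-i), computed stably
        total += c[k - i] * inv            # c_{k-i+1}, stored 0-indexed
    return total

print(S(10), S(100), S(2000))              # decreases toward 0
```

The decay of S_k combines the geometric damping of old terms c_1, c_2, … with the smallness of the recent terms c_k, which is exactly the convolution structure exploited in Theorem 3.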
We now present the theorem that characterizes the convergence rate of Algorithm 1.
Theorem 3.
Assume that Assumption 1 is satisfied, the cost function is strongly pseudomonotone on C, and let x Γ . Then there exists a positive integer K such that
E[ ‖x_{k+1} − x*‖² ] ≤ t_K/(1 + φ)^{k−K+1} + S_k, ∀ k ≥ K,
where t K > 0 and φ > 0 are appropriately chosen constants, and S k is defined in Lemma 11.
Proof. 
Using the definition of strong pseudomonotonicity and our estimate in (23), we obtain
x k + 1 x 2 w k x 2 ( 1 τ k ) y k x k + 1 2 + w k y k 2 2 λ α k y k x 2 + 2 α 0 ε 2 k y k x 2 x k x 2 ( 1 τ k ) x k + 1 y k 2 + y k w k 2 2 λ α k y k x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x , k N 0 .
Noting that 2pq ≤ (1/√2)p² + √2q², τ_k ≤ τ, α_k ≥ α = lim_{k→∞} α_k, and using (39), we obtain for all k ≥ K
x k + 1 x 2 x k x 2 + 3 M 2 η k ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 + y k 1 w k 2 2 λ α y k x 2 + 2 α 0 ε 2 k y k x x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 2 + y k 1 w k 2 + 2 y k y k 1 , y k 1 w k λ α y k x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 2 + y k 1 w k 2 + 3 M 2 η k + 2 ( 1 τ ) y k y k 1 y k 1 w k 2 λ α y k x 2 α 0 ε 2 k y k x x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) y k y k 1 2 + y k 1 w k 2 2 λ α y k x 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x + 1 τ 2 y k y k 1 2 + 2 ( 1 τ ) y k 1 w k 2 = x k x 2 ( 1 τ ) x k + 1 y k 2 ( 1 τ ) 1 1 2 y k y k 1 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 3 M 2 η k λ α ε 2 k y k x .
From (40) we obtain
x k + 1 x 2 + ( 1 τ ) ( 2 1 ) x k + 1 y k 2 x k x 2 ( 1 τ ) ( 2 2 ) x k + 1 y k 2 λ α y k x 2 + 3 M 2 η k ( 1 τ ) 1 1 2 y k y k 1 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 2 λ 0 ε 2 k x y k , k K .
Let j = 2 λ α + ( 1 τ ) ( 2 2 ) b , where b = 2 λ α ( 2 1 ) ( 1 τ ) . Noting that τ ∈ ( 0 , 1 ) , we then conclude that b ∈ ( 0 , 2 λ α ) . Thus, j > 0 . Setting f ( t ) = 2 λ α t 2 j t b , t ∈ ℝ , it follows that f ( t ) = 0 has one positive and one negative root. Take γ = j + j 2 + 16 λ 2 α 2 ( 2 1 ( 1 τ ) ) 4 λ α . Noting that f ( γ ) = 0 and f ( 1 ) ≤ 0 , we conclude that γ > 1 .
For all k N 0 , we observe that
y k x 2 = y k x k + 1 + x k + 1 x 2 = y k x k + 1 2 x k + 1 x 2 2 y k x k + 1 , x k + 1 x x k + 1 y k 2 x k + 1 x 2 + γ x k + 1 y k 2 + 1 γ x k + 1 x 2 = ( γ 1 ) x k + 1 y k 2 + 1 γ 1 x k + 1 x 2 .
Substituting (42) into (41), k K , we have
1 2 λ α 1 γ 1 x k + 1 x 2 + ( 1 τ ) ( 2 1 ) y k x k + 1 2 x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ( 1 τ ) 1 1 2 y k y k 1 2 + 2 λ α ( γ 1 ) 2 λ α 1 γ 1 ( 1 τ ) ( 2 1 ) ( 1 τ ) ( 2 2 ) x k + 1 y k 2 + 2 α 0 ε 2 k y k x + 3 M 2 η k = x k x 2 + ( 1 τ ) ( 2 1 ) y k + 1 w k 2 ( 1 τ ) ( 2 1 ) 2 y k y k + 1 2 + f ( γ ) γ x k + 1 y k 2 + 2 α 0 ε 2 k y k x + 3 M 2 η k = x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ( 1 τ ) ( 2 1 ) 2 y k y k 1 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 3 M 2 η k + 2 α 0 ε 2 k y k x .
Setting δ = 2 λ α ( 1 1 γ ) . Clearly, δ > 0 , and by taking E [ . ] in (43) and applying Lemma 9 with p = 2 , we obtain
( 1 + δ ) E [ x k + 1 x 2 + ( 1 τ ) ( 2 1 ) x k + 1 y k 2 ] E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + 2 α 0 E [ ε 2 k y k x ] + 3 M 2 E [ η k ] = E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + 2 α 0 E [ E [ ε 2 k y k x | F k ] ] + 3 M 2 E [ η k ] = E [ x k x 2 ] + ( 1 τ ) ( 2 1 ) y k 1 w k 2 + 2 α 0 m 3 + m 5 η k + m 4 x k x 2 N k + 3 M 2 E [ η k ] E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + 2 α 0 ( m 3 + m 5 + m 4 P 0 ) N k + 3 M 2 E [ η k ] = E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] + p + M 8 N k , k K ,
where M 8 = 2 α 0 ( m 3 + m 5 + m 4 P 0 ) , P 0 = sup k N x k x 2 . Define
β k = E [ x k x 2 + ( 1 τ ) ( 2 1 ) y k 1 w k 2 ] .
From the definition of β k and (44), one obtains
E [ x k + 1 x 2 ] β k 1 + δ + 1 1 + δ p + M 8 N k β K ( 1 + δ ) k K + 1 + i = K k p ( 1 + δ ) i K + 1 + M 8 ( 1 + δ ) i K + 1 N k 1 + K β K ( 1 + δ ) k K + 1 + i = 1 k p ( 1 + δ ) i + M 8 ( 1 + δ ) i N k i + 1 = β K ( 1 + δ ) k K + 1 + S k , k K ,
where S_k = ∑_{i=1}^{k} [ p(1 + δ)^{−i} + M₈(1 + δ)^{−i}/N_{k−i+1} ]. Taking b = 1 + δ and identifying c_{k−i+1} with 1/N_{k−i+1} as in Lemma 11, we obtain S_k → 0 as k → ∞. This completes the proof of Theorem 3. □

5. Applications and Numerical Illustrations

System setup: All experiments were executed on a 64-bit Windows machine powered by an Intel(R) Core(TM) i7-6600U CPU @ 2.60 GHz (2 cores, 4 threads) with 8 GB RAM. A Python 3.9 environment was used for numerical computation, data analysis, and visualization, together with the NumPy, SciPy, Pandas, and Matplotlib libraries.
We consider four experiments in this research and compare the performance of our proposed algorithm with existing ones (see Table 1). In particular, Algorithm 1 of Nwawuru et al. is compared with Wang et al., 2022 (Algorithm 3.1, [30]), Liu and Qin, 2024 (Algorithm 1, [31]), Li et al., 2023 (Algorithm 3, [43]), and Long and He, 2023 (Algorithm 1, [38]). CPU times are averaged across 20 sample paths.
Table 1. Comparison of the algorithms considered against the proposed scheme and their main features.
We use numpy.random.rand() and numpy.random.randn() in Python to generate i.i.d. samples from Ξ and terminate the algorithms when the total number of iterations reaches 250. Furthermore, we set N_k = ⌈N(k + λ)(ln(k + λ))^{1.05}⌉ with λ = 5 and N = 250 for all the selected algorithms. Moreover, we take γ = 0.05, θ = 0.5, μ = 0.5.
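As a sketch of this sample-size rule, and under our reading of the garbled formula as N_k = ⌈N(k + λ)(ln(k + λ))^{1.05}⌉, the snippet below also verifies the summability condition ∑ 1/N_k < ∞ that the convergence proofs rely on (the exponent 1.05 > 1 is what makes this Bertrand-type series converge):

```python
import math

# Hedged reconstruction of the batch-size rule: we read the garbled expression as
# N_k = ceil(N * (k + lam) * (ln(k + lam))**1.05). The exponent 1.05 > 1 ensures
# sum_k 1/N_k < infinity, the summability condition used throughout the analysis.
def batch_size(k, lam=5, N=250):
    return math.ceil(N * (k + lam) * math.log(k + lam) ** 1.05)

sizes = [batch_size(k) for k in range(5)]
print(sizes)                                   # strictly increasing batch sizes

partial = sum(1.0 / batch_size(k) for k in range(1, 100_000))
print(partial)                                 # partial sums of 1/N_k stay small
```
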
We consider the following examples.
Example 1.
For x ≥ 0, E[G(x, ξ)] ≥ 0, ⟨x, E[G(x, ξ)]⟩ = 0, where G(x, ξ) := D(x, ξ) + M(ξ)x + q(ξ) with Dᵢ(x, ξ) := dᵢ(ξ) arctan(aᵢ(ξ)xᵢ), i = 1, …, n; d(ξ) and a(ξ) are randomly generated from the uniform distribution on [0, 1], M(ξ) := B + Y(ξ), where B ∈ ℝⁿˣⁿ is a deterministic skew-symmetric matrix generated from the uniform distribution on [0, 3], Y(ξ) ∈ ℝⁿˣⁿ is a diagonal matrix generated from the uniform distribution on [0, 2], and q(ξ) ∈ ℝⁿ is randomly generated from the uniform distribution on [−2, 2].
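A minimal NumPy sketch of one stochastic evaluation of this operator (all variable names are ours; forming the skew-symmetric matrix as A − Aᵀ from a uniform [0, 3] sample is one common reading of the construction described above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10

A = rng.uniform(0, 3, size=(n, n))
B = A - A.T                                  # skew-symmetric; one reading of "generated from U[0, 3]"

def G(x):
    # One stochastic evaluation G(x, xi) = D(x, xi) + M(xi) x + q(xi), where
    # D_i(x, xi) = d_i(xi) * arctan(a_i(xi) * x_i) and M(xi) = B + Y(xi).
    d = rng.uniform(0, 1, size=n)
    a = rng.uniform(0, 1, size=n)
    Y = np.diag(rng.uniform(0, 2, size=n))
    q = rng.uniform(-2, 2, size=n)
    return d * np.arctan(a * x) + (B + Y) @ x + q

# Monte Carlo estimate of E[G(x, xi)] at x = 0 (only the q-term survives there)
sample_avg = np.mean([G(np.zeros(n)) for _ in range(1000)], axis=0)
print(sample_avg.shape)   # (10,)
```

At x = 0 the arctan and matrix terms vanish, so the sample average estimates E[q(ξ)] ≈ 0, which is a quick sanity check on the implementation.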
Figure 1 below demonstrates the performance of Algorithm 1 and the methods of Wang et al. [30], Liu and Qin [31], Li et al. [43], and Long and He [38].
Figure 1. Comparison for N = 50, 100, 150, 200, 250 (Algorithm 1 and the algorithms of [30,31,38,43]).
Table 2 below shows that all algorithms converge across sample sizes. Algorithm 1 achieves balanced performance, requiring fewer iterations than Wang et al. [30] and Liu and Qin [31] while maintaining a speed competitive with Li et al. [43]. Its strength lies in combining stability with efficiency, offering reliable convergence under varying sample sizes. However, it should be noted that Long and He [38] achieve the fastest runtime: their algorithm is non-monotone and Lipschitz continuous with one oracle call per iteration, whereas Algorithm 1 has two oracle calls per iteration (see [38], Remark 1(i), Table 1).
Table 2. Convergence summary for the five algorithms across different sample sizes N.
Example 2. (Network bandwidth allocation) We consider a communication network in which individual users, acting selfishly, compete for shared bandwidth resources. The set of all users in the network is indexed by S := {1, 2, …, m}. Since each user can access multiple routes, we let Φ(s) denote the set of routes governed by user s ∈ S. For s ∈ S, let ϑ(s) denote the number of elements in Φ(s) and Θ := ∑_{s∈S} ϑ(s). Let x_s(r) (r ∈ Φ(s)) be the flow rate of user s through route r. The set of all links is denoted by L := {1, 2, …, κ} and the set of routes by R := {1, 2, …, t}. For d ∈ L, let L_s(t) denote the set of links through which route t ∈ Φ(s) passes. Suppose that
X_{s,d}^{t} = 1 if d ∈ L_s(t), and 0 otherwise.
When a flow rate is allocated to a user participating in the network, the user derives a utility modeled as the value of a concave function. The utility function of each user is parametrized by the uncertainty and is defined by
f_s(x_s, ξ_s) := ∑_{r∈ϑ(s)} ξ_s(r) ln(1 + x_s(r)), s ∈ S,
where x_s := (x_s(r))_{r∈Φ(s)} is the flow-rate decision vector of user s, ξ_s(r) is the random weight parameter of route r ∈ Φ(s), and ξ_s := (ξ_s(r))_{r∈Φ(s)}. The flow rate allocated to each user is regulated by a control mechanism to prevent network congestion in the bandwidth allocation. Such a mechanism ensures that the sum of the transmission rates of the users sharing link ν ∈ L is less than or equal to the limited capacity of that link, that is,
X_ν := { ((x_s(r))_{r∈ϑ(s)})_{s∈S} ∈ ℝ₊^Θ : ∑_{s∈S} x_s(r) X_{s,d}^{t} ≤ c_ν },
where c t > 0 is the capacity of link t L . Set c : = ( c t ) t L .
We now formulate this model as a stochastic optimization problem given by
maximize E[f_s(x_s, ξ_s)] subject to x := ((x_s(r))_{r∈Φ(s)})_{s∈S} ∈ ∩_{t∈L} X_t,
Let M denote the adjacency matrix that describes the correlation between the set of links L and the set of routes R := ∑_{s∈S} ϑ(s). We assume that M_{t,r} := 1 if route r ∈ R goes through link ν ∈ L, and 0 otherwise. We observe that problem (46) can be captured by an SVI in compact form:
Find x X : = ν L X t , such that
F ( x ) , x x 0 ,
for any x ∈ X, where
F(x) := −∇ ∑_{s∈S} E[f_s(x_s, ξ_s)] = ( −E[ξ_s(r)]/(1 + x_s(r)) )_{s∈S},
and
X := { ((x_s(r))_{r∈ϑ(s)})_{s∈S} ∈ ℝ₊^Θ : Mx ≤ c }.
Now, we consider the bandwidth allocation problem on a network that consists of 10 nodes, 10 links, and 2 users. It is observed that S := {1, 2}, L := {1, 2, …, 10}, R := {1, 2, …, 5}, Φ(1) := {1, 2} and Φ(2) := {1, 2, 3}.
In this experiment, the weight parameters ξ_s(r) (s ∈ S, r ∈ ϑ(s)) are i.i.d. and drawn randomly from the uniform distributions in Table 3. Besides, the links have limited capacities c := (c_ν)_{ν∈L}, with the values c₁, …, c₅ given in Table 4. One can check that the operator F(·) in (47) is quasi-monotone and Lipschitz continuous on X with constant L(ξ) := max_{s∈S, r∈ϑ(s)} ξ_s(r) + 2‖MMᵀ‖.
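Stacking the five per-route rates of the two users into one vector, a minimal sketch of evaluating the expected operator in (47) reads as follows (the mean weights E[ξ_s(r)] below are hypothetical placeholders, not the values of Table 3):

```python
import numpy as np

# Mean route weights E[xi_{s,r}] for the five user-route pairs; these numbers are
# hypothetical placeholders, not the actual distributions used in the experiment.
xi_mean = np.array([0.55, 0.60, 0.50, 0.65, 0.45])

def F(x):
    # Negative expected utility gradient: F(x)_(s,r) = -E[xi_{s,r}] / (1 + x_{s,r}).
    # Feasibility (x >= 0, M x <= c) is enforced separately by the projection step.
    return -xi_mean / (1.0 + x)

print(F(np.ones(5)))   # componentwise -E[xi_{s,r}] / 2 at x = (1, ..., 1)
```

Each component of F depends only on its own coordinate here, so quasi-monotonicity reduces to the monotonicity of t ↦ −c/(1 + t) on the feasible orthant; the coupling between users enters only through the capacity constraints Mx ≤ c.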
Figure 2 below shows a network of 10 nodes, 10 links, and 5 routes connecting two users. Core links carry higher capacities to support aggregate flows, while edge links act as potential bottlenecks. Shared routes between users highlight congestion risks, emphasizing the importance of efficient bandwidth allocation and resource management.
Figure 2. The bandwidth allocation network structure.
Table 3 presents uniform distributions of random parameters for each user’s routes. These ranges capture uncertainty in flow performance, modeling variability in network conditions, and ensuring fairness during bandwidth allocation.
Table 3. Uniform distribution of randomly generated parameters ξ s r for each user-route pair.
Table 4 below reveals how different links are utilized in the optimized flow allocation. Heavily loaded links such as 4 and 8 should be carefully monitored or upgraded in real networks, while lightly loaded links such as 6 and 7 might indicate insufficient routing or a redundant path. NB: Mbps means megabits per second and Gbps means gigabits per second.
Table 4. Link capacities c ν (sample values). Units: flow units (replace with Mbps/Gbps as needed).
Example 3. (Networked stochastic Nash–Cournot game) In this experiment, we consider the networked Nash–Cournot game adopted in [44] under data uncertainty, in which cost-minimizing agents compete in quantity levels when facing a price function associated with aggregate output. Suppose that there are Ω firms that compete over a network of Λ nodes in supplying a homogeneous product in a non-cooperative sense. Let x_{i,j} denote the level of sales of firm i ∈ [Ω] at node j ∈ [Λ]. Assume that each firm is characterized by a random linear cost function c_i(X_i, ξ_i) = (a_i + ξ_i) ∑_{j∈[Λ]} X_{i,j} for some parameters a_i > 0, where ξ_i is a mean-zero random variable. We assume that the price at node j, represented by P_j(∑_{i∈[Ω]} X_{i,j}, η_j), is a stochastic linear function corrupted by noise: P_j(∑_{i∈[Ω]} X_{i,j}, η_j) = d_j + η_j − b_j ∑_{i∈[Ω]} X_{i,j}, where d_j indicates the price when production is zero, b_j is the slope of the inverse demand function, and η_j is a zero-mean random disturbance. Assume that the transport cost is zero. Apart from the non-negativity constraints on x_{i,j}, we suppose that firm i's production at node j is capacitated by cap_{i,j}. We now transform firm i's problem into the stochastic optimization problem given below:
min E [ f i ( x , ξ , η ) ] = E c i ( X i , ξ i ) j [ Λ ] P j i [ Ω ] X i , j , η j X i , j s . t . x i X i = { x i R Λ : x i > 0 , x i , j c a p i , j }
Under some dominance conditions allowing the interchange of expectation and differentiation, the above stochastic Nash–Cournot game may be transformed into problem (3) with X = ∏_{i=1}^{Ω} X_i and F(x) = (F_1(x); …; F_Ω(x)), where F_i(x) = E[∇_{x_i} f_i(x, ξ, η)]. As in [44], we consider a network with Ω = 20 firms, Λ = 10 markets, and capacity cap_{i,j} = 2 for each i ∈ [Ω] and each j ∈ [Λ]. In the experiment, the parameters in the payoffs were set as d_j ∼ U(40, 50), b_j ∼ U(1, 2), and a_i ∼ U(3, 5) for all i ∈ [Ω] and j ∈ [Λ], where U(u₁, u₂) denotes the uniform distribution over the interval (u₁, u₂) with u₁ < u₂. For the random data in the model, we assume ξ_i ∼ U(−a_i/5, a_i/5) and η_j ∼ U(−b_j/5, b_j/5), and x₀ = x₁ = (1, …, 1)ᵀ for all the algorithms.
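Since ξ_i and η_j are zero-mean, differentiating the payoff and taking expectations gives the closed form F_{i,j}(x) = a_i − d_j + b_j(∑_i x_{i,j} + x_{i,j}). A minimal NumPy sketch under the stated parameter distributions (the array layout and broadcasting are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
Omega, Lam = 20, 10                      # 20 firms, 10 markets, as in the experiment
a = rng.uniform(3, 5, size=Omega)        # marginal costs a_i ~ U(3, 5)
d = rng.uniform(40, 50, size=Lam)        # demand intercepts d_j ~ U(40, 50)
b = rng.uniform(1, 2, size=Lam)          # demand slopes b_j ~ U(1, 2)

def F(x):
    # Expected game mapping: F_{i,j}(x) = a_i - d_j + b_j * (s_j + x_{i,j}),
    # where s_j = sum_i x_{i,j} is the aggregate sales in market j.
    s = x.sum(axis=0)
    return a[:, None] - d[None, :] + b[None, :] * (s[None, :] + x)

x0 = np.ones((Omega, Lam))               # starting point x_0 = (1, ..., 1)
print(F(x0).shape)                       # (20, 10)
```

At the all-ones starting point, s_j = 20 in every market, so F_{i,j}(x₀) = a_i − d_j + 21b_j, which is negative for these parameter ranges: every firm has an incentive to increase output until the capacity bounds or the aggregate-price term bind.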
Figure 3 shows performance comparison of algorithms across convergence behavior, computational efficiency, and underlying network structure.
Figure 3. Networked stochastic Nash–Cournot game. (a) Empirical gap-function error, the error decreases across all algorithms, with Algorithm 1 converging fastest, Li et al. [43] closely following, while Liu and Qin [31], Long and He [38], and Wang et al. [30] converge slower. (b) Logarithm of empirical error, the plot shows faster convergence for Algorithm 1, while Wang et al. [30] lags behind, confirming differences in algorithmic efficiency. (c) CPU time, Algorithm 1 requires the least computation, reflecting efficiency, while Wang et al. [30] consumes the most, indicating higher computational overhead. (d) Network structure, twenty firms are interconnected with ten markets, highlighting competitive supply interactions and capacity-constrained distribution across nodes.
Table 5 below highlights uniform link capacities of two across all firm–market connections, while market prices vary stochastically through demand parameters. This structure reflects balanced competition, ensuring equal production opportunities for all firms, while random price variations across markets capture uncertainty and heterogeneity in the Nash–Cournot game environment.
Table 5. Capacity with respect to price, market, and firm.

6. Conclusions

This work introduced an adaptive inertial stochastic projection framework for solving stochastic variational inequalities whose cost function is quasi-monotone, and the developed scheme was applied to stochastic complementarity problems, the networked stochastic Nash–Cournot game, and the bandwidth allocation problem. By combining inertia with adaptive stepsize selection, the method accelerates convergence while preserving robustness under uncertainty. Stochastic projections ensure feasibility with respect to link capacity constraints, preventing congestion and promoting fairness. Numerical experiments demonstrate superior efficiency, scalability, and stability compared with conventional techniques. Beyond bandwidth allocation, the approach offers a versatile tool for stochastic optimization problems in uncertain environments. In a nutshell, the algorithm delivers a resilient and computationally efficient solution, advancing both theory and practice in modern network resource management.

Author Contributions

F.O.N., J.N.E., and M.D. conceptualized the research idea, with all three contributing significantly to the manuscript's writing and revision. F.O.N. and I.A.-D. carried out the computations and established the convergence analysis, proving almost sure convergence and the rate of convergence. F.O.N. and J.N.E. performed the numerical experiments, created figures and tables, and analyzed performance data. All authors discussed the results and provided critical feedback, which improved the quality of the manuscript. F.O.N., M.D., and I.A.-D. coordinated the research activities and ensured integration of all contributions. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2502).

Data Availability Statement

There were no data analyzed in this project.

Acknowledgments

The authors are grateful to the three anonymous reviewers for their constructive comments and valuable suggestions, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no competing interest.

References

  1. Royset, J.O. Risk-adaptive approaches to stochastic optimization: A survey. SIAM Rev. 2025, 67, 3–70. [Google Scholar] [CrossRef]
  2. Liang, H.; Zhuang, W. Stochastic modeling and optimization in a microgrid: A survey. Energies 2014, 7, 2027–2050. [Google Scholar] [CrossRef]
  3. Sclocchi, A.; Wyart, M. On the different regimes of stochastic gradient descent. Proc. Natl. Acad. Sci. USA 2024, 121, e2316301121. [Google Scholar] [CrossRef]
  4. Dong, D.; Liu, J.; Tang, G. Sample average approximation for stochastic vector variational inequalities. Appl. Anal. 2023, 103, 1649–1668. [Google Scholar] [CrossRef]
  5. Robbins, H.; Monro, S. A stochastic approximation method. Ann. Math. Stat. 1951, 22, 400–407. [Google Scholar] [CrossRef]
  6. Farh, H.M.H.; Al-Shamma’a, A.A.; Alaql, F.; Omotoso, H.O.; Alfraidi, W.; Mohamed, M.A. Optimization and uncertainty analysis of hybrid energy systems using Monte Carlo simulation integrated with genetic algorithm. Comput. Electr. Eng. 2024, 120, 109833. [Google Scholar] [CrossRef]
  7. Stampacchia, G. Formes bilinéaires coercitives sur les ensembles convexes. Comptes Rendus l’Académie Sci. Série A 1964, 258, 4413–4416. [Google Scholar]
  8. Nagurney, A. Network Economics: A Variational Inequality Approach; Kluwer Academic: Dordrecht, The Netherlands, 1999. [Google Scholar]
  9. Shehu, Y.; Iyiola, O.S.; Reich, S. A modified inertial subgradient extragradient method for solving variational inequalities. Optim. Eng. 2022, 23, 421–449. [Google Scholar] [CrossRef]
  10. Liu, L.; Cho, S.Y.; Yao, J.C. Convergence analysis of an inertial Tseng’s extragradient algorithm for solving pseudomonotone variational inequalities and applications. J. Nonlinear Var. Anal. 2021, 2, 47–63. [Google Scholar]
  11. Nwawuru, F.O.; Echezona, G.N.; Okeke, C.C. Finding a common solution of variational inequality and fixed point problems using subgradient extragradient techniques. Rend. Circ. Mat. Palermo II Ser. 2024, 73, 1255–1275. [Google Scholar] [CrossRef]
  12. Dilshad, M.; Alamrani, F.M.; Alamer, A.; Alshaban, E.; Alshehri, M.G. Viscosity-type inertial iterative methods for variational inclusion and fixed point problems. AIMS Math. 2024, 9, 18553–18573. [Google Scholar] [CrossRef]
  13. Facchinei, F.; Pang, J.S. Finite-Dimensional Variational Inequalities and Complementarity Problems; Springer: New York, NY, USA, 2003. [Google Scholar]
  14. Jiang, H.; Xu, H. Stochastic approximation approaches to stochastic variational inequality problems. IEEE Trans. Autom. Control 2008, 53, 1462–1475. [Google Scholar] [CrossRef]
  15. Shapiro, A. Monte Carlo sampling methods. In Handbooks in Operations Research and Management Science: Stochastic Programming; Ruszczyński, A., Shapiro, A., Eds.; Elsevier: Amsterdam, The Netherlands, 2003; pp. 353–425. [Google Scholar]
  16. Wang, M.Z.; Lin, G.H.; Gao, Y.L.; Ali, M.M. Sample average approximation method for a class of stochastic variational inequality problems. J. Syst. Sci. Complex. 2011, 24, 1143–1153. [Google Scholar] [CrossRef]
  17. He, S.X.; Zhang, P.; Hu, X.; Hu, R. A sample average approximation method based on a D-gap function for stochastic variational inequality problems. J. Ind. Manag. Optim. 2014, 10, 977–987. [Google Scholar] [CrossRef]
  18. Cherukuri, A. Sample average approximation of conditional value-at-risk based variational inequalities. Optim. Lett. 2024, 18, 471–496. [Google Scholar] [CrossRef]
  19. Zhou, Z.; Honnappa, H.; Pasupathy, R. Drift optimization of regulated stochastic models using sample average approximation. arXiv 2025, arXiv:2506.06723. [Google Scholar] [CrossRef]
  20. Yousefian, F.; Nedić, A.; Shanbhag, U.V. Distributed adaptive steplength stochastic approximation schemes for Cartesian stochastic variational inequality problems. arXiv 2013, arXiv:1301.1711. [Google Scholar] [CrossRef]
  21. Koshal, J.; Nedić, A.; Shanbhag, U.V. Regularized iterative stochastic approximation methods for stochastic variational inequality problems. IEEE Trans. Autom. Control 2013, 58, 594–609. [Google Scholar] [CrossRef]
  22. Iusem, A.N.; Jofré, A.; Thompson, P. Incremental constraint projection methods for monotone stochastic variational inequalities. Math. Oper. Res. 2019, 44, 236–263. [Google Scholar] [CrossRef]
  23. Yang, Z.P.; Zhang, J.; Wang, Y.; Lin, G.H. Variance-based subgradient extragradient method for stochastic variational inequality problems. J. Sci. Comput. 2021, 89, 4. [Google Scholar] [CrossRef]
  24. Korpelevich, G.M. The extragradient method for finding saddle points and other problems. Matekon 1976, 12, 747–756. [Google Scholar]
  25. Iusem, A.N.; Jofré, A.; Oliveira, R.I.; Thompson, P. Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 2017, 27, 686–724. [Google Scholar] [CrossRef]
  26. Nwawuru, F.O. Approximation of solutions of split monotone variational inclusion problems and fixed point problems. Pan-Am. J. Math. 2023, 2, 1. [Google Scholar] [CrossRef]
  27. Iusem, A.N.; Jofré, A.; Oliveira, R.I.; Thompson, P. Variance-based extragradient methods with line search for stochastic variational inequalities. SIAM J. Optim. 2019, 29, 175–206. [Google Scholar] [CrossRef]
  28. Censor, Y.; Gibali, A.; Reich, S. The subgradient extragradient method for solving variational inequalities in Hilbert space. J. Optim. Theory Appl. 2011, 148, 318–335. [Google Scholar] [CrossRef]
  29. Nwawuru, F.O.; Ezeora, J.N.; ur Rehman, H.; Yao, J.-C. Self-adaptive subgradient extragradient algorithm for solving equilibrium and fixed point problems in Hilbert spaces. Numer. Algorithms 2025. [Google Scholar] [CrossRef]
  30. Wang, S.; Tao, H.; Lin, R.; Cho, Y.J. A self-adaptive stochastic subgradient extragradient algorithm for the stochastic pseudomonotone variational inequality problem with application. Z. Angew. Math. Phys. 2022, 73, 164. [Google Scholar] [CrossRef]
  31. Liu, L.; Qin, X. An accelerated stochastic extragradient-like algorithm with new stepsize rules for stochastic variational inequalities. Comput. Math. Appl. 2024, 163, 117–135. [Google Scholar] [CrossRef]
  32. Polyak, B.T. Some methods of speeding up the convergence of iterative methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
  33. Nwawuru, F.O.; Ezeora, J.N. Inertial-based extragradient algorithm for approximating a common solution of split-equilibrium problems and fixed-point problems of nonexpansive semigroups. J. Inequalities Appl. 2023, 2023, 22. [Google Scholar] [CrossRef]
  34. Ezeora, J.N.; Enyi, C.D.; Nwawuru, F.O.; Ogbonna, R.C. An algorithm for split equilibrium and fixed-point problems using inertial extragradient techniques. Comput. Appl. Math. 2023, 42, 103. [Google Scholar] [CrossRef]
  35. Nwawuru, F.O.; Narian, O.; Dilshad, M.; Ezeora, J.N. Splitting method involving two-step inertial iterations for solving inclusion and fixed point problems with applications. Fixed Point Theory Algorithms Sci. Eng. 2025, 2025, 8. [Google Scholar] [CrossRef]
  36. Enyi, C.D.; Ezeora, J.N.; Ugwunnadi, G.C.; Nwawuru, F.O.; Mukiawa, S.E. Generalized split feasibility problem: Solution by iteration. Carpathian J. Math. 2024, 40, 655–679. [Google Scholar] [CrossRef]
  37. Nesterov, Y.E. A method for solving a convex programming problem with convergence rate O(1/k2). Dokl. Akad. Nauk SSSR 1983, 269, 543–547. [Google Scholar]
  38. Long, X.J.; He, Y.H. A fast stochastic approximation-based subgradient extragradient algorithm with variance reduction for solving stochastic variational inequality problems. J. Comput. Appl. Math. 2023, 420, 114786. [Google Scholar] [CrossRef]
  39. Zhang, X.; Du, X.; Yang, Z.; Lin, G. An infeasible stochastic approximation and projection algorithm for stochastic variational inequalities. J. Optim. Theory Appl. 2019, 183, 1053–1076. [Google Scholar] [CrossRef]
  40. Fang, C.; Chen, S. Some extragradient algorithms for variational inequalities. In Advances in Variational and Hemivariational Inequalities; Han, W., Migórski, S., Sofonea, M., Eds.; Advances in Mechanics and Mathematics; Springer: Cham, Switzerland, 2015; Volume 33, pp. 145–171. [Google Scholar]
  41. Ezeora, J.N.; Nwawuru, F.O. An inertial-based hybrid and shrinking projection methods for solving split common fixed point problems in real reflexive spaces. Int. J. Nonlinear Anal. Appl. 2023, 14, 2541–2556. [Google Scholar]
  42. Robbins, H.; Siegmund, D. A convergence theorem for nonnegative almost supermartingales and some applications. In Optimizing Methods in Statistics; Rustagi, J.S., Ed.; Academic Press: New York, NY, USA, 1971; pp. 233–257. [Google Scholar]
  43. Li, T.; Cai, X.; Song, Y.; Ma, Y. Improved variance reduction extragradient method with line search for stochastic variational inequalities. J. Glob. Optim. 2023, 87, 423–446. [Google Scholar] [CrossRef]
  44. Yang, Z.P.; Lin, G.H. Variance-based single-call proximal extragradient algorithms for stochastic mixed variational inequalities. J. Optim. Theory Appl. 2021, 190, 393–427. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
