Article

Incremental Delayed Subgradient Method for Decentralized Nonsmooth Convex–Concave Minimax Optimization

Department of Mathematics, Faculty of Science, Khon Kaen University, Khon Kaen 40002, Thailand
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(3), 126; https://doi.org/10.3390/a18030126
Submission received: 7 January 2025 / Revised: 10 February 2025 / Accepted: 20 February 2025 / Published: 24 February 2025
(This article belongs to the Section Parallel and Distributed Algorithms)

Abstract

In this paper, we propose an incremental-type subgradient scheme for solving a nonsmooth convex–concave minimax optimization problem in the setting of Euclidean spaces. We investigate convergence results by deriving an upper bound on the absolute value of the difference between the function value of the averaged iterates and the saddle value, provided that the step size is constant. Assuming that the step-size sequence is diminishing, we prove that both the averaged sequence of function values and the sequence of function values of the averaged iterates converge to the saddle value. Finally, we present numerical examples illustrating the theoretical results.

1. Introduction

In this work, we focus on developing an iterative method for solving a finite-sum nonsmooth convex–concave minimax optimization problem in a decentralized setting:
\[
\min_{u \in X} \max_{v \in Y} \sum_{i=1}^{m} F_i(u, v), \tag{1}
\]
where the component coupling function $F_i : \mathbb{R}^p \times \mathbb{R}^r \to \mathbb{R}$ for each $i = 1, \ldots, m$ is a convex–concave function and the constraint sets $X \subseteq \mathbb{R}^p$ and $Y \subseteq \mathbb{R}^r$ are compact convex sets. The convex–concave optimization problem (1) has been extensively studied, with numerous methods proposed in the literature. It arises in various areas of convex optimization. In addition, it has been widely applied in multiple fields, namely, generative adversarial networks (GANs) [1,2], adversarial robust learning [3], classification [4,5,6], and image processing [7,8].
When $m = 1$, the gradient descent–ascent method, a well-known approach introduced by Arrow, Hurwicz, and Uzawa [9], is commonly used to solve the problem in the smooth case. Furthermore, for cases where the coupling function is not necessarily smooth, Nedić and Ozdaglar [10] proposed the subgradient method, which uses subgradient computations in place of gradient computations. Since the coupling objective function is a convex–concave real-valued function, the existence of subgradients with respect to both variables is guaranteed. A common difficulty with gradient or subgradient methods for convex optimization is that, although subgradients exist, they are often hard to compute. To address this issue, the concept of delay techniques for calculating subgradients was introduced [11,12,13,14]. Specifically, the next iterate can be updated using information from stale iterates rather than computing the subgradient at the current iterate, which saves computational runtime. Recently, Arunrat and Nimana [15] proposed a method based on the subgradient method, together with the delay technique, to provide more flexibility in computing the subgradients when solving problem (1) with $m = 1$. In this approach, the iterates are updated simultaneously using delayed subgradients at stale iterates of their respective variables, combined with the current iterates of the other variable. The method is referred to as the delayed subgradient method and can be explicitly formulated as follows: for a current iterate $(u[k], v[k]) \in X \times Y$ ($k \ge 0$), compute
\[
u[k+1] := P_X\left( u[k] - \gamma\, \tilde{\nabla}_u F\left( u[k-\tau_k], v[k] \right) \right),
\]
and
\[
v[k+1] := P_Y\left( v[k] + \gamma\, \tilde{\nabla}_v F\left( u[k], v[k-\mu_k] \right) \right),
\]
where $\tilde{\nabla}_u F(u[k-\tau_k], v[k])$ is a subgradient of $F$ with respect to $u$ at the stale iterate $u[k-\tau_k]$; $\tilde{\nabla}_v F(u[k], v[k-\mu_k])$ is a subgradient of $F$ with respect to $v$ at the stale iterate $v[k-\mu_k]$; and $P_X$ and $P_Y$ are the metric projections onto $X$ and $Y$, respectively. In some situations, the objective function can be expressed as a finite sum or within a stochastic setting. Focusing on each component individually rather than on the entire function can often yield better results. Several methods have also been developed to address this type of convex–concave optimization problem [16,17,18,19,20].
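To make the update rule above concrete, the following Python sketch performs one iteration of the delayed subgradient method for $m = 1$. It is an illustrative sketch only: the oracle functions sub_u and sub_v (returning a subgradient in $u$ and a supergradient in $v$) and the projections proj_X and proj_Y are assumed to be supplied by the user and are not part of the original text.

def delayed_subgradient_step(u_hist, v_hist, k, tau_k, mu_k, gamma,
                             sub_u, sub_v, proj_X, proj_Y):
    """One iteration of the delayed subgradient method (case m = 1).

    u_hist, v_hist : lists of past iterates u[0..k], v[0..k]
    tau_k, mu_k    : delays used at iteration k
    sub_u(u, v)    : a subgradient of F(., v) at u (user-supplied oracle)
    sub_v(u, v)    : a supergradient of F(u, .) at v (user-supplied oracle)
    proj_X, proj_Y : metric projections onto X and Y (user-supplied)
    """
    u_k, v_k = u_hist[k], v_hist[k]
    g_u = sub_u(u_hist[k - tau_k], v_k)   # delayed in u, current in v
    g_v = sub_v(u_k, v_hist[k - mu_k])    # current in u, delayed in v
    u_next = proj_X(u_k - gamma * g_u)    # descent step on u
    v_next = proj_Y(v_k + gamma * g_v)    # ascent step on v
    return u_next, v_next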
On the other hand, many authors are interested in studying convex minimization problems where the objective function is expressed as a finite sum. This problem is known as the additive convex optimization problem. Specifically, let $m \ge 1$; for all $i = 1, \ldots, m$, let $F_i : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $X \subseteq \mathbb{R}^n$ be a nonempty closed and convex set. The additive convex optimization problem is defined as follows:
\[
\text{minimize } \sum_{i=1}^{m} F_i(u) \quad \text{subject to } u \in X.
\]
In 2001, Nedić and Bertsekas proposed the so-called incremental subgradient methods [21,22] to address this problem. At each iteration, the solution is updated based on the subgradient of a single component of the objective function. The method is essentially structured as follows: $u[k] \in X$ is the vector obtained after $k$ cycles, and the vector $u[k+1] \in X$ is obtained by initializing with $u_1[k] := u[k]$, computing
\[
u_{i+1}[k] := P_X\left( u_i[k] - \gamma_k\, \tilde{\nabla} F_i(u_i[k]) \right), \quad i = 1, \ldots, m,
\]
and generating $u[k+1] \in X$ after the cycle of $m$ steps as
\[
u[k+1] := u_{m+1}[k],
\]
where $\tilde{\nabla} F_i(u_i[k])$ is a subgradient of $F_i$ at $u_i[k]$, and $\gamma_k$ is a positive step size. The components can be selected either cyclically or randomly at each step. These methods are well suited for large-scale problems, as they reduce the memory and computational requirements compared with batch methods. Furthermore, this approach eliminates the need for a central user to collect all the data from each component. We notice that the delay technique has not yet been applied to the incremental subgradient method for solving problem (1).
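In code, one cycle of the incremental subgradient method can be sketched as follows. This is an illustrative Python sketch under the assumption that the component subgradient oracles (the list subgrads) and the projection proj_X are supplied by the user.

def incremental_subgradient_cycle(u, gamma_k, subgrads, proj_X):
    """One cycle (the k-th) of the incremental subgradient method.

    u        : current iterate u[k]
    gamma_k  : step size used throughout this cycle
    subgrads : list of oracles; subgrads[i](u) returns a subgradient of F_i at u
    proj_X   : metric projection onto the feasible set X
    """
    for sub_i in subgrads:          # process the m components in cyclic order
        u = proj_X(u - gamma_k * sub_i(u))
    return u                        # this is u[k+1]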
In this work, we propose the incremental delayed subgradient method for solving the decentralized nonsmooth convex–concave minimax problem (1). This approach updates the next iterates incrementally by incorporating delayed subgradients from past iterates of their respective variables while using the current iterates of the other variable. We provide a detailed characterization of the fundamental relations required to analyze the behavior of the generated sequences. Furthermore, we establish convergence results for both constant and diminishing step-size sequences. Finally, we present numerical examples to demonstrate the effectiveness of the proposed method.
The following sections are structured as follows: In Section 2, we review essential concepts and important results related to convex–concave coupling functions and the min–max optimization problem. Section 3 introduces the incremental delayed subgradient method, outlines the assumptions, and provides relevant discussions. In addition, this section includes an analysis of the convergence properties of the proposed method. Section 4 presents numerical experiments on simple convex–concave functions to demonstrate the effectiveness of the proposed method. Finally, the concluding section summarizes our findings and provides closing remarks.

2. Preliminaries

In this section, we recall some definitions and useful facts used in this work. We denote by $\mathbb{R}$ the set of real numbers and by $\mathbb{N}_0$ the set of nonnegative integers. Let $u = (x_1, x_2, \ldots, x_p)$ and $v = (y_1, y_2, \ldots, y_p)$ be in $\mathbb{R}^p$; the inner product of $u$ and $v$, denoted by $\langle u, v \rangle$, is defined by $\langle u, v \rangle := \sum_{i=1}^{p} x_i y_i$. Moreover, the norm of $u$, denoted by $\|u\|$, is defined by $\|u\| := \sqrt{\langle u, u \rangle}$. If $X$ is a bounded set in $\mathbb{R}^p$, then we denote the diameter of $X$ by $D_X := \sup\{\|u - v\| : u, v \in X\}$.
Let $X \subseteq \mathbb{R}^p$ and $Y \subseteq \mathbb{R}^r$ be nonempty convex sets and let $F : X \times Y \to \mathbb{R}$ be a function. We say that $F$ is a convex–concave function if the function $F(\cdot, v) : X \to \mathbb{R}$ is a convex function for each fixed $v \in Y$, that is,
\[
F(\alpha x + (1-\alpha) y, v) \le \alpha F(x, v) + (1-\alpha) F(y, v) \quad \text{for all } x, y \in X \text{ and } \alpha \in (0, 1),
\]
and $F(u, \cdot) : Y \to \mathbb{R}$ is a concave function for each fixed $u \in X$, that is,
\[
F(u, \alpha w + (1-\alpha) z) \ge \alpha F(u, w) + (1-\alpha) F(u, z) \quad \text{for all } w, z \in Y \text{ and } \alpha \in (0, 1).
\]
A vector pair $(u^*, v^*) \in X \times Y$ is said to be a saddle point of $F$ on $X \times Y$ (with respect to minimizing in $u$ and maximizing in $v$) if
\[
F(u^*, v) \le F(u^*, v^*) \le F(u, v^*) \quad \text{for all } u \in X,\ v \in Y.
\]
We call the value $F^* := F(u^*, v^*)$ the saddle value. Note that if $X$ and $Y$ are nonempty closed bounded convex sets and $F : X \times Y \to \mathbb{R}$ is a convex–concave function, then $F$ has a saddle point on $X \times Y$ (see [23], Corollary 37.6.2).
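As a simple illustration (not taken from the paper), consider the bilinear coupling $F(u, v) = uv$ on $X = Y = [-1, 1]$, which is convex (indeed linear) in $u$ and concave in $v$. The pair $(u^*, v^*) = (0, 0)$ is a saddle point with saddle value $F^* = 0$, since
\[
F(0, v) = 0 \le 0 = F(0, 0) \le 0 = F(u, 0) \quad \text{for all } u, v \in [-1, 1].
\]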
Let $f : \mathbb{R}^p \to \mathbb{R}$ be a function and let $u \in \mathbb{R}^p$. A vector $\tilde{\nabla} f(u) \in \mathbb{R}^p$ is called a subgradient of $f$ at $u$ if
\[
f(v) \ge \langle \tilde{\nabla} f(u), v - u \rangle + f(u) \quad \text{for all } v \in \mathbb{R}^p.
\]
We denote the set of all subgradients of $f$ at $u$ by $\partial f(u)$ and refer to it as the subdifferential of $f$ at $u$. The function $f$ is called subdifferentiable at $u \in \mathbb{R}^p$ if $\partial f(u) \neq \emptyset$. Note that if $f : \mathbb{R}^p \to \mathbb{R}$ is a convex function and $u \in \mathbb{R}^p$, then $f$ is subdifferentiable at $u$. If $f : \mathbb{R}^p \to \mathbb{R}$ is a concave function, a vector $\tilde{\nabla} f(u) \in \mathbb{R}^p$ is called a supergradient of $f$ at $u$ if
\[
f(v) \le \langle \tilde{\nabla} f(u), v - u \rangle + f(u) \quad \text{for all } v \in \mathbb{R}^p.
\]
The set of all supergradients of $f$ at $u$ is denoted by $\bar{\partial} f(u)$ and is called the superdifferential of $f$ at $u$. For simplicity, we also refer to them as the subgradient and the subdifferential set. Furthermore, we note that $\bar{\partial} f(u) = -\partial(-f)(u)$. Moreover, since $-f$ is a convex function, we also have that $f$ is superdifferentiable at $u$.
Let $F : \mathbb{R}^p \times \mathbb{R}^r \to \mathbb{R}$ be a convex–concave function. Following the above definition of a subgradient of a convex function, we extend this concept to the subgradients of a convex–concave function as follows. For a fixed $v \in \mathbb{R}^r$ and $\bar{u} \in \mathbb{R}^p$, we call a vector $\tilde{\nabla}_u F(\bar{u}, v) \in \mathbb{R}^p$ a subgradient of the convex function $F(\cdot, v)$ with respect to $u$ at the point $\bar{u}$ if
\[
\langle \tilde{\nabla}_u F(\bar{u}, v), u - \bar{u} \rangle + F(\bar{u}, v) \le F(u, v) \quad \text{for all } u \in \mathbb{R}^p.
\]
Similarly, for a fixed $u \in \mathbb{R}^p$ and $\bar{v} \in \mathbb{R}^r$, we call a vector $\tilde{\nabla}_v F(u, \bar{v}) \in \mathbb{R}^r$ a subgradient of the concave function $F(u, \cdot)$ with respect to $v$ at the point $\bar{v}$ if
\[
\langle \tilde{\nabla}_v F(u, \bar{v}), v - \bar{v} \rangle + F(u, \bar{v}) \ge F(u, v) \quad \text{for all } v \in \mathbb{R}^r.
\]
Let $X \subseteq \mathbb{R}^p$ be a nonempty subset and let $u \in \mathbb{R}^p$. If there is a point $v \in X$ such that
\[
\|u - v\| \le \|u - z\|
\]
for all $z \in X$, then $v$ is said to be a metric projection of $u$ onto $X$, and we usually denote it by $P_X(u)$. Note that if $X \subseteq \mathbb{R}^p$ is a nonempty closed convex set, then for each $u \in \mathbb{R}^p$, there exists a unique metric projection $P_X(u)$.
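As a concrete instance (relevant to the simplex constraints used later in Section 4), the metric projection onto the standard simplex can be computed by the classical sorting-based routine sketched below in Python. This particular implementation is illustrative and not part of the original text.

import numpy as np

def project_simplex(u):
    """Euclidean (metric) projection of u onto the standard simplex
    {x : x >= 0, sum(x) = 1}, via the classical sorting procedure."""
    n = u.size
    s = np.sort(u)[::-1]                       # sort entries in decreasing order
    css = np.cumsum(s) - 1.0                   # cumulative sums shifted by the simplex level
    rho = np.nonzero(s - css / np.arange(1, n + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)             # shift that enforces the constraint
    return np.maximum(u - theta, 0.0)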

3. Incremental Delayed Subgradient Method

In this section, we present the incremental delayed subgradient method (in short, IDSM), which is the main algorithm of this work. After that, we state the technical lemmata that are used in the convergence analysis.
To begin with, we need the following delay bound assumption.
Assumption 1.
The delay sequences $\{\tau_i^k\}_{k=0}^{\infty}, \{\mu_i^k\}_{k=0}^{\infty} \subseteq \mathbb{N}_0$ are bounded; that is, there exists a nonnegative integer $\tau_{\max}$ such that
\[
0 \le \tau_i^k, \mu_i^k \le \tau_{\max} \quad \text{for all } k \ge 0 \text{ and } i = 1, \ldots, m.
\]
We are in a position to present the incremental delayed subgradient method in Algorithm 1 as follows.
Algorithm 1 Incremental Delayed Subgradient Method (in short, IDSM)
Initialization: Given a step-size sequence $\{\gamma_k\}_{k=0}^{\infty} \subseteq (0, \infty)$ of real numbers, choose initial points $u[0], u[-1], \ldots, u[-\tau_{\max}] \in X$ and $v[0], v[-1], \ldots, v[-\tau_{\max}] \in Y$ arbitrarily.
Iterative Step: For an iterate $(u[k], v[k]) \in X \times Y$ ($k \ge 0$), starting with
\[
u_1[k] := u[k] \quad \text{and} \quad v_1[k] := v[k],
\]
for each $i = 1, \ldots, m$, compute
\[
u_{i+1}[k] := P_X\left( u_i[k] - \gamma_k\, \tilde{\nabla}_u F_i\left( u[k-\tau_i^k], v[k] \right) \right),
\]
and
\[
v_{i+1}[k] := P_Y\left( v_i[k] + \gamma_k\, \tilde{\nabla}_v F_i\left( u[k], v[k-\mu_i^k] \right) \right),
\]
where $\tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k])$ is a subgradient of $F_i$ with respect to $u$ at the stale iterate $u[k-\tau_i^k]$, and $\tilde{\nabla}_v F_i(u[k], v[k-\mu_i^k])$ is a subgradient of $F_i$ with respect to $v$ at the stale iterate $v[k-\mu_i^k]$.
Compute
\[
u[k+1] := u_{m+1}[k] \quad \text{and} \quad v[k+1] := v_{m+1}[k].
\]
Update k : = k + 1 .
To make the convergence analysis of Algorithm 1 more straightforward, we assume throughout this work that
\[
u[0] = u[-1] = \cdots = u[-\tau_{\max}] \quad \text{and} \quad v[0] = v[-1] = \cdots = v[-\tau_{\max}].
\]
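For readers who prefer an algorithmic view in code, the following Python sketch implements Algorithm 1 under the equal-initialization assumption above. The subgradient oracles, projections, delay schedules, and step sizes are user-supplied placeholders and are not prescribed by the paper.

def idsm(F_sub_u, F_sub_v, proj_X, proj_Y, u0, v0, gammas, delays_u, delays_v, K):
    """Sketch of the incremental delayed subgradient method (Algorithm 1).

    F_sub_u[i](u, v) : a subgradient of F_i(., v) at u
    F_sub_v[i](u, v) : a supergradient of F_i(u, .) at v
    delays_u[k][i], delays_v[k][i] : bounded delays tau_i^k and mu_i^k
    The initial points u[-tau_max], ..., u[0] are all taken equal to u0
    (and similarly for v), matching the assumption stated in the text.
    """
    m = len(F_sub_u)
    u_hist, v_hist = [u0.copy()], [v0.copy()]
    for k in range(K):
        u_i, v_i = u_hist[k].copy(), v_hist[k].copy()
        for i in range(m):
            tau, mu = delays_u[k][i], delays_v[k][i]
            # stale outer iterates; indices below 0 are clipped to 0,
            # which is valid because of the equal-initialization assumption
            u_stale = u_hist[max(k - tau, 0)]
            v_stale = v_hist[max(k - mu, 0)]
            u_i = proj_X(u_i - gammas[k] * F_sub_u[i](u_stale, v_hist[k]))
            v_i = proj_Y(v_i + gammas[k] * F_sub_v[i](u_hist[k], v_stale))
        u_hist.append(u_i)
        v_hist.append(v_i)
    return u_hist, v_hist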
Remark 1.
If $m = 1$, the proposed IDSM reduces to the method proposed by Arunrat and Nimana [15]. Furthermore, if, in addition, the delays vanish, that is, $\tau_{\max} = 0$, then the IDSM coincides with the subgradient method proposed by Nedić and Ozdaglar [10].

3.1. Technical Lemmata

The boundedness assumption on the constraint sets $X$ and $Y$, along with the fact that the subgradients of a real-valued convex function are uniformly bounded over any bounded subset, leads to the boundedness property stated in the following lemma.
Lemma 1.
There is a real constant $B > 0$ such that for all $(u, v) \in X \times Y$ and for all $i = 1, \ldots, m$,
\[
\max\left\{ \left\| \tilde{\nabla}_u F_i(u, v) \right\|, \left\| \tilde{\nabla}_v F_i(u, v) \right\| \right\} \le B.
\]
Next, we focus on characterizing the fundamental recurrence relations between the iterative sequences $\{u[k]\}$ and $\{v[k]\}$ and arbitrary points $u \in X$ and $v \in Y$, respectively, which are utilized in the subsequent convergence results. For simplicity of notation, we denote
\[
F(u, v) := \sum_{i=1}^{m} F_i(u, v) \quad \text{for all } (u, v) \in \mathbb{R}^p \times \mathbb{R}^r.
\]
Lemma 2.
Let $\{u[k]\}_{k=0}^{\infty}$ and $\{v[k]\}_{k=0}^{\infty}$ be the sequences generated by the IDSM, and let $a \in (0, 1)$ be a real number. The following statements are true:
(i) 
For all $u \in X$ and for all integers $k \ge 0$,
\[
\begin{aligned}
\|u[k+1]-u\|^2 \le{} & \|u[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + 8 B^2 \gamma_k^{2-a} m^3 \\
& + 2\gamma_k \left( F(u, v[k]) - F(u[k], v[k]) \right) + \frac{\gamma_k^a}{m} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 .
\end{aligned}
\]
(ii) 
For all $v \in Y$ and for all integers $k \ge 0$,
\[
\begin{aligned}
\|v[k+1]-v\|^2 \le{} & \|v[k]-v\|^2 + \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 + 8 B^2 \gamma_k^{2-a} m^3 \\
& + 2\gamma_k \left( F(u[k], v[k]) - F(u[k], v) \right) + \frac{\gamma_k^a}{m} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|v[k-j+1]-v[k-j]\|^2 .
\end{aligned}
\]
Proof. 
We present only the proof of (i). Since the proof of (ii) follows the same line of reasoning, replacing the convexity of $F_i(\cdot, v)$ with the concavity of $F_i(u, \cdot)$ for all $u \in X$, we omit it here.
Let $u \in X$ and $k \ge 0$ be given. For each $i = 1, \ldots, m$, by using the definition of $u_{i+1}[k]$ and the characterizing property of the metric projection, we note that
\[
\begin{aligned}
0 &\le \left\langle u_{i+1}[k]-u_i[k]+\gamma_k \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u-u_{i+1}[k] \right\rangle \\
&= \left\langle u_{i+1}[k]-u_i[k],\, u-u_{i+1}[k] \right\rangle + \gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u-u[k-\tau_i^k] \right\rangle \\
&\quad + \gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k-\tau_i^k]-u_{i+1}[k] \right\rangle . 
\end{aligned}\tag{2}
\]
Let us arrange the terms on the right-hand side of the inequality above. The first term can be written as
\[
\left\langle u_{i+1}[k]-u_i[k],\, u-u_{i+1}[k] \right\rangle = \tfrac{1}{2}\|u_i[k]-u\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u_i[k]\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u\|^2 . \tag{3}
\]
Since $\tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k])$ is a subgradient of the convex function $F_i(\cdot, v[k])$ at $u[k-\tau_i^k]$, the second term can be bounded as
\[
\gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u-u[k-\tau_i^k] \right\rangle \le \gamma_k \left( F_i(u, v[k]) - F_i(u[k-\tau_i^k], v[k]) \right). \tag{4}
\]
Regarding the last term in the inequality above, we begin by noting that
\[
\begin{aligned}
\left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k-\tau_i^k]-u_{i+1}[k] \right\rangle
&= \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k-\tau_i^k]-u[k+1] \right\rangle \\
&\quad + \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k+1]-u_{i+1}[k] \right\rangle . 
\end{aligned}\tag{5}
\]
By using the facts that the constant $a \in (0, 1)$ and the number $m \ge 1$, we note that
\[
\begin{aligned}
0 &\le \left\| m \gamma_k^{\frac{2-a}{2}} \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) - \frac{\gamma_k^{\frac{a}{2}}}{2m} \left( u[k-\tau_i^k]-u[k+1] \right) \right\|^2 \\
&= m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 - \gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k-\tau_i^k]-u[k+1] \right\rangle + \frac{\gamma_k^a}{4m^2} \left\| u[k-\tau_i^k]-u[k+1] \right\|^2 ,
\end{aligned}
\]
which implies that
\[
\gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k-\tau_i^k]-u[k+1] \right\rangle \le m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k-\tau_i^k]-u[k+1] \right\|^2 . \tag{6}
\]
Using the same technique as for deriving (6), we also have
\[
\gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k+1]-u_{i+1}[k] \right\rangle \le m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 . \tag{7}
\]
Thus, by applying the obtained inequalities (6) and (7), inequality (5) becomes
\[
\begin{aligned}
\gamma_k \left\langle \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]),\, u[k-\tau_i^k]-u_{i+1}[k] \right\rangle
&\le 2 m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 \\
&\quad + \frac{\gamma_k^a}{4m^2} \left\| u[k-\tau_i^k]-u[k+1] \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 . 
\end{aligned}\tag{8}
\]
By applying inequalities (3), (4) and (8) in inequality (2), we obtain
\[
\begin{aligned}
0 &\le \tfrac{1}{2}\|u_i[k]-u\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u_i[k]\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u\|^2 + \gamma_k \left( F_i(u, v[k]) - F_i(u[k-\tau_i^k], v[k]) \right) \\
&\quad + 2 m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k-\tau_i^k]-u[k+1] \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 \\
&= \tfrac{1}{2}\|u_i[k]-u\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u_i[k]\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u\|^2 + \gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) \\
&\quad + \gamma_k \left( F_i(u[k], v[k]) - F_i(u[k-\tau_i^k], v[k]) \right) + 2 m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 \\
&\quad + \frac{\gamma_k^a}{4m^2} \left\| u[k-\tau_i^k]-u[k+1] \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 . 
\end{aligned}\tag{9}
\]
Now, we apply the subgradient inequality for the convex function $F_i(\cdot, v[k])$ at $u[k]$ and $u[k+1]$ to obtain
\[
\begin{aligned}
\gamma_k \left( F_i(u[k], v[k]) - F_i(u[k-\tau_i^k], v[k]) \right)
&= \gamma_k \left( F_i(u[k], v[k]) - F_i(u[k+1], v[k]) \right) + \gamma_k \left( F_i(u[k+1], v[k]) - F_i(u[k-\tau_i^k], v[k]) \right) \\
&\le \gamma_k \left\langle \tilde{\nabla}_u F_i(u[k], v[k]),\, u[k]-u[k+1] \right\rangle + \gamma_k \left\langle \tilde{\nabla}_u F_i(u[k+1], v[k]),\, u[k+1]-u[k-\tau_i^k] \right\rangle . 
\end{aligned}\tag{10}
\]
Using the same technique as for deriving inequality (6), we can obtain the following inequalities:
\[
\gamma_k \left\langle \tilde{\nabla}_u F_i(u[k], v[k]),\, u[k]-u[k+1] \right\rangle \le m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k]-u[k+1] \right\|^2 , \tag{11}
\]
and
\[
\gamma_k \left\langle \tilde{\nabla}_u F_i(u[k+1], v[k]),\, u[k+1]-u[k-\tau_i^k] \right\rangle \le m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k+1], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u[k-\tau_i^k] \right\|^2 . \tag{12}
\]
From inequalities (11) and (12), along with (10), it follows that inequality (9) becomes
\[
\begin{aligned}
0 &\le \tfrac{1}{2}\|u_i[k]-u\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u_i[k]\|^2 - \tfrac{1}{2}\|u_{i+1}[k]-u\|^2 + \gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) \\
&\quad + m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k]-u[k+1] \right\|^2 + m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k+1], v[k]) \right\|^2 \\
&\quad + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u[k-\tau_i^k] \right\|^2 + 2 m^2 \gamma_k^{2-a} \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 + \frac{\gamma_k^a}{4m^2} \left\| u[k-\tau_i^k]-u[k+1] \right\|^2 \\
&\quad + \frac{\gamma_k^a}{4m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 ,
\end{aligned}
\]
and it can be rearranged as
\[
\begin{aligned}
\|u_{i+1}[k]-u\|^2 &\le \|u_i[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) \\
&\quad + 2 \gamma_k^{2-a} m^2 \left( \left\| \tilde{\nabla}_u F_i(u[k], v[k]) \right\|^2 + \left\| \tilde{\nabla}_u F_i(u[k+1], v[k]) \right\|^2 + 2 \left\| \tilde{\nabla}_u F_i(u[k-\tau_i^k], v[k]) \right\|^2 \right) \\
&\quad + \frac{\gamma_k^a}{2m^2} \left\| u[k+1]-u[k] \right\|^2 + \frac{\gamma_k^a}{m^2} \left\| u[k+1]-u[k-\tau_i^k] \right\|^2 + \frac{\gamma_k^a}{2m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 \\
&\le \|u_i[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) + 8 B^2 \gamma_k^{2-a} m^2 \\
&\quad + \frac{\gamma_k^a}{2m^2} \left\| u[k+1]-u[k] \right\|^2 + \frac{\gamma_k^a}{m^2} \left\| u[k+1]-u[k-\tau_i^k] \right\|^2 + \frac{\gamma_k^a}{2m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 ,
\end{aligned}\tag{13}
\]
where the constant $B$ is the bound on the subgradients given in Lemma 1. We observe that the term $\|u[k+1]-u[k-\tau_i^k]\|^2$ in inequality (13) above can be bounded as
\[
\begin{aligned}
\|u[k+1]-u[k-\tau_i^k]\|^2 &= (\tau_i^k+1)^2 \left\| \sum_{j=0}^{\tau_i^k} \frac{u[k-j+1]-u[k-j]}{\tau_i^k+1} \right\|^2 \le (\tau_i^k+1) \sum_{j=0}^{\tau_i^k} \|u[k-j+1]-u[k-j]\|^2 \\
&\le (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 .
\end{aligned}
\]
Thus, using this inequality in (13), we have
\[
\begin{aligned}
\|u_{i+1}[k]-u\|^2 &\le \|u_i[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) + 8 B^2 \gamma_k^{2-a} m^2 \\
&\quad + \frac{\gamma_k^a}{2m^2} \left\| u[k+1]-u[k] \right\|^2 + \frac{\gamma_k^a}{2m^2} \left\| u[k+1]-u_{i+1}[k] \right\|^2 + \frac{\gamma_k^a}{m^2} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 . 
\end{aligned}\tag{14}
\]
Now, we investigate the terms $\|u[k+1]-u[k]\|^2$ and $\|u[k+1]-u_{i+1}[k]\|^2$ in inequality (14) as follows. By using the convexity of $\|\cdot\|^2$, we note that
\[
\|u[k+1]-u[k]\|^2 = \left\| \sum_{i=1}^{m} \left( u_{i+1}[k]-u_i[k] \right) \right\|^2 = m^2 \left\| \sum_{i=1}^{m} \frac{u_{i+1}[k]-u_i[k]}{m} \right\|^2 \le m \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 . \tag{15}
\]
With the same technique as above, the term $\|u[k+1]-u_{i+1}[k]\|^2$ is bounded as
\[
\begin{aligned}
\|u[k+1]-u_{i+1}[k]\|^2 &= \left\| \sum_{j=i+1}^{m} \left( u_{j+1}[k]-u_j[k] \right) \right\|^2 = (m-i)^2 \left\| \sum_{j=i+1}^{m} \frac{u_{j+1}[k]-u_j[k]}{m-i} \right\|^2 \\
&\le m \sum_{j=1}^{m} \|u_{j+1}[k]-u_j[k]\|^2 = m \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 . 
\end{aligned}\tag{16}
\]
Therefore, by applying inequalities (15) and (16) in inequality (14), we obtain
\[
\begin{aligned}
\|u_{i+1}[k]-u\|^2 &\le \|u_i[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) + 8 B^2 \gamma_k^{2-a} m^2 \\
&\quad + \frac{\gamma_k^a}{2m} \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + \frac{\gamma_k^a}{2m} \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + \frac{\gamma_k^a}{m^2} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 \\
&= \|u_i[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) + 8 B^2 \gamma_k^{2-a} m^2 \\
&\quad + \frac{\gamma_k^a}{m^2} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 + \frac{\gamma_k^a}{m} \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 . 
\end{aligned}\tag{17}
\]
By summing inequality (17) over $i = 1, \ldots, m$, we obtain
\[
\begin{aligned}
\sum_{i=1}^{m} \|u_{i+1}[k]-u\|^2 \le{} & \sum_{i=1}^{m} \|u_i[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \sum_{i=1}^{m} \left( F_i(u, v[k]) - F_i(u[k], v[k]) \right) \\
& + 8 B^2 \gamma_k^{2-a} m^3 + \gamma_k^a \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + \frac{\gamma_k^a}{m^2} (\tau_{\max}+1) \sum_{i=1}^{m} \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 ,
\end{aligned}
\]
and then
\[
\begin{aligned}
\|u[k+1]-u\|^2 \le{} & \|u[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F(u, v[k]) - F(u[k], v[k]) \right) + 8 B^2 \gamma_k^{2-a} m^3 \\
& + \frac{\gamma_k^a}{m^2} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \sum_{i=1}^{m} \|u[k-j+1]-u[k-j]\|^2 + \gamma_k^a \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 ,
\end{aligned}
\]
implying that
\[
\begin{aligned}
\|u[k+1]-u\|^2 \le{} & \|u[k]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + 2\gamma_k \left( F(u, v[k]) - F(u[k], v[k]) \right) \\
& + 8 B^2 \gamma_k^{2-a} m^3 + \frac{\gamma_k^a}{m} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 ,
\end{aligned}
\]
as desired. □

3.2. Bounds for Constant Step Size

Before proposing the lemma below, we define the notations used in the case of a constant step size, where we write the averaged iterates $\hat{u}[N]$ and $\hat{v}[N]$ as
\[
\hat{u}[N] := \frac{1}{N+1} \sum_{k=0}^{N} u[k] \quad \text{and} \quad \hat{v}[N] := \frac{1}{N+1} \sum_{k=0}^{N} v[k], \quad \text{for all } N \ge 0 .
\]
The proof of the following lemma can be carried out in the same manner as the proof of Lemma 4.2 in [15] by using Lemma 2 with $\gamma_k = \gamma$.
Lemma 3.
Let $\{u[k]\}_{k=0}^{\infty}$ and $\{v[k]\}_{k=0}^{\infty}$ be the sequences generated by the IDSM, and let $a \in (0, 1)$ be a real number. Suppose that the step-size constant satisfies $\gamma \in \left( 0, \left[ \frac{1}{\frac{1}{2m^2}+1+(\tau_{\max}+1)^2} \right]^{1/a} \right)$. The following statements are true:
(i) 
For all $u \in X$ and for all integers $N \ge 0$,
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u, \hat{v}[N]) \le \frac{\|u[0]-u\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} . \tag{18}
\]
(ii) 
For all $v \in Y$ and for all integers $N \ge 0$,
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(\hat{u}[N], v) \ge -\frac{\|v[0]-v\|^2}{2\gamma(N+1)} - 4 B^2 m^3 \gamma^{1-a} . \tag{19}
\]
Next, we determine the upper bound for the minimax gap of the averaged iterates u ^ [ N ] and v ^ [ N ] by using the obtained results in Lemma 3 as follows:
Theorem 2.
Let $\{u[k]\}_{k=0}^{\infty}$ and $\{v[k]\}_{k=0}^{\infty}$ be the sequences generated by the IDSM, and let $a \in (0, 1)$ be a real number. Suppose that the step-size constant satisfies $\gamma \in \left( 0, \left[ \frac{1}{\frac{1}{2m^2}+1+(\tau_{\max}+1)^2} \right]^{1/a} \right)$. Then, for all $N \ge 0$ and any saddle point $(u^*, v^*) \in X \times Y$, we have
\[
0 \le F(\hat{u}[N], v^*) - F(u^*, \hat{v}[N]) \le \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} + 8 B^2 m^3 \gamma^{1-a} .
\]
Proof. 
Let $u = u^*$ and $v = v^*$ in inequalities (18) and (19), respectively. For each $N \ge 0$, we have
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, \hat{v}[N]) \le \frac{\|u[0]-u^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} ,
\]
and
\[
F(\hat{u}[N], v^*) - \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) \le \frac{\|v[0]-v^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} .
\]
Therefore, by combining these two inequalities, we can conclude that
\[
0 \le F(\hat{u}[N], v^*) - F(u^*, \hat{v}[N]) \le \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} + 8 B^2 m^3 \gamma^{1-a} .
\]
The proof is completed. □
Furthermore, we can derive the upper bound for the difference in the function values of the averaged iterates F ( u ^ [ N ] , v ^ [ N ] ) and the saddle value F * as the following theorem.
Theorem 3.
Let $\{u[k]\}_{k=0}^{\infty}$ and $\{v[k]\}_{k=0}^{\infty}$ be the sequences generated by the IDSM, and let $a \in (0, 1)$ be a real number. Suppose that the step-size constant satisfies $\gamma \in \left( 0, \left[ \frac{1}{\frac{1}{2m^2}+1+(\tau_{\max}+1)^2} \right]^{1/a} \right)$. Then, for all $N \ge 0$ and any saddle point $(u^*, v^*) \in X \times Y$, we have
\[
-\frac{\|u[0]-\hat{u}[N]\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} - 8 B^2 m^3 \gamma^{1-a} \le F(\hat{u}[N], \hat{v}[N]) - F(u^*, v^*) \le \frac{\|u[0]-u^*\|^2 + \|v[0]-\hat{v}[N]\|^2}{2\gamma(N+1)} + 8 B^2 m^3 \gamma^{1-a} .
\]
Proof. 
For any $N \ge 0$, we set $u = u^*$ and $v = v^*$ in (18) and (19), respectively, to obtain
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, \hat{v}[N]) \le \frac{\|u[0]-u^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} , \tag{21}
\]
and
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(\hat{u}[N], v^*) \ge -\frac{\|v[0]-v^*\|^2}{2\gamma(N+1)} - 4 B^2 m^3 \gamma^{1-a} . \tag{22}
\]
Since $\hat{u}[N] \in X$ and $\hat{v}[N] \in Y$ for all $N \ge 0$, the saddle-point relation implies that
\[
F(u^*, \hat{v}[N]) \le F(u^*, v^*),
\]
and
\[
F(u^*, v^*) \le F(\hat{u}[N], v^*) .
\]
We invoke these two relations in inequalities (21) and (22), respectively, to obtain
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, v^*) \le \frac{\|u[0]-u^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} , \tag{23}
\]
and
\[
F(u^*, v^*) - \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) \le \frac{\|v[0]-v^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} . \tag{24}
\]
Now, by combining inequalities (23) and (24), we obtain
\[
-\frac{\|v[0]-v^*\|^2}{2\gamma(N+1)} - 4 B^2 m^3 \gamma^{1-a} \le \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, v^*) \le \frac{\|u[0]-u^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} . \tag{25}
\]
In addition, by using $u = \hat{u}[N]$ and $v = \hat{v}[N]$ in (18) and (19), respectively, we derive that
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(\hat{u}[N], \hat{v}[N]) \le \frac{\|u[0]-\hat{u}[N]\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} ,
\]
and
\[
\frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(\hat{u}[N], \hat{v}[N]) \ge -\frac{\|v[0]-\hat{v}[N]\|^2}{2\gamma(N+1)} - 4 B^2 m^3 \gamma^{1-a} .
\]
Next, we combine the two obtained relations and multiply both sides by $-1$ to obtain
\[
-\frac{\|u[0]-\hat{u}[N]\|^2}{2\gamma(N+1)} - 4 B^2 m^3 \gamma^{1-a} \le F(\hat{u}[N], \hat{v}[N]) - \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) \le \frac{\|v[0]-\hat{v}[N]\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} . \tag{26}
\]
Finally, we sum up inequalities (25) and (26) to obtain
\[
-\frac{\|u[0]-\hat{u}[N]\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} - 8 B^2 m^3 \gamma^{1-a} \le F(\hat{u}[N], \hat{v}[N]) - F(u^*, v^*) \le \frac{\|u[0]-u^*\|^2 + \|v[0]-\hat{v}[N]\|^2}{2\gamma(N+1)} + 8 B^2 m^3 \gamma^{1-a} ,
\]
which completes the proof. □
By inspecting the lines of the proof of Theorem 3, we can derive the complexity upper bound of the IDSM, as stated in the following remark.
Remark 2.
Let $(u^*, v^*) \in X \times Y$ be a saddle point of problem (1) and let $N \ge 0$. According to (25), we have
\[
-\frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} - 4 B^2 m^3 \gamma^{1-a} \le \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, v^*) \le \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} ,
\]
which yields the following bound on the deviation of the average of function values $\frac{1}{N+1}\sum_{k=0}^{N} F(u[k],v[k])$ from the saddle value $F^*$:
\[
\left| \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, v^*) \right| \le \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} .
\]
Notice that choosing a small step size $\gamma$, so that the term $\gamma^{1-a}$ is small, allows for a correspondingly small error level, as required. To obtain an $\varepsilon$-optimal solution to problem (1), we now ask how many iterations $N$ are required. Let $\varepsilon > 0$ be such that
\[
\left| \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, v^*) \right| \le \varepsilon .
\]
Note that finding a nonnegative integer $N$ for which
\[
\frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a} \le \varepsilon
\]
amounts to requiring (provided that $\varepsilon > 4 B^2 m^3 \gamma^{1-a}$)
\[
N \ge \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma\left( \varepsilon - 4 B^2 m^3 \gamma^{1-a} \right)} - 1
\]
(a small computational sketch of this bound is given after this remark).
Let us fix a horizon $\{0, 1, \ldots, N\}$. By taking the positive real number $a \in (0, 0.5)$ and committing to this horizon, we may choose the constant step size $\gamma = \varepsilon (N+1)^{-\frac{1}{2(1-a)}}$ for some constant $\varepsilon \in \left( 0, \left[ \frac{1}{\frac{1}{2m^2}+1+(\tau_{\max}+1)^2} \right]^{1/a} \right)$ so that
\[
\begin{aligned}
\frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\gamma(N+1)} + 4 B^2 m^3 \gamma^{1-a}
&= \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\varepsilon (N+1)^{1-\frac{1}{2(1-a)}}} + \frac{4 B^2 m^3 \varepsilon^{1-a}}{(N+1)^{1/2}} \\
&\le \left( \frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\varepsilon} + 4 B^2 m^3 \varepsilon^{1-a} \right) \frac{1}{(N+1)^{1-\frac{1}{2(1-a)}}} .
\end{aligned}
\]
Since we know that $\frac{1}{2+(\tau_{\max}+1)^2} \le 1$ and $a < 1-a$, we have $\left[ \frac{1}{2+(\tau_{\max}+1)^2} \right]^{(1-a)/a} \le 1$. Now, by particularly putting $\varepsilon := \left[ \frac{1}{2+(\tau_{\max}+1)^2} \right]^{1/a} = \left[ \frac{1}{\tau_{\max}^2+2\tau_{\max}+3} \right]^{1/a}$, we obtain the above constant bound:
\[
\begin{aligned}
\frac{\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2}{2\varepsilon} + 4 B^2 m^3 \varepsilon^{1-a}
&= \frac{\left( \|u[0]-u^*\|^2 + \|v[0]-v^*\|^2 \right) \left( \tau_{\max}^2+2\tau_{\max}+3 \right)^{1/a}}{2} + \frac{4 B^2 m^3}{\left( \tau_{\max}^2+2\tau_{\max}+3 \right)^{(1-a)/a}} \\
&\le \frac{\left( \|u[0]-u^*\|^2 + \|v[0]-v^*\|^2 \right) \left( \tau_{\max}^2+2\tau_{\max}+3 \right)^{1/a}}{2} + 4 B^2 m^3 .
\end{aligned}
\]
Therefore, for a fixed horizon $\{0, 1, \ldots, N\}$ and a constant $a \in (0, 0.5)$, we obtain the complexity upper bound
\[
\left| \frac{1}{N+1} \sum_{k=0}^{N} F(u[k], v[k]) - F(u^*, v^*) \right| \le \left( \frac{\left( \|u[0]-u^*\|^2 + \|v[0]-v^*\|^2 \right) \left( \tau_{\max}^2+2\tau_{\max}+3 \right)^{1/a}}{2} + 4 B^2 m^3 \right) \frac{1}{(N+1)^{1-\frac{1}{2(1-a)}}} .
\]
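As a small computational aid (not part of the original analysis), the iteration count required by the bound in this remark can be evaluated as follows. The argument R2 denotes the quantity $\|u[0]-u^*\|^2 + \|v[0]-v^*\|^2$, and the routine assumes that the accuracy eps exceeds the constant error level $4B^2m^3\gamma^{1-a}$.

import math

def iterations_needed(R2, B, m, gamma, a, eps):
    """Sufficient iteration count N from the bound in Remark 2."""
    err_floor = 4 * B**2 * m**3 * gamma**(1 / a) if False else 4 * B**2 * m**3 * gamma**(1 - a)
    if eps <= err_floor:
        raise ValueError("eps must exceed the constant error level 4 B^2 m^3 gamma^(1-a)")
    return math.ceil(R2 / (2 * gamma * (eps - err_floor)) - 1)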

3.3. Convergences of Function Values

For notational simplicity, we write the weighted averaged iterates $\bar{u}[N]$ and $\bar{v}[N]$ as
\[
\bar{u}[N] := \frac{\sum_{k=0}^{N} \gamma_k u[k]}{\sum_{k=0}^{N} \gamma_k} \quad \text{and} \quad \bar{v}[N] := \frac{\sum_{k=0}^{N} \gamma_k v[k]}{\sum_{k=0}^{N} \gamma_k}, \quad \text{for all } N \ge 0 .
\]
Lemma 4.
Let $\{u[k]\}_{k=0}^{\infty}$ and $\{v[k]\}_{k=0}^{\infty}$ be the sequences generated by the IDSM, and let $a \in (0, 1)$ be a real number. Suppose that the step-size sequence $\{\gamma_k\}_{k=0}^{\infty} \subseteq \left( 0, \left[ \frac{1}{\frac{1}{2m^2}+1+(\tau_{\max}+1)^2} \right]^{1/a} \right)$ is non-increasing.
(i) 
For all $u \in X$ and for all integers $N \ge 0$,
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u, \bar{v}[N]) \le \frac{D_X^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} . \tag{27}
\]
(ii) 
For all $v \in Y$ and for all integers $N \ge 0$,
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(\bar{u}[N], v) \ge -\frac{D_Y^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} . \tag{28}
\]
Proof. 
(i) Let $u \in X$ be given. Taking into account Lemma 2 (i), we note that for all $k \ge 0$,
\[
\begin{aligned}
2\gamma_k \left( F(u[k], v[k]) - F(u, v[k]) \right) &\le \|u[k]-u\|^2 - \|u[k+1]-u\|^2 + \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 \\
&\quad + \frac{\gamma_k^a}{m} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 + 8 B^2 \gamma_k^{2-a} m^3 .
\end{aligned}
\]
For a fixed integer $N \ge 0$, by summing this relation over $k$ from $0$ to $N$, we obtain
\[
\begin{aligned}
2 \sum_{k=0}^{N} \gamma_k \left( F(u[k], v[k]) - F(u, v[k]) \right) &\le \|u[0]-u\|^2 - \|u[N+1]-u\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} \\
&\quad + \sum_{k=0}^{N} \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 \\
&\quad + \frac{\tau_{\max}+1}{m} \sum_{k=0}^{N} \gamma_k^a \sum_{j=0}^{\tau_{\max}} \|u[k-j+1]-u[k-j]\|^2 . 
\end{aligned}\tag{29}
\]
For the last term of the above inequality, the non-increasing property of the step-size sequence $\{\gamma_k\}_{k=0}^{\infty}$ implies that
\[
\begin{aligned}
\sum_{k=0}^{N} \sum_{j=0}^{\tau_{\max}} \gamma_k^a \|u[k-j+1]-u[k-j]\|^2 
&= \sum_{j=0}^{\tau_{\max}} \sum_{k=-j}^{N-j} \gamma_{k+j}^a \|u[k+1]-u[k]\|^2 
= \sum_{j=0}^{\tau_{\max}} \sum_{k=0}^{N-j} \gamma_{k+j}^a \|u[k+1]-u[k]\|^2 \\
&\le \sum_{j=0}^{\tau_{\max}} \sum_{k=0}^{N-j} \gamma_k^a \|u[k+1]-u[k]\|^2 
\le \sum_{j=0}^{\tau_{\max}} \sum_{k=0}^{N} \gamma_k^a \|u[k+1]-u[k]\|^2 \\
&= (\tau_{\max}+1) \sum_{k=0}^{N} \gamma_k^a \|u[k+1]-u[k]\|^2 
\le m (\tau_{\max}+1) \sum_{k=0}^{N} \gamma_k^a \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 . 
\end{aligned}\tag{30}
\]
By applying this obtained inequality in inequality (29), we have
\[
\begin{aligned}
2 \sum_{k=0}^{N} \gamma_k \left( F(u[k], v[k]) - F(u, v[k]) \right) &\le \|u[0]-u\|^2 - \|u[N+1]-u\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} \\
&\quad + \sum_{k=0}^{N} \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 + (\tau_{\max}+1)^2 \sum_{k=0}^{N} \gamma_k^a \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 \\
&\le \|u[0]-u\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} + \sum_{k=0}^{N} \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 + \gamma_k^a (\tau_{\max}+1)^2 \right) \sum_{i=1}^{m} \|u_{i+1}[k]-u_i[k]\|^2 . 
\end{aligned}
\]
Now, the condition on the step-size sequence $\{\gamma_k\}_{k=0}^{\infty}$ implies that $\frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 + \gamma_k^a (\tau_{\max}+1)^2 < 0$ for all $k \ge 0$, and so
\[
2 \sum_{k=0}^{N} \gamma_k \left( F(u[k], v[k]) - F(u, v[k]) \right) \le \|u[0]-u\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} ,
\]
implying that
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - \frac{\sum_{k=0}^{N} \gamma_k F(u, v[k])}{\sum_{k=0}^{N} \gamma_k} \le \frac{\|u[0]-u\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} .
\]
As a result of the concavity of the function $F(u, \cdot)$, it follows that
\[
F(u, \bar{v}[N]) \ge \frac{\sum_{k=0}^{N} \gamma_k F(u, v[k])}{\sum_{k=0}^{N} \gamma_k} ,
\]
which means that the above inequality becomes
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u, \bar{v}[N]) \le \frac{\|u[0]-u\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} \le \frac{D_X^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} .
\]
(ii) Let $v \in Y$ be given. By utilizing Lemma 2 (ii), we have that for all $k \ge 0$,
\[
\begin{aligned}
2\gamma_k \left( F(u[k], v[k]) - F(u[k], v) \right) &\ge \|v[k+1]-v\|^2 - \|v[k]-v\|^2 - \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 \\
&\quad - \frac{\gamma_k^a}{m} (\tau_{\max}+1) \sum_{j=0}^{\tau_{\max}} \|v[k-j+1]-v[k-j]\|^2 - 8 B^2 \gamma_k^{2-a} m^3 .
\end{aligned}
\]
We sum the above inequality from $k = 0$ to a fixed integer $N \ge 0$ and obtain
\[
\begin{aligned}
2 \sum_{k=0}^{N} \gamma_k \left( F(u[k], v[k]) - F(u[k], v) \right) &\ge \|v[N+1]-v\|^2 - \|v[0]-v\|^2 - 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} \\
&\quad - \sum_{k=0}^{N} \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 - \frac{\tau_{\max}+1}{m} \sum_{k=0}^{N} \gamma_k^a \sum_{j=0}^{\tau_{\max}} \|v[k-j+1]-v[k-j]\|^2 . 
\end{aligned}\tag{31}
\]
Using a technique similar to that for inequality (30), we also obtain
\[
\sum_{k=0}^{N} \gamma_k^a \sum_{j=0}^{\tau_{\max}} \|v[k-j+1]-v[k-j]\|^2 \le m (\tau_{\max}+1) \sum_{k=0}^{N} \gamma_k^a \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 .
\]
Invoking this relation in inequality (31), we have
\[
\begin{aligned}
2 \sum_{k=0}^{N} \gamma_k \left( F(u[k], v[k]) - F(u[k], v) \right) &\ge \|v[N+1]-v\|^2 - \|v[0]-v\|^2 - 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} \\
&\quad - \sum_{k=0}^{N} \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 \right) \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 - (\tau_{\max}+1)^2 \sum_{k=0}^{N} \gamma_k^a \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 \\
&\ge -\|v[0]-v\|^2 - 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} - \sum_{k=0}^{N} \left( \frac{\gamma_k^a}{2m^2} + \gamma_k^a - 1 + \gamma_k^a (\tau_{\max}+1)^2 \right) \sum_{i=1}^{m} \|v_{i+1}[k]-v_i[k]\|^2 . 
\end{aligned}
\]
Again, by applying the condition on the step-size sequence $\{\gamma_k\}_{k=0}^{\infty}$, we obtain
\[
2 \sum_{k=0}^{N} \gamma_k \left( F(u[k], v[k]) - F(u[k], v) \right) \ge -\|v[0]-v\|^2 - 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a} ,
\]
and then
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v)}{\sum_{k=0}^{N} \gamma_k} \ge -\frac{\|v[0]-v\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} .
\]
Furthermore, since $F(\cdot, v)$ is a convex function for a fixed $v \in Y$, we also have
\[
F(\bar{u}[N], v) \le \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v)}{\sum_{k=0}^{N} \gamma_k} .
\]
Finally, we can conclude that
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(\bar{u}[N], v) \ge -\frac{\|v[0]-v\|^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} \ge -\frac{D_Y^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} ,
\]
as desired. □
Theorem 4.
Let $\{u[k]\}_{k=0}^{\infty}$ and $\{v[k]\}_{k=0}^{\infty}$ be the sequences generated by the IDSM, and let $a \in (0, 1)$ be a real number. Suppose that the step-size sequence $\{\gamma_k\}_{k=0}^{\infty} \subseteq \left( 0, \left[ \frac{1}{\frac{1}{2m^2}+1+(\tau_{\max}+1)^2} \right]^{1/a} \right)$ is non-increasing and satisfies $\lim_{k \to \infty} \gamma_k = 0$ and $\sum_{k=0}^{\infty} \gamma_k = \infty$. Then, the averaged sequence of function values $\left\{ \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} \right\}_{N=0}^{\infty}$ and the sequence of function values of the averaged iterates $\left\{ F(\bar{u}[N], \bar{v}[N]) \right\}_{N=0}^{\infty}$ both converge to $F^*$.
Proof. 
Let $u = u^*$ and $v = v^*$ in (27) and (28), respectively. We obtain that for all $N \ge 0$,
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u^*, \bar{v}[N]) \le \frac{D_X^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} , \tag{32}
\]
and
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(\bar{u}[N], v^*) \ge -\frac{D_Y^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} . \tag{33}
\]
For all $N \ge 0$, since the sets $X$ and $Y$ are convex, we have that $\bar{u}[N] \in X$ and $\bar{v}[N] \in Y$. This, together with the saddle-point relation of $(u^*, v^*) \in X \times Y$, implies that
\[
F(u^*, \bar{v}[N]) \le F(u^*, v^*) \le F(\bar{u}[N], v^*) . \tag{34}
\]
Now, applying inequalities (32) and (33) with the saddle-point relation (34), we obtain that for all $N \ge 0$,
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u^*, v^*) \le \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u^*, \bar{v}[N]) \le \frac{D_X^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} ,
\]
and
\[
\frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u^*, v^*) \ge \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(\bar{u}[N], v^*) \ge -\frac{D_Y^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} .
\]
Therefore, we have that for all $N \ge 0$,
\[
-\frac{D_Y^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} \le \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(u^*, v^*) \le \frac{D_X^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} . \tag{35}
\]
Next, by the condition that $\sum_{k=0}^{\infty} \gamma_k = \infty$, we obtain
\[
\lim_{N \to \infty} \frac{D_X^2}{2 \sum_{k=0}^{N} \gamma_k} = \lim_{N \to \infty} \frac{D_Y^2}{2 \sum_{k=0}^{N} \gamma_k} = 0 . \tag{36}
\]
Since $\lim_{k \to \infty} \gamma_k^{1-a} = 0$, we have
\[
\lim_{N \to \infty} \frac{8 B^2 m^3 \sum_{k=0}^{N} \gamma_k \left( \gamma_k^{1-a} \right)}{2 \sum_{k=0}^{N} \gamma_k} = 0 . \tag{37}
\]
By using these two results with inequality (35), we arrive at the conclusion that
\[
\lim_{N \to \infty} \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} = F^* . \tag{38}
\]
Again, by setting $u = \bar{u}[N]$ and $v = \bar{v}[N]$ in (27) and (28), respectively, we have that for all $N \ge 0$,
\[
-\frac{D_Y^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} \le \frac{\sum_{k=0}^{N} \gamma_k F(u[k], v[k])}{\sum_{k=0}^{N} \gamma_k} - F(\bar{u}[N], \bar{v}[N]) \le \frac{D_X^2 + 8 B^2 m^3 \sum_{k=0}^{N} \gamma_k^{2-a}}{2 \sum_{k=0}^{N} \gamma_k} .
\]
Combining this with the limits (36)–(38), it follows that
\[
\lim_{N \to \infty} F(\bar{u}[N], \bar{v}[N]) = F^* ,
\]
as desired. □
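For concreteness, the following Python sketch builds one admissible diminishing step-size sequence satisfying the hypotheses of Theorem 4: it is non-increasing, tends to zero, is non-summable, and stays strictly below the constant bound in the theorem. The particular schedule is an illustrative choice, not one prescribed by the paper.

import numpy as np

def diminishing_steps(K, m, tau_max, a=0.9):
    """An admissible step-size sequence for Theorem 4: gamma_k proportional
    to 1/sqrt(k + 1), scaled to remain strictly below the theorem's bound
    (1 / (1/(2 m^2) + 1 + (tau_max + 1)^2))^(1/a)."""
    gamma_bound = (1.0 / (1.0 / (2 * m**2) + 1 + (tau_max + 1) ** 2)) ** (1.0 / a)
    return 0.99 * gamma_bound / np.sqrt(np.arange(1, K + 1))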

4. Numerical Examples

In this section, we present numerical examples of the proposed method (IDSM) for solving the distributed matrix game problem of the following form:
\[
\min_{u \in \Delta_p} \max_{v \in \Delta_r} \sum_{i=1}^{m} F_i(u, v),
\]
where the constraint sets $\Delta_p \subseteq \mathbb{R}^p$ and $\Delta_r \subseteq \mathbb{R}^r$ are the standard simplices in $\mathbb{R}^p$ and $\mathbb{R}^r$, respectively, and the component coupling function $F_i : \mathbb{R}^p \times \mathbb{R}^r \to \mathbb{R}$ for each $i = 1, \ldots, m$ is given by
\[
F_i(u, v) = \langle u, A_i v \rangle + \frac{1}{2}\|u\|^2 - \frac{1}{2}\left\| v - \frac{1}{m}\mathbf{1}_r \right\|^2 ,
\]
where $\mathbf{1}_r$ is the vector in $\mathbb{R}^r$ all of whose components are 1. We generated the $p \times r$ matrices $A_i$, $i = 1, \ldots, m$, randomly, with entries independently and uniformly distributed in the interval $(-10, 10)$. We compared the IDSM with the existing methods NO-09 [10] and AN-23 [15]. We set the delay sequences with the cyclic-order delays $\tau_i^k = \mu_i^k = k \bmod (\tau_{\max}+1)$ for all $k \ge 0$ and $i = 1, \ldots, m$, with the delay bounds $\tau_{\max} = 0, 5$, and $10$ for both the IDSM and AN-23. We set the step-size constant to $\left[ \frac{1}{2+(\tau+1)^2} \right]^{1/0.99}$ for the IDSM and $\left[ \frac{2}{1+2(\tau+1)^2} \right]^{1/0.99}$ for AN-23 and NO-09. We performed 100 independent random tests and terminated all tested methods when the number of iterations reached 100. We examined the considered problem for various problem sizes $p \times r$, as detailed below.
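For reproducibility, the following Python sketch generates a random instance of the coupling functions above together with their partial gradients (the $F_i$ here are differentiable in each variable). The function and variable names are illustrative and are not taken from the original experiments.

import numpy as np

def make_matrix_game(m, p, r, rng):
    """Random instance of the distributed matrix game in Section 4:
    F_i(u, v) = <u, A_i v> + 0.5*||u||^2 - 0.5*||v - (1/m) 1_r||^2,
    with the entries of A_i i.i.d. uniform on (-10, 10)."""
    A = [rng.uniform(-10.0, 10.0, size=(p, r)) for _ in range(m)]
    center = np.ones(r) / m

    def grad_u(i):
        return lambda u, v: A[i] @ v + u                 # gradient of F_i in u

    def grad_v(i):
        return lambda u, v: A[i].T @ u - (v - center)    # gradient of F_i in v

    return A, [grad_u(i) for i in range(m)], [grad_v(i) for i in range(m)]

# Example usage (illustrative):
# rng = np.random.default_rng(0)
# A, subs_u, subs_v = make_matrix_game(m=5, p=10, r=10, rng=rng)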

4.1. The Case p = 10

In Figure 1a–c, we consider the behavior of the average of 100 independently sampled function values for fixed $p = 10$. We observed that, for all values of $r$, all methods tended to converge to the saddle values.
According to Figure 2, which shows the behaviors of the relative errors of the function value for each problem size, we noticed that the IDSM with an appropriate delay bound selection, particularly for a fixed delay bound of τ = 10 , provided the most accurate results with the lowest relative error across all sizes. Additionally, for fixed delay bounds of τ = 5 and τ = 0 , it still exhibited lower relative error values when compared with the AN-23 and NO-09 methods.

4.2. The Case p = 100

It can be observed from Figure 3 that, for all values of $r$, all methods tended to converge to the saddle values.
In Figure 4, we observed that when r = 10 , the relative error with a fixed delay bound of τ = 10 for the IDSM exhibited the lowest relative error values. In contrast, for r = 100 and r = 1000 , the IDSM with a fixed delay bound of τ = 5 provided the most accurate results, showing the lowest relative error. Moreover, we found that the IDSM, in the case of fixed delay bounds, gave lower relative errors than both the AN-23 and NO-09 methods under the same fixed delay bounds.

5. Conclusions

In this work, we introduced the so-called incremental delayed subgradient method (in short, IDSM) to solve the saddle-point problem. We characterized the recurrence relations associated with the IDSM for use in the subsequent convergence theorems. Moreover, we proved convergence results divided into two parts. For the constant step size, we provided an upper bound on the absolute value of the difference between the function value of the averaged iterates and the saddle value, and we also derived a complexity upper bound within a fixed time horizon. For the case of a diminishing step-size sequence, we proved convergence by showing that both the averaged sequence of function values and the sequence of function values of the averaged iterates converge to the saddle value. Finally, we presented numerical examples for matrix game problems. Although the IDSM with delay effects yielded better convergence behavior than NO-09 and AN-23, the selection of appropriate delay bounds for particular problems remains an interesting direction for future research.

Author Contributions

Conceptualization, T.F., T.A. and N.N.; methodology, T.F., T.A. and N.N.; software, T.A. and N.N.; validation, T.F., T.A. and N.N.; formal analysis, T.F., T.A. and N.N.; investigation, T.F., T.A. and N.N.; writing—original draft preparation, T.F.; writing—review and editing, T.F., T.A. and N.N.; visualization, N.N.; supervision, N.N.; project administration, N.N.; funding acquisition, N.N. All authors read and agreed to the published version of this manuscript.

Funding

This research was supported by the Fundamental Fund of Khon Kaen University. This research received funding support from the National Science, Research and Innovation Fund (NSRF).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available upon request from the authors.

Acknowledgments

The authors are thankful to the editor and two anonymous referees for comments and remarks that improved the quality and presentation of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
  2. Boţ, R.I.; Sedlmayer, M.; Vuong, P.T. A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs. J. Mach. Learn. Res. 2023, 24, 1–37. [Google Scholar]
  3. Boţ, R.I.; Böhm, A. Alternating proximal-gradient steps for (stochastic) nonconvex-concave minimax problems. SIAM J. Optim. 2023, 33, 1884–1913. [Google Scholar] [CrossRef]
  4. Boţ, R.I.; Csetnek, E.R.; Sedlmayer, M. An accelerated minimax algorithm for convex-concave saddle point problems with nonsmooth coupling function. Comput. Optim. Appl. 2023, 86, 925–966. [Google Scholar] [CrossRef] [PubMed]
  5. Fan, Y.; Lyu, S.; Ying, Y.; Hu, B. Learning with average top-k loss. Adv. Neural Inf. Process. Syst. 2017, 497–505. [Google Scholar]
  6. Ying, Y.; Wen, L.; Lyu, S. Stochastic online AUC maximization. Adv. Neural Inf. Process. Syst. 2016, 451–459. [Google Scholar]
  7. He, B.S.; Yuan, X.M. Convergence analysis of primal–dual algorithms for a saddle point problem: From contraction perspective. SIAM J. Imaging Sci. 2012, 5, 119–149. [Google Scholar] [CrossRef]
  8. Cai, X.; Han, D.; Xu, L. An improved first-order primal-dual algorithm with a new correction step. J. Glob. Optim. 2013, 57, 1419–1428. [Google Scholar] [CrossRef]
  9. Arrow, K.J.; Hurwicz, L.; Uzawa, H. Studies in Linear and Non-Linear Programming; Stanford University Press: Stanford, CA, USA, 1958. [Google Scholar]
  10. Nedić, A.; Ozdaglar, A. Subgradient methods for saddle-point problems. J. Optim. Theory Appl. 2009, 142, 205–228. [Google Scholar] [CrossRef]
  11. Aytekin, A. Asynchronous First-Order Algorithms for Large-Scale Optimization: Analysis and Implementation. Ph.D. Dissertation, KTH Royal Institute of Technology, Stockholm, Sweden, 2019. [Google Scholar]
  12. Feyzmahdavian, H.R.; Aytekin, A.; Johansson, M. An asynchronous mini-batch algorithm for regularized stochastic optimization. IEEE Trans. Autom. Control 2016, 61, 3740–3754. [Google Scholar] [CrossRef]
  13. Gurbuzbalaban, M.; Ozdaglar, A.; Parrilo, P.A. On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 2017, 27, 1035–1048. [Google Scholar] [CrossRef]
  14. Vanli, N.D.; Gurbuzbalaban, M.; Ozdaglar, A. Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 2018, 28, 1282–1300. [Google Scholar] [CrossRef]
  15. Arunrat, T.; Nimana, N. A delayed subgradient method for nonsmooth convex-concave min-max optimization problems. Results Control Optim. 2023, 12, 100266. [Google Scholar] [CrossRef]
  16. Beznosikov, A.; Scutari, G.; Rogozin, A.; Gasnikov, A. Distributed saddle-point problems under data similarity. Adv. Neural Inf. Process. Syst. 2021, 34, 8172–8184. [Google Scholar]
  17. Dai, Y.H.; Wang, J.; Zhang, L. Stochastic approximation proximal subgradient method for stochastic convex-concave minimax optimization. arXiv 2024, arXiv:2403.20205. [Google Scholar]
  18. Rafique, H.; Liu, M.; Lin, Q.; Yang, T. Weakly-convex–concave min–max optimization: Provable algorithms and applications in machine learning. Optim. Methods Softw. 2021, 37, 1087–1121. [Google Scholar] [CrossRef]
  19. Jiang, F.; Zhang, Z.; He, H. Solving saddle point problems: A landscape of primal-dual algorithm with larger stepsizes. J. Glob. Optim. 2023, 85, 821–846. [Google Scholar] [CrossRef]
  20. Luo, L.; Xie, G.; Zhang, T.; Zhang, Z. Near optimal stochastic algorithms for finite-sum unbalanced convex-concave minimax optimization. arXiv 2021, arXiv:2106.01761. [Google Scholar]
  21. Nedić, A.; Bertsekas, D.P. Incremental subgradient methods for nondifferentiable optimization. SIAM J. Optim. 2001, 12, 109–138. [Google Scholar] [CrossRef]
  22. Nedić, A.; Bertsekas, D.P.; Borkar, V.S. Distributed asynchronous incremental subgradient methods. Stud. Comput. Math. 2001, 8, 381–407. [Google Scholar]
  23. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1970. [Google Scholar]
Figure 1. Behaviors of the average of 100 independent values F ( u ^ [ N ] , v ^ [ N ] ) performed by NO-09, AN-23, and the IDSM for the cases p = 10 and r = 10 , 100 , and 1000.
Figure 2. Behaviors of the average of 100 independent relative errors of values F ( u ^ [ N ] , v ^ [ N ] ) performed by NO-09, AN-23, and the IDSM for the cases p = 10 and r = 10 , 100 , and 1000.
Figure 3. Behaviors of the average of 100 independent values F ( u ^ [ N ] , v ^ [ N ] ) performed by NO-09, AN-23, and the IDSM for the cases p = 100 and r = 10 , 100 , and 1000.
Figure 4. Behaviors of the average of 100 independent relative errors of values F ( u ^ [ N ] , v ^ [ N ] ) performed by NO-09, AN-23, and the IDSM for the cases p = 100 and r = 10 , 100 , and 1000.