Article

Convergence Analysis on Data-Driven Fortet-Mourier Metrics with Applications in Stochastic Optimization

School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(8), 4501; https://doi.org/10.3390/su14084501
Submission received: 22 March 2022 / Revised: 2 April 2022 / Accepted: 6 April 2022 / Published: 10 April 2022

Abstract
Fortet-Mourier (FM) metrics are important probability metrics that have been widely adopted in the quantitative stability analysis of stochastic programming problems. In this study, we establish several types of convergence assertions between a probability distribution and its empirical distribution when the deviation is measured by FM metrics, and we consider their applications in stochastic optimization. We first establish the quantitative relation between FM metrics and Wasserstein metrics. After that, we derive the non-asymptotic moment estimate, asymptotic convergence, and non-asymptotic concentration estimate for FM metrics, which supplement the existing results. Finally, we apply the derived results to four kinds of stochastic optimization problems, which either extend the present results to more general cases or provide alternative avenues. All these discussions demonstrate the motivation as well as the significance of our study.

1. Introduction

The estimation of the distance between a distribution and its empirical approximation obtained from independent and identically distributed (iid) samples is an important subject in probability theory, mathematical statistics, and information theory. It has vast applications in many fields, such as quantization, optimal matching, density estimation, and clustering (see [1] and the references therein for more details). To quantify the distance between two probability distributions, various rules have been adopted to generate probability metrics, such as the commonly used ζ-structure. By selecting different generators of the ζ-structure metric, we obtain a number of well-known probability metrics, such as the Wasserstein metric (in this study, the Wasserstein metric always means the 1-Wasserstein metric, which is also called the Kantorovich–Rubinstein metric or Kantorovich metric), the FM metric, and the total variation metric.
Among probability metrics with ζ -structure, the Wasserstein metric is the most popular one which has been widely applied in statistics, probability, and machine learning [2]. It originates from the optimal transportation problem and thus can be interpreted as an optimal mass transportation plan. Except for its practical meaning in transportation, the Wasserstein metric has some good properties. For example, convergence in the Wasserstein metric is equivalent to weak convergence plus the convergence of the first order absolute moment [2].
There is a body of literature concentrated on the convergence analysis under Wasserstein metrics between a distribution and its empirical approximation. From now on, we refer to this as the data-driven Wasserstein metric for simplicity, and we use analogous terminology for other probability metrics. These convergence analyses can be mainly divided into two parts: moment estimates, which aim at providing the rate of convergence for the expectation of the Wasserstein distance between a distribution and its empirical approximation, and concentration estimates, which focus on the violation probability under a given tolerance. As for moment estimates, some earlier results can be found in [3,4], which provide a relatively loose convergence rate. More recently, Weed and Bach [5] focused on the compactly supported case and obtained a sharp convergence rate. Dereich et al. [6] conducted an almost optimal convergence analysis; however, they put some restrictions on the range of parameters. An interesting result was given in [1], which extends some results in [6] from a limited range of parameters to the general case. As for concentration estimates, only a few results are available. The corresponding results can be found in [7,8] under some strong assumptions; moreover, they require the violation parameter to be large enough. In [9], Zhao and Guan investigated the case with a discrete and bounded support set. In particular, an elaborate result on the rate of convergence of the data-driven Wasserstein distance was presented in [1].
As pointed out in [2] (p. 110), the Wasserstein metric is a rather strong probability metric. Intuitively, strong conditions are needed to establish Wasserstein-type upper bound estimates. Indeed, we know from the definition of the Wasserstein metric that its generator is the set of Lipschitz continuous functions with Lipschitz modulus one.
Compared with the Wasserstein metric, FM metrics are more general: their generator is a class of locally Lipschitz continuous functions. Therefore, it is easier to obtain upper bounds with FM metrics. In view of this, FM metrics have been widely used in the quantitative stability analysis of stochastic programming problems when the underlying probability distribution is perturbed or approximated; see, for example, [10,11,12,13]. Moreover, FM metrics have a close relationship with Wasserstein metrics through the dual representation (Kantorovich–Rubinstein theorem). In particular, the $p$th order FM metric degenerates into the Wasserstein metric when $p=1$. From this point of view, the FM metric can be viewed as an extension of the Wasserstein metric. Nevertheless, there are few results concerning the convergence analysis of data-driven FM metrics. To the best of our knowledge, only Strugarek [14] examined the asymptotic convergence analysis under the FM distance.
In view of the above situations, in this article we study the data-driven FM metric. The main contributions of this study can be summarized as follows:
  • We establish the quantitative connection between the Wasserstein metric and the FM metric. Based on this connection, we investigate the non-asymptotic moment estimate, asymptotic convergence, and non-asymptotic concentration estimate for data-driven FM metrics.
  • We provide an alternative avenue for the convergence analysis of discrete approximations for two-stage stochastic programming problems. Different from the convergence or exponential rate of convergence analysis in [15,16], where some complex conditions are required, our approach is straightforward and brief.
  • We reestablish the quantitative stability results for stochastic optimization problems with stochastic dominance constraints through FM metrics. Compared with that in [17], our conditions are weaker and different probability metrics are adopted. More importantly, we can apply the convergence conclusion to examine the discrete approximation method which is crucial for numerical solution.
  • We consider data-driven distributionally robust optimization (DRO) problems with FM ball, which extends the results in [18] from the ambiguity set constructed by Wasserstein ball to the FM ball case. We prove the finite sample guarantee and asymptotic consistency, which lay the theoretical foundation for the data-driven approach for the DRO model.
  • We analyze the discrete approximation of the DRO problem whose ambiguity set is constructed with the general moment information. Compared with the existing work [19] under the bounded support set, we weaken their conditions and extend their results to the case with an unbounded support set.
The remainder of this study is organized as follows. In Section 2, we give some prerequisites for further discussion. In Section 3, we discuss different kinds of convergence results for data-driven FM metrics. We consider four applications to verify our convergence results and to further demonstrate the motivation and significance of this study in Section 4. Finally, we have some concluding remarks in Section 5.

2. Prerequisites

Let $\xi:\Omega\to\Xi\subseteq\mathbb{R}^s$ be a random vector defined on the probability space $(\Omega,\mathcal{F},\mathbb{P})$. Then, its induced probability distribution (sometimes called a probability measure) on $\Xi$ is $P:=\mathbb{P}\circ\xi^{-1}$. We use $\mathcal{P}(\Xi)$ to denote the set of all probability distributions on $\Xi$. The set of probability distributions having finite $p$th order absolute moments is denoted by $\mathcal{P}_p(\Xi):=\left\{P\in\mathcal{P}(\Xi):\int_\Xi\|\xi\|^p\,P(d\xi)<+\infty\right\}$.
Probability metrics measure the distance between two probability distributions. In general, they do not satisfy all the axioms of a distance on a metric space. A commonly used class is the probability metrics with ζ-structure, defined as follows.
Definition 1.
Let $\mathcal{G}$ be a set of measurable functions from $\Xi$ to $\mathbb{R}$. Then, for any $P,Q\in\mathcal{P}(\Xi)$,
$$\mathcal{D}_{\mathcal{G}}(P,Q):=\sup_{g\in\mathcal{G}}\left|\mathbb{E}_P[g(\xi)]-\mathbb{E}_Q[g(\xi)]\right|$$
is called the ζ-structure probability metric induced by $\mathcal{G}$.
The set $\mathcal{G}$ in Definition 1 completely determines the resulting ζ-structure probability metric, so it is called the generator of the metric. FM metrics and Wasserstein metrics can be deduced from the ζ-structure probability metric by choosing specific generators. In particular, we have the following definitions.
Definition 2.
Let $P,Q\in\mathcal{P}_p(\Xi)$ for some $p\ge 1$, and let $\mathcal{G}_{\mathrm{FM}}^p$ denote the set of locally Lipschitz continuous functions given by
$$\mathcal{G}_{\mathrm{FM}}^p:=\left\{g:\Xi\to\mathbb{R} : |g(\xi_1)-g(\xi_2)|\le\max\{1,\|\xi_1\|,\|\xi_2\|\}^{p-1}\,\|\xi_1-\xi_2\|,\ \forall\,\xi_1,\xi_2\in\Xi\right\}.$$
Then, the $p$th order FM metric between $P$ and $Q$ is
$$\zeta_p(P,Q):=\sup_{g\in\mathcal{G}_{\mathrm{FM}}^p}\left|\mathbb{E}_P[g(\xi)]-\mathbb{E}_Q[g(\xi)]\right|.$$
Definition 3.
Let $P,Q\in\mathcal{P}_1(\Xi)$ and
$$\mathcal{G}_W:=\left\{g:\Xi\to\mathbb{R} : |g(\xi_1)-g(\xi_2)|\le\|\xi_1-\xi_2\|,\ \forall\,\xi_1,\xi_2\in\Xi\right\}.$$
Then, the Wasserstein metric between $P$ and $Q$ is
$$\mathcal{D}_W(P,Q):=\sup_{g\in\mathcal{G}_W}\left|\mathbb{E}_P[g(\xi)]-\mathbb{E}_Q[g(\xi)]\right|.$$
It is easy to see from the above definitions that $\zeta_1(P,Q)=\mathcal{D}_W(P,Q)$ for any $P,Q\in\mathcal{P}_1(\Xi)$. Moreover, if $g\in\mathcal{G}_{\mathrm{FM}}^p$, then $-g\in\mathcal{G}_{\mathrm{FM}}^p$, and the same holds for $\mathcal{G}_W$. Therefore, we can ignore the absolute value operator in Definitions 2 and 3 when taking the supremum. Moreover, both FM metrics and Wasserstein metrics have a close relationship with weak convergence; one can refer to [10] (p. 490) and [2] (Theorem 6.9) for more details.
The Wasserstein metric has an alternative definition via couplings of the marginal distributions. Specifically, the Wasserstein metric between $P$ and $Q$ is defined as (see [2], Definition 6.1):
$$\mathcal{D}_W(P,Q)=\inf\left\{\int_{\Xi\times\Xi}\|\xi_1-\xi_2\|\,\pi(d\xi_1,d\xi_2):\pi\in\Pi\right\},\qquad(1)$$
where $\Pi$ is the collection of all joint distributions of $\xi_1$ and $\xi_2$ with marginal distributions $P$ and $Q$, respectively. It is known from the Kantorovich–Rubinstein theorem [20] that Definition 3 is the dual representation of (1).
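As a concrete illustration, for discrete distributions on the real line the coupling form (1) can be evaluated through the well-known CDF formula $\mathcal{D}_W(P,Q)=\int|F_P(t)-F_Q(t)|\,dt$. The following minimal sketch (the helper name and the example atoms are our own illustrative choices, not part of the paper) computes it:

```python
def wasserstein_1d(atoms_p, weights_p, atoms_q, weights_q):
    """1-Wasserstein distance between two discrete distributions on R,
    via D_W(P, Q) = integral of |F_P(t) - F_Q(t)| dt, which agrees with
    the coupling form (1) in one dimension."""
    points = sorted(set(atoms_p) | set(atoms_q))
    def cdf(t, atoms, weights):
        return sum(w for a, w in zip(atoms, weights) if a <= t)
    total = 0.0
    for left, right in zip(points[:-1], points[1:]):
        gap = abs(cdf(left, atoms_p, weights_p) - cdf(left, atoms_q, weights_q))
        total += gap * (right - left)
    return total

# P = (1/2) delta_0 + (1/2) delta_2 versus Q = delta_1: every half-unit
# of mass travels a distance 1, so D_W(P, Q) = 1.
print(wasserstein_1d([0.0, 2.0], [0.5, 0.5], [1.0], [1.0]))  # 1.0
```

By Definition 3, the same value is the supremum of $|\mathbb{E}_P[g]-\mathbb{E}_Q[g]|$ over 1-Lipschitz functions $g$, and it also equals $\zeta_1(P,Q)$.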
We have the following extension theorem for Lipschitz functions in Hilbert space (see [21], Theorems 4 and 5).
Lemma 1.
Let $X$ and $Y$ be Hilbert spaces and let $g:B\subseteq X\to Y$ be a Lipschitz function with Lipschitz modulus $L_g$. Then, there exists a Lipschitz function $\hat g:X\to Y$ such that $\hat g(x)=g(x)$ for any $x\in B$ and $L_g$ is also a Lipschitz modulus of $\hat g$.
Lemma 1 is important for the following discussion. In [1], the authors assumed that the support set $\Xi$ is the whole space $\mathbb{R}^s$. They obtained the non-asymptotic moment estimate [1] (Theorem 1) and concentration estimate [1] (Theorem 2) for the Wasserstein metric. For any $P,Q\in\mathcal{P}(\Xi)$, we can view them as probability distributions $\tilde P,\tilde Q\in\mathcal{P}(\mathbb{R}^s)$ through the correspondence
$$\tilde P(A):=P(A\cap\Xi)\quad\text{and}\quad\tilde Q(A):=Q(A\cap\Xi)$$
for all measurable $A\subseteq\mathbb{R}^s$. That is, we set the probability of the region $\mathbb{R}^s\setminus\Xi$ to be zero. In fact, we have $\mathcal{D}_W(P,Q)=\mathcal{D}_W(\tilde P,\tilde Q)$. The details are as follows:
$$\mathcal{D}_W(P,Q)=\sup_{g\in\mathcal{G}_W(\Xi)}\left|\int_\Xi g(\xi)\,(P-Q)(d\xi)\right|=\sup_{\hat g\in\hat{\mathcal{G}}_W(\mathbb{R}^s)}\left|\int_{\mathbb{R}^s}\hat g(\xi)\,(\tilde P-\tilde Q)(d\xi)\right|,$$
where $\mathcal{G}_W(\Xi)$ denotes the collection of all Lipschitz continuous functions on $\Xi$ with Lipschitz modulus 1, and $\hat{\mathcal{G}}_W(\mathbb{R}^s)$ is the extension of $\mathcal{G}_W(\Xi)$ according to Lemma 1. Obviously, $\hat{\mathcal{G}}_W(\mathbb{R}^s)\subseteq\mathcal{G}_W(\mathbb{R}^s)$, the set of Lipschitz continuous functions with Lipschitz modulus 1 over $\mathbb{R}^s$. Thus, we have the estimate
$$\sup_{\hat g\in\hat{\mathcal{G}}_W(\mathbb{R}^s)}\left|\int_{\mathbb{R}^s}\hat g(\xi)\,(\tilde P-\tilde Q)(d\xi)\right|\le\sup_{\bar g\in\mathcal{G}_W(\mathbb{R}^s)}\left|\int_{\mathbb{R}^s}\bar g(\xi)\,(\tilde P-\tilde Q)(d\xi)\right|=\mathcal{D}_W(\tilde P,\tilde Q).$$
That is, $\mathcal{D}_W(P,Q)\le\mathcal{D}_W(\tilde P,\tilde Q)$.
On the other hand, for any $\bar g\in\mathcal{G}_W(\mathbb{R}^s)$, its restriction to $\Xi$ is Lipschitz continuous with Lipschitz modulus 1. Thus,
$$\sup_{\bar g\in\mathcal{G}_W(\mathbb{R}^s)}\left|\int_{\mathbb{R}^s}\bar g(\xi)\,(\tilde P-\tilde Q)(d\xi)\right|=\sup_{\bar g\in\mathcal{G}_W(\mathbb{R}^s)}\left|\int_{\Xi}\bar g(\xi)\,(\tilde P-\tilde Q)(d\xi)+\int_{\Xi^c}\bar g(\xi)\,(\tilde P-\tilde Q)(d\xi)\right|=\sup_{\bar g\in\mathcal{G}_W(\mathbb{R}^s)}\left|\int_{\Xi}\bar g(\xi)\,(\tilde P-\tilde Q)(d\xi)\right|\le\mathcal{D}_W(P,Q).$$
Finally, we have $\mathcal{D}_W(P,Q)=\mathcal{D}_W(\tilde P,\tilde Q)$. Therefore, although all the convergence results in [1] were derived on $\mathbb{R}^s$, we can extend them to any support set $\Xi\subseteq\mathbb{R}^s$.
Lemma 2
([1], Theorem 1). Let $P\in\mathcal{P}_p(\Xi)$ for some $p>1$. Then, there exists a constant $C$ depending only on $s$ (the dimension of $\Xi$) and $p$ such that, for all $N\ge 1$,
$$\mathbb{E}[\mathcal{D}_W(P_N,P)]\le C\,\mathbb{E}_P\left[\|\xi\|^p\right]^{1/p}\times\begin{cases}N^{-1/2}+N^{-(p-1)/p} & \text{if } s=1 \text{ and } p\ne 2,\\ N^{-1/2}\log(1+N)+N^{-(p-1)/p} & \text{if } s=2 \text{ and } p\ne 2,\\ N^{-1/s}+N^{-(p-1)/p} & \text{if } s>2 \text{ and } p\ne s/(s-1),\end{cases}$$
where $\log$ denotes the natural logarithm.
Lemma 2 does not cover all pairs $(s,p)$, for example, $(s,p)=(1,2)$ or $(s,p)=(2,2)$. However, we can always reset $p$ such that Lemma 2 holds by the following procedure. If $s=1$ or $s=2$ and $p=2$, then $P$ must belong to $\mathcal{P}_q(\Xi)$ for any $q\in(1,2)$. If $s>2$ and $p=2$, we can select $q\in(1,2)$ such that $q\ne s/(s-1)$. If $s>2$ and $p=s/(s-1)$, we can choose any $q\in(1,s/(s-1))$. Then, letting $p=q$, Lemma 2 holds with $s=1$ or $s=2$ and $p\in(1,2)$, or with $s>2$ and $p\ne s/(s-1)$. Therefore, Lemma 2 is applicable for any $s\in\mathbb{N}$ through a carefully prepared $p$. In the following discussion, without loss of generality, we always assume that Lemma 2 holds for any pair $(s,p)\in\mathbb{N}\times[1,+\infty)$. Further, according to Lemma 2, we can obtain the following uniform upper bound:
$$\mathbb{E}[\mathcal{D}_W(P_N,P)]\le C\,\mathbb{E}_P\left[\|\xi\|^p\right]^{1/p}\left(N^{-1/\max\{2,s\}}\log(1+N)+N^{-(p-1)/p}\right)$$
for any $s\in\mathbb{N}$ and $N\ge 1$.
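The moment estimate can be observed numerically in the simplest case $s=1$. The sketch below is entirely our own toy experiment (the uniform reference law and the sample sizes are arbitrary choices): for $P$ uniform on $[0,1]$, $\mathcal{D}_W(P_N,P)=\int_0^1|F_N(t)-t|\,dt$ admits a closed piecewise form, so the expected distance can be estimated by averaging.

```python
import random

def w1_to_uniform(samples):
    """D_W between the empirical distribution of `samples` and the
    uniform law on [0, 1], i.e. the integral of |F_N(t) - t|."""
    xs = sorted(samples)
    n = len(xs)
    grid = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(len(grid) - 1):
        a, b = grid[i], grid[i + 1]
        c = i / n  # value of the empirical CDF on the interval (a, b)
        if c <= a or c >= b:
            # |c - t| keeps a constant sign on [a, b]
            total += abs(c - (a + b) / 2) * (b - a)
        else:
            total += ((c - a) ** 2 + (b - c) ** 2) / 2
    return total

random.seed(0)

def avg_error(n, reps=20):
    return sum(w1_to_uniform([random.random() for _ in range(n)])
               for _ in range(reps)) / reps

e50, e800 = avg_error(50), avg_error(800)
print(e50 > e800)  # the averaged distance shrinks as N grows
```

Consistent with the $N^{-1/2}$ rate for $s=1$, the averaged error at $N=800$ should be roughly four times smaller than at $N=50$.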
Assumption 1.
Let $P\in\mathcal{P}(\Xi)$ satisfy
$$A:=\mathbb{E}_P\left[\exp\left(\|\xi\|^b\right)\right]=\int_\Xi\exp\left(\|\xi\|^b\right)P(d\xi)<\infty$$
for some constant $b$.
Lemma 3.
Suppose that Assumption 1 holds for some $b>1$. Then, for $\epsilon\in(0,1]$, we have
$$\mathbb{P}\left(\mathcal{D}_W(P,P_N)\ge\epsilon\right)\le\alpha\times\begin{cases}\exp\left(-\beta N\epsilon^2\right) & \text{if } s=1,\\ \exp\left(-\beta N\left(\epsilon/\log(2+1/\epsilon)\right)^2\right) & \text{if } s=2,\\ \exp\left(-\beta N\epsilon^s\right) & \text{if } s>2,\end{cases}$$
for all $N\ge 1$, where $\alpha$ and $\beta$ are two positive constants depending only on $P$, $b$, and $s$.
Proof. 
Based on Assumption 1, Condition (1) in [1] holds. Then, since $\epsilon\in(0,1]$, Lemma 3 follows directly from [1] (Theorem 2). □
For a more comprehensive version of Lemma 3, one can refer to [1] (Theorem 2). Here, we focus on the case $\epsilon\in(0,1]$ because small violation levels are of primary interest. A simplified version can also be found in [18] (Theorem 3.4), where the assumption $s\ne 2$ is imposed.
To simplify the following discussion, we derive a uniform upper bound for the right-hand side in Lemma 3. Note the fact that $1+\delta\le e^\delta$ for any $\delta\in\mathbb{R}$, or equivalently $1+\log x\le x$ for any $x>0$. We have
$$\log\left(2+\frac{1}{\epsilon}\right)=1+\log\left(\frac{2}{e}+\frac{1}{e\epsilon}\right)\le\frac{2}{e}+\frac{1}{e\epsilon}=\frac{2\epsilon+1}{e\epsilon}.$$
Letting $\epsilon\in(0,1/2]$, so that $2\epsilon+1\le 2$, gives us
$$\left(\frac{\epsilon}{\log(2+1/\epsilon)}\right)^2\ge\left(\frac{e\epsilon^2}{2\epsilon+1}\right)^2\ge\frac{e^2\epsilon^4}{4}.$$
When $s=2$, we have
$$\mathbb{P}\left(\mathcal{D}_W(P,P_N)\ge\epsilon\right)\le\alpha\exp\left(-\beta N\left(\epsilon/\log(2+1/\epsilon)\right)^2\right)\le\alpha\exp\left(-\frac{e^2\beta N\epsilon^4}{4}\right).$$
Moreover, for $\epsilon\in(0,1/2]$,
$$\exp\left(-\beta N\epsilon^4\right)\ge\exp\left(-\frac{e^2\beta N\epsilon^4}{4}\right)\ge\exp\left(-\beta N\epsilon^2\right).$$
Therefore, we can obtain the loose but uniform upper bound
$$\mathbb{P}\left(\mathcal{D}_W(P,P_N)\ge\epsilon\right)\le\alpha\exp\left(-\beta N\epsilon^{\max\{4,s\}}\right)$$
for any $\epsilon\in(0,1/2]$ and $s\in\mathbb{N}$.

3. Convergence Analyses of Data-Driven FM Metrics

In this section, we investigate different kinds of convergence for data-driven FM metrics. To this end, let $\xi^1,\xi^2,\ldots,\xi^N$ be $N$ iid samples generated according to $P$. These samples are viewed here as random vectors $\xi^i:\Omega\to\Xi$, $1\le i\le N$, on the probability space $(\Omega,\mathcal{F},\mathbb{P})$. Then, we obtain the empirical distribution $P_N$ defined as
$$P_N=\frac{1}{N}\sum_{i=1}^N\mathbb{1}_{\xi^i},$$
where $\mathbb{1}_{\xi^i}(\cdot)$ is the indicator function, that is, $\mathbb{1}_{\xi^i}(\xi)=1$ for $\xi=\xi^i$ and $\mathbb{1}_{\xi^i}(\xi)=0$ otherwise.
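In practice, the empirical distribution is simply the map sending each observed atom to its relative frequency. A minimal sketch (the sample values are purely illustrative):

```python
from collections import Counter

samples = [0.3, 1.2, 0.3, 2.7, 1.2, 0.3]   # illustrative draws xi^1..xi^N
N = len(samples)

# P_N assigns each atom its relative frequency, matching
# P_N = (1/N) * (sum of indicator functions at the samples).
P_N = {atom: count / N for atom, count in Counter(samples).items()}
print(P_N[0.3])  # 3 of the 6 draws equal 0.3, so its weight is 0.5
```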
We first give the following vital lemma.
Lemma 4.
Let $P,Q\in\mathcal{P}_p(\Xi)$ for some $p\ge 1$. Then,
$$\zeta_p(P,Q)\le R^{p-1}\mathcal{D}_W(P,Q)+4\int_{\{\xi\in\Xi:\|\xi\|>R\}}\|\xi\|^p\,(P+Q)(d\xi)$$
for any $R$ satisfying $R\ge 1$ and $B(0,R)\cap\Xi\ne\emptyset$, where $0$ is the origin of $\mathbb{R}^s$ and $B(0,R)$ is the closed ball centered at $0$ with radius $R$.
The proof of Lemma 4 can be found in Appendix A.
If we define
$$\phi_p(P,Q):=\inf_{R}\left\{R^{p-1}\mathcal{D}_W(P,Q)+4\int_{\{\xi\in\Xi:\|\xi\|>R\}}\|\xi\|^p\,(P+Q)(d\xi) : R\ge 1,\ B(0,R)\cap\Xi\ne\emptyset\right\},$$
then we obtain a tighter upper bound on $\zeta_p(P,Q)$, that is,
$$\zeta_p(P,Q)\le\phi_p(P,Q).$$
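For discrete distributions on the real line, the right-hand side of Lemma 4 can be evaluated numerically and minimized over a grid of admissible radii $R$, giving a computable upper bound on $\zeta_p(P,Q)$. The sketch below is a toy experiment of our own (the atoms, the weights, and the choice $p=2$ are illustrative; `w1` implements the standard 1-D CDF formula for $\mathcal{D}_W$):

```python
def w1(atoms_p, weights_p, atoms_q, weights_q):
    """1-Wasserstein distance between discrete 1-D distributions,
    via the CDF formula D_W(P, Q) = integral |F_P - F_Q|."""
    points = sorted(set(atoms_p) | set(atoms_q))
    def cdf(t, atoms, weights):
        return sum(w for a, w in zip(atoms, weights) if a <= t)
    return sum(abs(cdf(l, atoms_p, weights_p) - cdf(l, atoms_q, weights_q))
               * (r - l) for l, r in zip(points[:-1], points[1:]))

p = 2
atoms_p, weights_p = [0.0, 3.0], [0.7, 0.3]   # P
atoms_q, weights_q = [1.0, 2.5], [0.5, 0.5]   # Q
dw = w1(atoms_p, weights_p, atoms_q, weights_q)

def tail(R):
    """4 * integral over {|xi| > R} of |xi|^p d(P + Q)."""
    pairs = list(zip(atoms_p, weights_p)) + list(zip(atoms_q, weights_q))
    return 4.0 * sum(w * abs(a) ** p for a, w in pairs if abs(a) > R)

# Minimizing R^{p-1} * D_W(P, Q) + tail(R) over admissible radii gives a
# computable upper bound on zeta_p(P, Q): larger R shrinks the tail term
# but inflates the Wasserstein term.
bound = min(R ** (p - 1) * dw + tail(R) for R in [1.0, 1.5, 2.0, 2.5, 3.0])
```

Here the grid minimum is attained at $R=3$, where the tail term vanishes and the bound reduces to $R^{p-1}\mathcal{D}_W(P,Q)$.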
The first convergence result is about the non-asymptotic moment estimate. It provides an upper bound for the expectation of the FM distance between P and its empirical approximation distribution.
Theorem 1
(Non-asymptotic moment estimates for FM metrics). Suppose that $P\in\mathcal{P}_p(\Xi)$ for some $p>1$. Then, for sufficiently large $N$, we have
$$\mathbb{E}[\zeta_p(P,P_N)]\le\beta_N,$$
where $\{\beta_N\}$ is a sequence of positive numbers satisfying $\beta_N\to 0$ as $N\to\infty$.
The proof of Theorem 1 can be found in Appendix A.
Theorem 1 establishes the convergence in the sense of expectation. However, it fails to tell us the sample-wise convergence. The following theorem states the asymptotic convergence under FM metrics for almost every sample.
Theorem 2
(Asymptotic convergence of FM metrics). Suppose that $P\in\mathcal{P}_p(\Xi)$. Then,
$$\zeta_p(P,P_N)\to 0$$
with probability 1 as $N\to\infty$, where $P_N$ is defined at the beginning of this section.
The proof of Theorem 2 can be found in Appendix A.
Theorems 1 and 2 claim the convergence. As we know, the rate of convergence is quite important for guiding the solution process in practice. The following theorem gives the estimate of the convergence rate under certain assumptions.
Theorem 3
(Non-asymptotic concentration estimates for FM metrics). Suppose that $P\in\mathcal{P}_p(\Xi)$ ($p\ge 1$) and Assumption 1 holds with $b>p$. Then, for any $\epsilon\in(0,1/2]$, we have
$$\mathbb{P}\left(\zeta_p(P,P_N)\ge\epsilon\right)\le\hat\alpha\exp(-\hat\beta N)$$
for some constants $\hat\alpha>0$ depending on $P$, $b$, and $s$, and $\hat\beta>0$ depending on $P$, $b$, $s$, and $\epsilon$.
The proof of Theorem 3 can be found in Appendix A.
Remark 1.
Here we assume that $\epsilon\in(0,1/2]$, mainly so that the proof stays relatively simple. Fortunately, small violations are of greater interest than large ones.
Under certain assumptions, we can obtain an estimate for $I(\epsilon/16)$. For example, if $M(t)\le\exp(\sigma^2t^2/2)$ for $t\in\mathbb{R}$, where $\sigma$ is a positive constant, then $\log M(t)\le\sigma^2t^2/2$. Then, according to the properties of convex quadratic functions, the rate function has the lower bound
$$I\left(\frac{\epsilon}{16}\right)\ge\frac{\epsilon^2}{512\sigma^2}.$$
Thus, we can further obtain a concrete estimate for $\hat\beta$. For more details in this respect, one can refer to [16].
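Assuming $I$ is the Legendre transform of the log-moment generating function, $I(a)=\sup_{t\in\mathbb{R}}\{ta-\log M(t)\}$ (the standard Cramér rate function, presumed to agree with the definition used in the appendix proof), the lower bound follows in one line:

```latex
I\left(\tfrac{\epsilon}{16}\right)
 = \sup_{t\in\mathbb{R}}\left\{\tfrac{t\epsilon}{16}-\log M(t)\right\}
 \;\ge\; \sup_{t\in\mathbb{R}}\left\{\tfrac{t\epsilon}{16}-\tfrac{\sigma^2 t^2}{2}\right\}
 \;=\; \frac{(\epsilon/16)^2}{2\sigma^2}
 \;=\; \frac{\epsilon^2}{512\sigma^2},
```

where the inner supremum of the concave quadratic is attained at $t^*=\epsilon/(16\sigma^2)$.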

4. Applications

In this section, we consider four applications of convergence conclusions about FM metrics obtained in Section 3. Specifically, we study the discrete approximation of two-stage stochastic programming problems, stochastic optimization problems with dominance constraints, data-driven distributionally robust optimization problems with FM ball, and the discrete approximation for distributionally robust optimization problems with general moment ambiguity set. They will not only further illustrate the motivations of this study but also provide alternative avenues or extensions for the current results.

4.1. Two-Stage Stochastic Linear Programming Problems

Discrete approximation is an important issue in stochastic optimization, which is crucial for its numerical solution. In this subsection, by employing the convergence results in Section 3, we give an alternative avenue for analyzing the discrete approximation of two-stage stochastic programming problems.
Consider the two-stage stochastic programming problem:
$$\min_{x\in X}\ c^\top x+\mathbb{E}_P[\Phi(x,\xi)],\qquad(4)$$
where $c\in\mathbb{R}^n$; $X\subseteq\mathbb{R}^n$ is a polyhedron; the probability measure $P$ is supported on a polyhedron $\Xi\subseteq\mathbb{R}^s$; and
$$\Phi(x,\xi):=\inf\left\{q(\xi)^\top y(\xi) : Wy(\xi)+T(\xi)x=h(\xi),\ y(\xi)\ge 0\right\}.$$
Here $W\in\mathbb{R}^{r\times m}$, $T(\xi)\in\mathbb{R}^{r\times n}$, $q(\xi)\in\mathbb{R}^m$, and $h(\xi)\in\mathbb{R}^r$, where $q(\xi)$, $T(\xi)$, and $h(\xi)$ depend affinely on $\xi$.
Denote $f(x,\xi):=c^\top x+\Phi(x,\xi)$, and let $v(P)$ and $S(P)$ denote the optimal value and the optimal solution set of Problem (4), respectively. Moreover, we use $\mathrm{pos}\,W$ to denote the set $\{W\hat y:\hat y\in\mathbb{R}^m_+\}$, and denote $D=\{u\in\mathbb{R}^m:\{z\in\mathbb{R}^r:W^\top z\le u\}\ne\emptyset\}$.
To quantify the upper semicontinuity or the deviation distance of the optimal solution set, we define the growth function $\psi_P:\mathbb{R}_+\to\mathbb{R}_+$ as
$$\psi_P(\tau):=\min\left\{\mathbb{E}_P[f(x,\xi)]-v(P) : d(x,S(P))\ge\tau,\ x\in X\right\}.$$
Its inverse function $\psi_P^{-1}$ is given by
$$\psi_P^{-1}(t):=\sup\{\tau\in\mathbb{R}_+:\psi_P(\tau)\le t\}.$$
Thus, we can define the associated conditioning function $\Psi_P:\mathbb{R}_+\to\mathbb{R}_+$ as
$$\Psi_P(\eta):=\eta+\psi_P^{-1}(2\eta).$$
It is easy to verify that $\psi_P$ is nondecreasing and $\Psi_P$ is increasing. Both $\psi_P$ and $\Psi_P$ are lower semicontinuous on $\mathbb{R}_+$ and vanish at 0. One can refer to [10] for more details.
Moreover, we have $\psi_P^{-1}(t)\to 0^+$ as $t\to 0^+$. We verify this fact by contradiction. Suppose that there exists a sequence $\{t_n\}$ satisfying $t_n\to 0^+$ as $n\to\infty$ such that $\psi_P^{-1}(t_n)\nrightarrow 0^+$. Denote $\tau_n=\psi_P^{-1}(t_n)$. The lower semicontinuity of $\psi_P$ implies that $\{\tau\in\mathbb{R}_+:\psi_P(\tau)\le t_n\}$ is closed; thus, $\psi_P(\tau_n)\le t_n$. Due to the nondecreasing property of $\psi_P$ and $t_n\to 0^+$ as $n\to\infty$, $\{\tau_n\}$ must be bounded. Without loss of generality, we assume that $\tau_n\to\tau^*$ as $n\to\infty$, where $\tau^*$ is a positive constant. By the lower semicontinuity of $\psi_P$, we have
$$0=\liminf_{n\to\infty}t_n\ge\liminf_{n\to\infty}\psi_P(\tau_n)\ge\psi_P(\tau^*)>0,$$
which leads to a contradiction.
According to the definition of $\Psi_P$, we can immediately deduce that $\Psi_P(\eta)\to 0^+$ as $\eta\to 0^+$.
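As a toy illustration of these definitions (entirely our own example, not from the paper): for the deterministic objective $f(x)=x^2$ on $X=[-1,1]$, we have $S=\{0\}$, $v=0$, and $\psi(\tau)=\tau^2$ for $\tau\in[0,1]$, so $\psi^{-1}$ and $\Psi$ can be written down explicitly:

```python
import math

def psi(tau):
    """Growth function of min x^2 on [-1, 1]: the smallest objective
    gap among points at distance >= tau from the solution set {0}."""
    return tau ** 2          # valid for 0 <= tau <= 1

def psi_inv(t):
    """sup{tau >= 0 : psi(tau) <= t}; capped at 1, the radius of X."""
    return min(math.sqrt(t), 1.0)

def Psi(eta):
    """Conditioning function Psi(eta) = eta + psi_inv(2 * eta)."""
    return eta + psi_inv(2.0 * eta)

# Psi vanishes at 0 and tends to 0+ together with eta, as claimed above.
print(Psi(0.0), round(Psi(0.02), 4))
```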
To introduce the following discussion, we make some standard assumptions (see [11]).
Assumption 2.
Let the following assertions hold:
(1) For each pair $(x,\xi)\in X\times\Xi$, $h(\xi)-T(\xi)x\in\mathrm{pos}\,W$ and $q(\xi)\in D$;
(2) $P\in\mathcal{P}_2(\Xi)$.
Under the above assumptions, we have the following quantitative stability results about the optimal value and optimal solution set of Problem (4).
Lemma 5
([11], Theorem 3.3). Suppose that Assumption 2 holds and $S(P)$ is nonempty and bounded. Then, there exist constants $L>0$ and $\delta>0$ such that
$$|v(P)-v(Q)|\le L\,\zeta_2(P,Q)\quad\text{and}\quad S(Q)\subseteq S(P)+\Psi_P\left(L\,\zeta_2(P,Q)\right)\mathcal{B}$$
whenever $Q\in\mathcal{P}_2(\Xi)$ and $\zeta_2(P,Q)<\delta$, where $\mathcal{B}$ is the closed unit ball in $\mathbb{R}^n$.
Based on Lemma 5 and the convergence results in Section 3, we have the following convergence conclusions between the two-stage stochastic programming problem (4) and its empirical approximation.
Theorem 4.
Suppose that: (i) Assumption 2 holds; (ii) $S(P)$ is nonempty and bounded. Then,
$$|v(P)-v(P_N)|\to 0\quad\text{and}\quad d(S(P_N),S(P))\to 0$$
with probability 1 as $N\to\infty$.
Proof. 
For the first assertion, we have from Theorem 2 that $\zeta_2(P,P_N)\to 0$ with probability 1. This means that, for the $\delta$ defined in Lemma 5, there exists a positive number $N_0=N_0(\delta,\omega)$ such that $\zeta_2(P,P_N)<\delta$ for any $N\ge N_0$ and almost every $\omega\in\Omega$. Then, by Lemma 5, the inequalities
$$|v(P)-v(P_N)|\le L\,\zeta_2(P,P_N)\quad\text{and}\quad d(S(P_N),S(P))\le\Psi_P\left(L\,\zeta_2(P,P_N)\right)$$
hold almost surely for $N\ge N_0$, where $L$ is defined in Lemma 5. According to Theorem 2 and the properties of $\Psi_P$, we have
$$\zeta_2(P,P_N)\to 0$$
and thus
$$\Psi_P\left(L\,\zeta_2(P,P_N)\right)\to 0$$
with probability 1 as $N\to\infty$. These facts imply that
$$|v(P)-v(P_N)|\to 0\quad\text{and}\quad d(S(P_N),S(P))\to 0$$
with probability 1 as $N\to\infty$. □
Theorem 5.
Suppose that: (i) Assumption 1 holds with $b>2$; (ii) Assumption 2 holds; (iii) $S(P)$ is nonempty and bounded. Then, for any $\epsilon\in(0,1/2]$, there exist $\bar\alpha>0$ depending on $P$ and $s$, and $\bar\beta>0$ depending on $P$, $s$, and $\epsilon$, such that
$$\mathbb{P}\left(|v(P)-v(P_N)|\ge L\epsilon\right)\le\bar\alpha\exp(-\bar\beta N)\quad\text{and}\quad\mathbb{P}\left(d(S(P_N),S(P))\ge\Psi_P(L\epsilon)\right)\le\bar\alpha\exp(-\bar\beta N).$$
Proof. 
If $|v(P)-v(P_N)|\le L\,\zeta_2(P,P_N)$ for the $L$ defined in Lemma 5, we have from Theorem 3 that
$$\mathbb{P}\left(|v(P)-v(P_N)|\ge L\epsilon\right)\le\mathbb{P}\left(\zeta_2(P,P_N)\ge\epsilon\right)\le\bar\alpha\exp(-\bar\beta(\epsilon)N)$$
for any $\epsilon\in(0,1/2]$, where $\bar\alpha>0$ depends on $P$ and $s$, and $\bar\beta>0$ depends on $P$, $s$, and $\epsilon$. Here we write $\bar\beta(\epsilon)$ to stress its dependence on $\epsilon$.
As shown in Theorem 4, a sufficient condition for
$$|v(P)-v(P_N)|\le L\,\zeta_2(P,P_N)$$
is $\zeta_2(P,P_N)<\delta$, where $\delta$ is defined in Lemma 5. Without loss of generality, we assume that $\delta\in(0,1/2]$. Analogously, we have that
$$\mathbb{P}\left(\zeta_2(P,P_N)<\delta\right)\ge 1-\bar\alpha\exp(-\bar\beta(\delta)N).$$
Then, we obtain
$$\mathbb{P}\left(|v(P)-v(P_N)|\ge L\epsilon\right)\le\bar\alpha\exp(-\bar\beta(\epsilon)N)\cdot\left(1-\bar\alpha\exp(-\bar\beta(\delta)N)\right)\le\bar\alpha\exp(-\bar\beta(\epsilon)N).$$
Similarly, we have
$$\mathbb{P}\left(d(S(P_N),S(P))\ge\Psi_P(L\epsilon)\right)\le\mathbb{P}\left(\Psi_P\left(L\,\zeta_2(P,P_N)\right)\ge\Psi_P(L\epsilon)\right)=\mathbb{P}\left(\zeta_2(P,P_N)\ge\epsilon\right),$$
where the equality follows from the strictly increasing property of $\Psi_P(\cdot)$. By the same procedure, we can derive the second assertion. □
Remark 2.
The convergence analysis of two-stage stochastic programming problems can also be found in [11] (Section 4), where covering and bracketing numbers are introduced. However, it seems difficult to verify the growth rate of the covering or bracketing number in the general case (see [11], Proposition 4.2). Our convergence results are more straightforward: compared with [11] (Proposition 4.2), instead of the growth rate of the covering or bracketing number, we use the light-tailed distribution assumption, which is commonly used in the literature; see, for example, [1,18].

4.2. Stochastic Optimization Problems with Stochastic Dominance Constraints

In this part, we consider stochastic optimization problems with stochastic dominance constraints. Stochastic dominance is an important ingredient in economics, decision theory, statistics, and, nowadays, modern optimization. It has been widely studied over the last two decades; see, for example, [17,22,23,24,25,26] and the references therein. Different from classical stochastic optimization models, which cope with random variables by taking expectations, stochastic dominance can better reflect the relationship between two random variables. It is known that expected utility theory can also provide a comparison of two random variables; however, it is hardly possible to explicitly express the utility functions of decision makers [27]. From this point of view, stochastic dominance is more practical. Actually, stochastic dominance has a close relationship with expected utility theory. Generally, a random variable $X$ dominates another random variable $Y$ in the $k$th ($k\ge 1$) order, denoted by $X\succeq_{(k)}Y$, if $\mathbb{E}[u(X)]\ge\mathbb{E}[u(Y)]$ for every nondecreasing function $u(\cdot)$ from a certain set of utility functions [17]. Specially, $X\succeq_{(1)}Y$ if and only if $\mathbb{E}[u(X)]\ge\mathbb{E}[u(Y)]$ for every nondecreasing utility function $u(\cdot)$, and $X\succeq_{(2)}Y$ if and only if $\mathbb{E}[u(X)]\ge\mathbb{E}[u(Y)]$ for every nondecreasing and concave utility function $u(\cdot)$ [27].
The convex stochastic optimization model with the $k$th order stochastic dominance constraint can be described as (see [22,27]):
$$\min\left\{f(x) : x\in D,\ G(x,\xi)\succeq_{(k)}Y\right\},\qquad(6)$$
where $D$ is a nonempty closed convex subset of $\mathbb{R}^n$; $f:\mathbb{R}^n\to\mathbb{R}$ is a convex function; $Y$ is a random variable supported on $\mathcal{Y}\subseteq\mathbb{R}$, which can be treated as a random benchmark; and $G:\mathbb{R}^n\times\Xi\to\mathbb{R}$. Moreover, we assume that $G$ is locally Lipschitz continuous with respect to $\xi$ in the following sense:
$$|G(x,\xi)-G(x,\xi')|\le L_G\max\{1,\|\xi\|,\|\xi'\|\}^{p-1}\|\xi-\xi'\|\qquad(7)$$
for any $\xi,\xi'\in\Xi$, where $p\ge 1$ and $L_G>0$. In addition, $G$ satisfies the linear growth condition:
$$|G(x,\xi)|\le C_G(B)\max\{1,\|\xi\|\}\qquad(8)$$
for every $x\in B$ and $\xi\in\Xi$, where $B$ is any bounded subset of $\mathbb{R}^n$ and $C_G(B)>0$ depends on $B$.
Actually, we can impose a more general growth condition on $G$, for example,
$$|G(x,\xi)|\le C_G(B)\max\{1,\|\xi\|\}^q$$
for $q\ge 1$, and the following discussion still holds; the linear growth condition simply simplifies the demonstration. The above requirements on $G(x,\xi)$ are easily met. For instance, the objective function of a two-stage stochastic programming problem with fixed recourse satisfies the above conditions (see [11], Proposition 3.2).
Due to its attractive modeling capability, the quantitative stability analysis of stochastic optimization models with dominance constraints has recently been investigated in several works. Dentcheva et al. first studied in [22] stochastic optimization problems with first order stochastic dominance constraints, which was extended by Dentcheva and Römisch in [17] to problems with general $k$th ($k\ge 2$) order stochastic dominance constraints. In [24], Chen and Jiang weakened the assumptions of the quantitative stability analysis in [17] by considering the case where $G(x,\xi)$ is generated by a two-stage fully random stochastic programming problem.
To establish the convergence results, we first investigate the quantitative stability of model (6). By convention, we consider its relaxed problem (see also [17,24]):
$$\min\left\{f(x) : x\in D,\ \mathbb{E}[(\eta-G(x,\xi))_+^{k-1}]\le\mathbb{E}[(\eta-Y)_+^{k-1}]\ \text{for all }\eta\in I\right\},\qquad(9)$$
where $I\subset\mathbb{R}$ is a compact interval and $(\cdot)_+:=\max\{0,\cdot\}$.
In view of our focus in this study, we reestablish the quantitative stability conclusions for Problem (9) in what follows. We use $P_\xi$ and $P_Y$ to denote the probability distributions of $\xi$ and $Y$, respectively. We denote the feasible solution set of Problem (9) by
$$\mathcal{X}(P_\xi,P_Y)=\left\{x\in D : \mathbb{E}_{P_\xi}[(\eta-G(x,\xi))_+^{k-1}]\le\mathbb{E}_{P_Y}[(\eta-Y)_+^{k-1}]\ \text{for all }\eta\in I\right\}$$
and its perturbed feasible solution set under $(Q_\xi,Q_Y)$ by
$$\mathcal{X}(Q_\xi,Q_Y)=\left\{x\in D : \mathbb{E}_{Q_\xi}[(\eta-G(x,\xi))_+^{k-1}]\le\mathbb{E}_{Q_Y}[(\eta-Y)_+^{k-1}]\ \text{for all }\eta\in I\right\}.$$
First we examine the quantitative stability of the feasible solution set.
Proposition 1.
Let $D$ be compact, $P_\xi\in\mathcal{P}_{k+p-2}(\Xi)$, $P_Y\in\mathcal{P}_{k-1}(\mathcal{Y})$, and let $G$ satisfy the local Lipschitz continuity condition (7) and the linear growth condition (8). Then, there exist constants $L>0$ and $\delta>0$ such that
$$d_H(\mathcal{X}(P_\xi,P_Y),\mathcal{X}(Q_\xi,Q_Y))\le L\left(\zeta_{k+p-2}(P_\xi,Q_\xi)+\zeta_{k-1}(P_Y,Q_Y)\right)$$
whenever $Q_\xi\in\mathcal{P}_{k+p-2}(\Xi)$, $Q_Y\in\mathcal{P}_{k-1}(\mathcal{Y})$, and the pair $(Q_\xi,Q_Y)$ satisfies $\zeta_{k+p-2}(P_\xi,Q_\xi)+\zeta_{k-1}(P_Y,Q_Y)<\delta$, where $d_H(\cdot,\cdot)$ denotes the Pompeiu–Hausdorff distance.
Proof. 
We know from the proof of [17] (Proposition 3.2) that
$$d_H(\mathcal{X}(P_\xi,P_Y),\mathcal{X}(Q_\xi,Q_Y))\le\frac{1}{(k-1)!}\max_{\eta\in I}\left|\mathbb{E}_{P_\xi}\left[(\eta-G(x,\xi))_+^{k-1}\right]-\mathbb{E}_{Q_\xi}\left[(\eta-G(x,\xi))_+^{k-1}\right]\right|+\frac{1}{(k-1)!}\max_{\eta\in I}\left|\mathbb{E}_{P_Y}\left[(\eta-Y)_+^{k-1}\right]-\mathbb{E}_{Q_Y}\left[(\eta-Y)_+^{k-1}\right]\right|$$
whenever the right-hand side is less than or equal to some positive scalar $\bar\delta$.
In view of this, we estimate
$$\max_{\eta\in I}\left|\mathbb{E}_{P_\xi}\left[(\eta-G(x,\xi))_+^{k-1}\right]-\mathbb{E}_{Q_\xi}\left[(\eta-G(x,\xi))_+^{k-1}\right]\right|$$
and
$$\max_{\eta\in I}\left|\mathbb{E}_{P_Y}\left[(\eta-Y)_+^{k-1}\right]-\mathbb{E}_{Q_Y}\left[(\eta-Y)_+^{k-1}\right]\right|,$$
respectively.
Note the fact that (see [17], (3.9))
$$\left|(\eta-t)_+^{k-1}-(\eta-\hat t)_+^{k-1}\right|\le K_I(k-1)\max\{1,|t|,|\hat t|\}^{k-2}\,|t-\hat t|$$
for some positive constant $K_I$ and any $\eta\in I$. Then, we have
$$\begin{aligned}\left|(\eta-G(x,\xi))_+^{k-1}-(\eta-G(x,\hat\xi))_+^{k-1}\right|&\le K_I(k-1)\max\left\{1,|G(x,\xi)|,|G(x,\hat\xi)|\right\}^{k-2}\left|G(x,\xi)-G(x,\hat\xi)\right|\\&\le K_I(k-1)\max\left\{1,C_G(D)\max\{1,\|\xi\|\},C_G(D)\max\{1,\|\hat\xi\|\}\right\}^{k-2}\cdot L_G\max\{1,\|\xi\|,\|\hat\xi\|\}^{p-1}\|\xi-\hat\xi\|\\&\le K_I(k-1)(C_G(D)+1)^{k-2}L_G\max\{1,\|\xi\|,\|\hat\xi\|\}^{k+p-3}\|\xi-\hat\xi\|.\end{aligned}$$
This means that
$$\max_{\eta\in I}\left|\mathbb{E}_{P_\xi}\left[(\eta-G(x,\xi))_+^{k-1}\right]-\mathbb{E}_{Q_\xi}\left[(\eta-G(x,\xi))_+^{k-1}\right]\right|\le L_1\,\zeta_{k+p-2}(P_\xi,Q_\xi),$$
where $L_1:=K_I(k-1)(C_G(D)+1)^{k-2}L_G$.
Similarly, we have
$$\left|(\eta-Y)_+^{k-1}-(\eta-\hat Y)_+^{k-1}\right|\le K_I(k-1)\max\{1,|Y|,|\hat Y|\}^{k-2}|Y-\hat Y|,$$
which means that
$$\max_{\eta\in I}\left|\mathbb{E}_{P_Y}\left[(\eta-Y)_+^{k-1}\right]-\mathbb{E}_{Q_Y}\left[(\eta-Y)_+^{k-1}\right]\right|\le K_I(k-1)\,\zeta_{k-1}(P_Y,Q_Y).$$
Taking $L:=\frac{1}{(k-1)!}\max\{L_1,K_I(k-1)\}$, we have
$$d_H(\mathcal{X}(P_\xi,P_Y),\mathcal{X}(Q_\xi,Q_Y))\le L\left(\zeta_{k+p-2}(P_\xi,Q_\xi)+\zeta_{k-1}(P_Y,Q_Y)\right)$$
whenever $\zeta_{k+p-2}(P_\xi,Q_\xi)+\zeta_{k-1}(P_Y,Q_Y)\le\delta:=\bar\delta/L$. □
The quantitative stability result in Proposition 1 differs from the corresponding results in [17] in two respects: the local Lipschitz continuity of G, and the probability metrics we choose. In [17], the authors assumed that G is Lipschitz continuous and adopted Rachev metrics and the ( k − 1 ) th order Wasserstein metric. As far as we know, no data-driven results exist under Rachev metrics.
Let v ( P ξ , P Y ) and S ( P ξ , P Y ) denote the optimal value and optimal solution set of Problem (9), respectively. Similar to that in Section 4.1, we can define the growth function of Problem (9) as
ψ P ξ , P Y ( τ ) = inf { f ( x ) − v ( P ξ , P Y ) : d ( x , S ( P ξ , P Y ) ) ≥ τ , x ∈ X ( P ξ , P Y ) } .
Then, its inverse function and the associated conditioning function are
ψ P ξ , P Y − 1 ( t ) : = sup { τ ∈ R + : ψ P ξ , P Y ( τ ) ≤ t }
and
Ψ P ξ , P Y ( η ) : = η + ψ P ξ , P Y − 1 ( 2 η ) .
Proposition 2.
Under the conditions of Proposition 1, there exist constants L ^ > 0 and δ ^ > 0 such that
| v ( P ξ , P Y ) − v ( Q ξ , Q Y ) | ≤ L ^ ( ζ k + p − 2 ( P ξ , Q ξ ) + ζ k − 1 ( P Y , Q Y ) ) , d ( S ( Q ξ , Q Y ) , S ( P ξ , P Y ) ) ≤ Ψ P ξ , P Y ( L ^ ( ζ k + p − 2 ( P ξ , Q ξ ) + ζ k − 1 ( P Y , Q Y ) ) )
whenever ζ k + p − 2 ( P ξ , Q ξ ) + ζ k − 1 ( P Y , Q Y ) < δ ^ .
Proof. 
Since f is convex, f is locally Lipschitz continuous. Since D is compact, f is in fact Lipschitz continuous over D. Then, the assertions follow from a proof similar to that of [17] (Theorem 3.3). □
Now we consider the iid samples of ξ and Y. For convenience, we assume that the samples drawn from ξ and Y have the same sample size N. The N iid samples of ξ are ξ 1 , ξ 2 , , ξ N and the N iid samples of Y are Y 1 , Y 2 , , Y N . Then, we have the following empirical distributions:
P ξ , N = ( 1 / N ) ∑ i = 1 N 1 ξ i ,
and
P Y , N = ( 1 / N ) ∑ i = 1 N 1 Y i .
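The empirical distributions above are uniform mixtures of Dirac measures at the observed samples, so integrating any function against them reduces to a sample average. A minimal sketch (the normal and exponential sampling distributions are our own illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# iid samples of xi and Y; the sampling distributions (normal, exponential)
# are illustrative assumptions, not taken from the paper.
N = 10_000
xi_samples = rng.normal(loc=1.0, scale=2.0, size=N)
Y_samples = rng.exponential(scale=1.0, size=N)

# P_{xi,N} = (1/N) * sum_i delta_{xi_i}: integrating any f against the
# empirical distribution is just the sample average of f over the draws.
def empirical_expectation(f, samples):
    return float(np.mean(f(samples)))

# Both empirical means should be close to the true means (1.0 and 1.0).
print(round(empirical_expectation(lambda x: x, xi_samples), 2),
      round(empirical_expectation(lambda y: y, Y_samples), 2))
```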
With these preparations, we can establish the following convergence results.
Theorem 6.
Let D be compact, P ξ ∈ P k + p − 2 ( Ξ ) , P Y ∈ P k − 1 ( Y ) , and G satisfy the locally Lipschitz continuity condition (7) and the linear growth condition (8).
(i) 
We have
d H ( X ( P ξ , P Y ) , X ( P ξ , N , P Y , N ) ) → 0
and
| v ( P ξ , P Y ) − v ( P ξ , N , P Y , N ) | → 0 , d ( S ( P ξ , N , P Y , N ) , S ( P ξ , P Y ) ) → 0
with probability 1, as N → ∞ .
(ii) 
If, moreover,
E P ξ [ exp ( ‖ ξ ‖ b ) ] = ∫ Ξ exp ( ‖ ξ ‖ b ) P ξ ( d ξ ) < + ∞
and
E P Y [ exp ( | Y | c ) ] = ∫ Y exp ( | y | c ) P Y ( d y ) < + ∞
for some b > k + p − 2 and c > k − 1 , then, for ϵ ∈ ( 0 , 1 / 2 ] , there exist positive scalars α 1 depending on P ξ , b and s; β 1 depending on P ξ , b, s and ϵ; α 2 depending on P Y and c; and β 2 depending on P Y , c and ϵ, such that
P ( d H ( X ( P ξ , P Y ) , X ( P ξ , N , P Y , N ) ) ≥ L ϵ ) ≤ α 1 exp ( − β 1 N ) + α 2 exp ( − β 2 N )
and
P ( | v ( P ξ , P Y ) − v ( P ξ , N , P Y , N ) | ≥ L ^ ϵ ) ≤ α 1 exp ( − β 1 N ) + α 2 exp ( − β 2 N ) , P ( d ( S ( P ξ , N , P Y , N ) , S ( P ξ , P Y ) ) ≥ Ψ P ξ , P Y ( L ^ ϵ ) ) ≤ α 1 exp ( − β 1 N ) + α 2 exp ( − β 2 N ) ,
where L and L ^ are defined in Propositions 1 and 2, respectively.
Proof. 
Part (i) can be similarly proved as that in Theorem 4 by utilizing Theorem 2 and Proposition 1.
For Part (ii), we have
P ( d H ( X ( P ξ , P Y ) , X ( P ξ , N , P Y , N ) ) ≥ L ϵ ) ≤ P ( ζ k + p − 2 ( P ξ , P ξ , N ) + ζ k − 1 ( P Y , P Y , N ) ≥ ϵ ) ≤ P ( ζ k + p − 2 ( P ξ , P ξ , N ) ≥ ϵ / 2 ) + P ( ζ k − 1 ( P Y , P Y , N ) ≥ ϵ / 2 ) ≤ α 1 exp ( − β 1 N ) + α 2 exp ( − β 2 N ) ,
where the last inequality follows from Theorem 3; α 1 depends on P ξ , b, and s; β 1 depends on P ξ , b, s, and ϵ ; α 2 depends on P Y and c; and β 2 depends on P Y , c, and ϵ .
The second and third probability inequalities can be analogously verified and thus we omit the proof here. □

4.3. Data-Driven DRO Problems with FM Ball

A general stochastic optimization model can be formulated as
min x ∈ X E P [ g ( x , ξ ) ] ,
where g : X × Ξ → R , X ⊆ R n , and Ξ ⊆ R s is the support set of ξ . The sample average approximation (SAA) method is usually used to solve Problem (10) numerically. The SAA method tacitly assumes that we can generate any number of samples from P. To approximate Problem (10) well, a large sample size is needed [28]. In practice, however, the true probability distribution P is not known exactly, so we cannot generate a sufficiently large number of samples to make the SAA method well-defined, as acquiring more samples is expensive. Nevertheless, it is often possible to obtain a limited number of samples or scenarios, such as historical data. Under these settings, the data-driven DRO model has been proposed [18,29,30]. The natural idea is to use the partial information to construct an ambiguity set that contains the true probability distribution. As pointed out in [18], under certain conditions, this offers powerful out-of-sample performance guarantees.
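To illustrate the role of the sample size in SAA, consider a toy instance of Problem (10) of our own choosing: g ( x , ξ ) = ( x − ξ ) 2 with ξ ∼ N ( 0 , 1 ) , whose true optimizer is x * = E [ ξ ] = 0 . For this quadratic, the SAA problem is solved in closed form by the sample mean, so the sketch below only shows how the SAA solution tightens as N grows:

```python
import numpy as np

# Toy instance of Problem (10), our own assumption: g(x, xi) = (x - xi)^2
# with xi ~ N(0, 1), so x* = E[xi] = 0 and the optimal value is Var(xi) = 1.
def saa_solve(samples):
    # The SAA objective (1/N) sum_i (x - xi_i)^2 is minimized in closed
    # form by the sample mean, so no numerical solver is needed here.
    return float(np.mean(samples))

rng = np.random.default_rng(1)
for N in (10, 100, 10_000):
    x_N = saa_solve(rng.normal(size=N))
    print(N, round(x_N, 3))  # the SAA solution approaches the optimizer 0
```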
For further discussion, we denote the limited finite samples by ξ 1 , ξ 2 , , ξ N and the corresponding empirical distribution by P N . Since the number of samples N is limited, we cannot adopt the classical SAA method, which requires that the sample size tends to infinity. However, we can use the limited information to construct a set of probability measures which contains the true one, that is, the ambiguity set. In this subsection, we consider the following FM ball-based ambiguity set:
B r ( P N ) : = { Q ∈ P ( Ξ ) : ζ p ( Q , P N ) ≤ r } ,
where p ≥ 1 and the positive constant r stands for the confidence parameter determined by the decision maker. Then, the data-driven DRO problem with the FM ball-based ambiguity set for Problem (10) reads as follows:
min x ∈ X sup Q ∈ B r ( P N ) E Q [ g ( x , ξ ) ] .
The Wasserstein ball is commonly used to build the ambiguity set; see, for example, [18]. To explain the rationale and the motivation for adopting the FM metric instead, we make the following comments.
Remark 3.
As we know, a key issue for DRO problems is how to build the ambiguity set. Different kinds of ambiguity sets have been proposed, such as moment information [31], ζ-ball [32], and so on. Of course, the FM metric, as a specific case of the ζ-structure probability metric, can be employed to construct the ambiguity set.
More importantly, the decision maker can utilize the limited empirical distribution P N to obtain an approximate optimal value, say v ( P N ) . From prior experience, the decision maker usually has some confidence, measured by a deviation constant r ^ > 0 , that the true optimal value, denoted by v ( P ) , lies in the interval [ v ( P N ) − r ^ , v ( P N ) + r ^ ] . Frequently, g ( x , ξ ) is locally Lipschitz continuous in the following sense:
| g ( x , ξ ) − g ( x , ξ ′ ) | ≤ L max { 1 , ‖ ξ ‖ , ‖ ξ ′ ‖ } p − 1 ‖ ξ − ξ ′ ‖
for some positive constant L. A typical example is that g ( x , ξ ) is the objective function of a two-stage stochastic programming problem, in which case p = 2 (see [11], Proposition 3.2). Then, we have the quantitative relationship:
| v ( P ) − v ( P N ) | ≤ L ζ p ( P , P N ) .
Therefore, it is reasonable for the decision maker to consider the ambiguity set
{ Q ∈ P ( Ξ ) : ζ p ( Q , P N ) ≤ r : = r ^ / L } .
Moreover, since ζ p ( P , P N ) → 0 with probability 1 as N → ∞ (see Theorem 2), P must be included in { Q ∈ P ( Ξ ) : ζ p ( Q , P N ) ≤ r } for suitable N and r.
Finally, we have ζ p ( P , Q ) ≥ D W ( P , Q ) . The equality holds if p = 1 . Thus,
{ Q ∈ P ( Ξ ) : ζ p ( Q , P N ) ≤ r } ⊆ { Q ∈ P ( Ξ ) : D W ( Q , P N ) ≤ r } .
This tells us that the ambiguity set constructed by the FM ball is tighter than that constructed with the Wasserstein ball.
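The p = 1 case can be checked numerically: ζ 1 coincides with the 1-Wasserstein metric, which has a simple closed form for equal-size empirical distributions on the real line. The sketch below (with illustrative normal samples of our own choosing) computes it:

```python
import numpy as np

def w1_empirical(u, v):
    """1-Wasserstein distance between equal-size empirical samples on R:
    with both samples sorted, D_W = (1/N) * sum_i |u_(i) - v_(i)|."""
    u, v = np.sort(np.asarray(u, float)), np.sort(np.asarray(v, float))
    assert u.shape == v.shape
    return float(np.mean(np.abs(u - v)))

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(0.5, 1.0, size=5000)

# For p = 1 the FM metric equals D_W; for p > 1 it only dominates D_W,
# which is exactly why the FM ball sits inside the Wasserstein ball.
print(round(w1_empirical(a, b), 2))  # close to the true mean shift 0.5
```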
All these arguments motivate us to consider the data-driven DRO problem with the FM ball-based ambiguity set.
We use v * and v N to denote the optimal values of Problems (10) and (11), respectively.
To quantify the out-of-sample performance of the data-driven DRO problem (11), we examine the following probability
P ( E P [ g ( x N , ξ ) ] ≤ v N ) ,
where x N is any optimal solution of Problem (11). Of course, we hope that, for sufficiently small ϵ > 0 , there exists a finite positive integer N 0 such that
P ( E P [ g ( x N , ξ ) ] ≤ v N ) ≥ 1 − ϵ
for any N N 0 .
If P satisfies ζ p ( P , P N ) < r , we have P ∈ B r ( P N ) . Thus,
E P [ g ( x , ξ ) ] ≤ sup Q ∈ B r ( P N ) E Q [ g ( x , ξ ) ]
for any x ∈ X , which of course implies that
E P [ g ( x N , ξ ) ] ≤ sup Q ∈ B r ( P N ) E Q [ g ( x N , ξ ) ] = v N .
From Theorem 3, we have for any r ∈ ( 0 , 1 / 2 ] that
P ( ζ p ( P , P N ) ≥ r ) ≤ α ^ exp ( − β ^ ( r ) N ) ,
here we use the notation β ^ ( r ) > 0 to stress the dependence of β ^ on r. Consequently,
P ( ζ p ( P , P N ) ≤ r ) ≥ P ( ζ p ( P , P N ) < r ) ≥ 1 − α ^ exp ( − β ^ ( r ) N ) .
A sufficient condition to satisfy (12) is
1 − α ^ exp ( − β ^ ( r ) N ) ≥ 1 − ϵ ,
which is equivalent to
N ≥ ( 1 / β ^ ( r ) ) log ( α ^ / ϵ ) .
Denote
N 0 = ⌈ ( 1 / β ^ ( r ) ) log ( α ^ / ϵ ) ⌉ ,
where ⌈ · ⌉ stands for rounding up to an integer. Sometimes, we use the notation N 0 ( ϵ , r ) to stress the dependence of N 0 on ϵ and r.
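The sample-size bound (13) is a one-line computation once α ^ and β ^ ( r ) are known. In the sketch below, the numerical values of α ^ and β ^ ( r ) are placeholders chosen for illustration, since Theorem 3 only guarantees their existence:

```python
import math

def sample_size_guarantee(alpha_hat, beta_hat, eps):
    """Smallest N with 1 - alpha_hat * exp(-beta_hat * N) >= 1 - eps,
    i.e. N_0 = ceil( log(alpha_hat / eps) / beta_hat ), as in (13)."""
    return math.ceil(math.log(alpha_hat / eps) / beta_hat)

# alpha_hat and beta_hat(r) are distribution-dependent constants; the
# numbers below are purely illustrative placeholders.
print(sample_size_guarantee(alpha_hat=2.0, beta_hat=0.05, eps=0.01))   # 106
print(sample_size_guarantee(alpha_hat=2.0, beta_hat=0.05, eps=0.001))  # 153
```

Shrinking the significance level ϵ only grows N 0 logarithmically, as the formula suggests.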
Summarizing the above discussions, we obtain the following so-called finite sample guarantee property (see also [18,30]).
Proposition 3
(Finite sample guarantee). Let P ∈ P p ( Ξ ) ( p ≥ 1 ) and let Assumption 1 hold for some b > p . For any r ∈ ( 0 , 1 / 2 ] , ϵ ∈ ( 0 , 1 ) and N 0 defined in (13), we have that (12) holds for every N ≥ N 0 .
Proposition 3 tells us, for the fixed confidence parameter r, at least how large the sample size should be to ensure the significance level ϵ . Now we slightly modify model (11) and consider the following data-driven DRO problem:
min x ∈ X sup Q ∈ B r N ( P N ) E Q [ g ( x , ξ ) ] ,
where r N > 0 and r N → 0 as N → ∞ . This reflects the natural fact that the decision maker becomes more confident as more information becomes available, whereas model (11) emphasizes fixed, limited information. We use v ^ N to denote the optimal value of Problem (14). In what follows, we investigate the asymptotic consistency as N tends to infinity. To this end, we need the following lemma.
Lemma 6.
Let { A N } and { B N } be two sequences of random variables defined on the probability space ( Ω , F , P ) . If { B N } converges almost surely and
P ( A N ≤ B N ) ≥ 1 − κ N
for N ∈ N , where κ N ∈ ( 0 , 1 ) with ∑ N = 1 ∞ κ N < + ∞ , we have
P ( lim inf N → ∞ A N ≤ lim N → ∞ B N ) = 1 .
Proof. 
We prove by contradiction. That is, we assume that
P ( lim inf N → ∞ A N ≤ lim N → ∞ B N ) < 1 .
This implies that there exists a subset Ω ^ ⊆ Ω with P ( Ω ^ ) > 0 such that
lim inf N → ∞ A N ( ω ^ ) > lim N → ∞ B N ( ω ^ )
for every ω ^ ∈ Ω ^ . Define the sequence { Ω N } as
Ω N = { ω ∈ Ω : A N ( ω ) > B N ( ω ) }
for N ∈ N . Obviously, according to (15), we have P ( Ω N ) ≤ κ N , which implies that
∑ N = 1 ∞ P ( Ω N ) ≤ ∑ N = 1 ∞ κ N < + ∞ .
Then, we can always choose a sufficiently large N ^ ∈ N , such that
∑ N = N ^ ∞ P ( Ω N ) < P ( Ω ^ ) .
Choose
ω ∈ Ω ^ ∖ ∪ N = N ^ ∞ Ω N ,
and we have
A N ( ω ) ≤ B N ( ω )
for all N N ^ , which implies that
lim inf N → ∞ A N ( ω ) ≤ lim N → ∞ B N ( ω ) .
This contradicts the definition of Ω ^ . We complete the proof. □
The following proposition states that the optimal value and optimal solution set of the data-driven DRO problem (14) converge to those of the original Problem (10), which verifies the reasonability of our data-driven DRO model (14).
Proposition 4
(Asymptotic consistency). Suppose that g ( x , ξ ) is locally Lipschitz continuous in the following sense:
| g ( x , ξ ) − g ( x , ξ ′ ) | ≤ L max { 1 , ‖ ξ ‖ , ‖ ξ ′ ‖ } p − 1 ‖ ξ − ξ ′ ‖
for every x ∈ X , where L > 0 and p ≥ 1 . Let x N be any optimal solution of Problem (14). Then, the following assertions hold:
(i) 
v ^ N → v * with probability 1, as N → ∞ ;
(ii) 
If, moreover, X is closed, g ( · , ξ ) is lower semicontinuous for every ξ ∈ Ξ and g ( x , ξ ) dominates some P-integrable function uniformly with respect to x ∈ X , then any accumulation point of { x N } is an optimizer of Problem (10) almost surely.
Proof. 
Part (i): Notice that
| v ^ N − v * | = | min x ∈ X sup Q ∈ B r N ( P N ) E Q [ g ( x , ξ ) ] − min x ∈ X E P [ g ( x , ξ ) ] | ≤ max { sup Q ∈ B r N ( P N ) E Q [ g ( x * , ξ ) ] − E P [ g ( x * , ξ ) ] , E P [ g ( x N , ξ ) ] − sup Q ∈ B r N ( P N ) E Q [ g ( x N , ξ ) ] } ,
where x * and x N are any optimal solutions of Problems (10) and (14), respectively. For the first term on the right-hand side, we have
sup Q ∈ B r N ( P N ) E Q [ g ( x * , ξ ) ] − E P [ g ( x * , ξ ) ] ≤ E Q N [ g ( x * , ξ ) ] − E P [ g ( x * , ξ ) ] + δ N ≤ L ζ p ( Q N , P ) + δ N ≤ L r N + δ N → 0 ,
almost surely, as N → ∞ , where the first inequality is due to the definition of the supremum for some δ N > 0 with δ N → 0 almost surely and Q N ∈ B r N ( P N ) ; the second inequality follows from the definition of the FM metric. Similarly, we can derive
E P [ g ( x N , ξ ) ] − sup Q ∈ B r N ( P N ) E Q [ g ( x N , ξ ) ] → 0
almost surely, as N → ∞ . Thus, we obtain that
| v ^ N − v * | → 0 almost surely , as N → ∞ .
Part (ii): Without loss of generality, we assume in the following discussion that x N → x ^ with probability 1 as N → ∞ . Moreover, we select a sequence { ϵ k } with ϵ k ∈ ( 0 , 1 ) and ∑ k = 1 ∞ ϵ k < ∞ . According to (12), for each pair ( ϵ k , r k ) with r k defined in (14), we can select an N k ≥ N 0 ( ϵ k , r k ) ( N 0 ( ϵ k , r k ) is defined in (13)) such that
P ( E P [ g ( x N k , ξ ) ] ≤ v ^ N k ) ≥ 1 − ϵ k .
We know from Lemma 6 and assertion (i) that
P ( lim inf k → ∞ E P [ g ( x N k , ξ ) ] ≤ lim k → ∞ v ^ N k = v * ) = 1 .
Then, the following inequalities hold almost surely:
v * ≤ ( a ) E P [ g ( x ^ , ξ ) ] ≤ ( b ) E P [ lim inf k → ∞ g ( x N k , ξ ) ] ≤ ( c ) lim inf k → ∞ E P [ g ( x N k , ξ ) ] ≤ ( d ) v * ,
where (a) follows from x ^ ∈ X due to the closedness of X; (b) follows from the lower semicontinuity of g ( · , ξ ) for every ξ ∈ Ξ ; (c) is due to Fatou's lemma; and (d) follows from (16). □
Remark 4.
Propositions 3 and 4 establish the finite sample guarantee and the asymptotic consistency, which are two desirable properties of the data-driven DRO problem [18,30]. Different from the existing results in [18], where the Wasserstein ball is used to construct the ambiguity set, we adopt the FM ball. Due to the features of the Wasserstein metric, to ensure the existence of the significance parameter ϵ, they explicitly derived the radius r ( ϵ , N ) depending on ϵ and N and the finite sample size N 0 ( ϵ ) depending only on ϵ. In Proposition 3, we view both r and ϵ as parameters because β ^ couples with ϵ implicitly in Theorem 3. Moreover, the assumptions for the asymptotic consistency (Proposition 4) differ from those in [18] (Theorem 3.6), where upper semicontinuity and linear growth were employed. Here we use local Lipschitz continuity together with a weaker lower-bound assumption. Specifically, Ref. [18] (Theorem 3.6) employs the Borel–Cantelli lemma to obtain
P ( E P [ g ( x N , ξ ) ] ≤ v ^ N for sufficiently large N ) = 1 .
This is not applicable in our case, so we need Lemma 6.

4.4. Discrete Approximation for DRO Problems with General Moment Information

We consider the following general DRO problem:
min x ∈ X sup P ∈ P E P [ h ( x , ξ ) ] ,
where X ⊆ R n is a compact set, h : X × Ξ → R , P : = { P ∈ P ( Ξ ) : E P [ Γ ( ξ ) ] ∈ K } , K is a closed and convex set in the Cartesian product of some finite dimensional vector and/or matrix spaces, and Γ is a general mapping on Ξ . We implicitly assume that, for each x ∈ X , E P [ h ( x , ξ ) ] < + ∞ for all P ∈ P .
The above ambiguity set P is very general, and it covers almost all the available ambiguity sets with moment information (see, e.g., [19], Examples 3–5). Zhang et al. discussed in [33] the quantitative stability of the DRO problem with a general moment information ambiguity set. There are usually two ways to numerically solve Problem (17): one is to use some kind of duality argument to reformulate Problem (17) as a solvable problem [18,31]; the other is to discretize the ambiguity set, which leads to a saddle point problem in a finite dimensional space [19]. For instance, the discrete approximation in [19] is conducted under a bounded support set. In this part, by employing our results in Section 3, we consider the discrete approximation of Problem (17) under weaker conditions.
Denote by P ^ N the collection of all discrete distributions which have at most N supporting elements, that is,
P ^ N = { ∑ i = 1 N p i 1 ξ i : ∑ i = 1 N p i = 1 , p j ≥ 0 , ξ j ∈ Ξ , j = 1 , 2 , … , N } .
We define the discrete approximation of P as
P N = { Q ∈ P ^ N : E Q [ Γ ( ξ ) ] ∈ K } .
Obviously, P N ⊆ P . Then, the discrete approximation of Problem (17) can be written as
min x ∈ X sup P ∈ P N E P [ h ( x , ξ ) ] .
We use v ( P ) and S ( P ) to denote the optimal value and optimal solution set of Problem (17), and v ( P N ) and S ( P N ) those of Problem (18). To make sense of the discrete approximation, we hope that Problem (18) approximately solves Problem (17) when N is sufficiently large.
To continue the following discussion, we define the growth function of Problem (17), ψ P : R + → R + , as
ψ P ( τ ) = inf { sup P ∈ P E P [ h ( x , ξ ) ] − v ( P ) : d ( x , S ( P ) ) ≥ τ , x ∈ X }
and its inverse function is
ψ P − 1 ( t ) : = sup { τ ∈ R + : ψ P ( τ ) ≤ t } .
Thus, the associated conditioning function Ψ P : R + → R + is defined as
Ψ P ( η ) : = η + ψ P − 1 ( 2 η ) .
Immediately, we have the following quantitative stability results:
Proposition 5.
Suppose that: (i) P ⊆ P p ( Ξ ) ; (ii) h ( x , ξ ) ≥ g ( ξ ) for each ξ ∈ Ξ and a measurable function g : Ξ → R with E P [ g ( ξ ) ] > − ∞ for any P ∈ P ; (iii) h ( · , ξ ) is lower semicontinuous for each ξ ∈ Ξ ; (iv)
| h ( x , ξ ) − h ( x , ξ ′ ) | ≤ L h max { 1 , ‖ ξ ‖ , ‖ ξ ′ ‖ } p − 1 ‖ ξ − ξ ′ ‖
for each x ∈ X . Then, S ( P ) ≠ ∅ and
| v ( P ) − v ( P N ) | ≤ L h ζ p ( P , P N ) , S ( P N ) ⊆ S ( P ) + Ψ P ( L h ζ p ( P , P N ) ) .
Proof. 
Since h ( x , ξ ) ≥ g ( ξ ) and h ( · , ξ ) is lower semicontinuous for each ξ ∈ Ξ , we have from Fatou's lemma that
lim inf k → ∞ E P [ h ( x k , ξ ) ] ≥ E P [ lim inf k → ∞ h ( x k , ξ ) ] ≥ E P [ h ( x ¯ , ξ ) ]
holds for any { x k } ⊆ X such that lim k → ∞ x k = x ¯ . This implies that E P [ h ( · , ξ ) ] is lower semicontinuous. According to [34] (Lemma 4.1), sup P ∈ P E P [ h ( · , ξ ) ] is lower semicontinuous too. This, together with the compactness of X, ensures that S ( P ) ≠ ∅ . Similarly, we can prove that S ( P N ) ≠ ∅ .
Note that
| v ( P ) − v ( P N ) | = ( a ) min x ∈ X sup P ∈ P E P [ h ( x , ξ ) ] − min x ∈ X sup Q ∈ P N E Q [ h ( x , ξ ) ] ≤ max x ∈ X ( sup P ∈ P E P [ h ( x , ξ ) ] − sup Q ∈ P N E Q [ h ( x , ξ ) ] ) = max x ∈ X sup P ∈ P inf Q ∈ P N ( E P [ h ( x , ξ ) ] − E Q [ h ( x , ξ ) ] ) ≤ ( b ) max x ∈ X sup P ∈ P inf Q ∈ P N L h ζ p ( P , Q ) = L h ζ p ( P , P N ) ,
where (a) follows from the fact that P N ⊆ P ; (b) is due to the definition of the pth order FM metric.
Finally, based on the first assertion, the inclusion for the optimal solution sets can be derived analogously to that in [11]. □
For simplicity, as well as to show the linear relationship more clearly, we write E P [ Γ ( ξ ) ] as ⟨ P , Γ ( ξ ) ⟩ in what follows. We need the following technical assumption to proceed.
Assumption 3
(see [19]). The system P : = { P : E P [ Γ ( ξ ) ] ∈ K } satisfies the following Slater condition:
⟨ P ˜ , Γ ( ξ ) ⟩ + δ B ⊆ K
for some P ˜ ∈ P ( Ξ ) and δ > 0 .
Proposition 6.
Suppose that Assumption 3 holds and P ⊆ P p ( Ξ ) . Then, there exists an Ω 0 ⊆ Ω with P ( Ω 0 ) = 0 , such that for any δ ^ < δ and ω ∈ Ω ∖ Ω 0 , we have
ζ p ( Q , P 2 N ) ≤ ( ζ p ( Q , P ˜ ) + 1 ) ( 1 / δ ^ ) inf K ∈ K ‖ K − ⟨ Q , Γ ( ξ ) ⟩ ‖
for any Q ∈ P ^ N and N ≥ N ^ ( δ ^ , ω ) , where N ^ ( δ ^ , ω ) is a positive integer depending on δ ^ and ω.
Proof. 
Let the empirical approximation of P ˜ be P ˜ N . Then, we have from the law of large numbers that
⟨ P ˜ N , Γ ( ξ ) ⟩ → ⟨ P ˜ , Γ ( ξ ) ⟩
with probability 1, as N → ∞ . Equivalently, there exists an Ω 1 ⊆ Ω with P ( Ω 1 ) = 0 , such that for any δ ^ < δ and ω ∈ Ω ∖ Ω 1 , we have
‖ ⟨ P ˜ N ( ω ) , Γ ( ξ ) ⟩ − ⟨ P ˜ , Γ ( ξ ) ⟩ ‖ ≤ δ − δ ^
for N ≥ N 1 ( δ ^ , ω ) . This implies that
⟨ P ˜ N ( ω ) , Γ ( ξ ) ⟩ ∈ ⟨ P ˜ , Γ ( ξ ) ⟩ + ( δ − δ ^ ) B ,
or equivalently,
⟨ P ˜ N ( ω ) , Γ ( ξ ) ⟩ + δ ^ B ⊆ ⟨ P ˜ , Γ ( ξ ) ⟩ + δ B ⊆ K
for N ≥ N 1 ( δ ^ , ω ) , where B is the closed unit ball in the space containing K .
Notice that P ˜ N ∈ P N , and hence, for N ≥ N 1 ( δ ^ , ω ) , the Slater condition holds with respect to δ ^ for the system
P N : = { P ∈ P ^ N : E P [ Γ ( ξ ) ] ∈ K } .
Now we define, for any Q ∈ P ^ N , ρ Q = inf K ∈ K ‖ K − ⟨ Q , Γ ( ξ ) ⟩ ‖ and
Q ¯ = ( 1 − ρ Q / ( ρ Q + δ ^ ) ) Q + ( ρ Q / ( ρ Q + δ ^ ) ) P ˜ N .
Obviously, we have Q ¯ ∈ P ^ 2 N . Similar to the proof of [19] (Theorem 2), we again obtain Q ¯ ∈ P 2 N . Then, we have
ζ p ( Q , P 2 N ) ≤ ζ p ( Q , Q ¯ ) = sup g ∈ G FM p | ⟨ Q , g ⟩ − ⟨ Q ¯ , g ⟩ | = ( ρ Q / ( ρ Q + δ ^ ) ) sup g ∈ G FM p | ⟨ Q , g ⟩ − ⟨ P ˜ N ( ω ) , g ⟩ | ≤ ζ p ( Q , P ˜ N ( ω ) ) ( 1 / δ ^ ) inf K ∈ K ‖ K − ⟨ Q , Γ ( ξ ) ⟩ ‖ ≤ ( ζ p ( Q , P ˜ ) + ζ p ( P ˜ , P ˜ N ( ω ) ) ) ( 1 / δ ^ ) inf K ∈ K ‖ K − ⟨ Q , Γ ( ξ ) ⟩ ‖ ≤ ( ζ p ( Q , P ˜ ) + 1 ) ( 1 / δ ^ ) inf K ∈ K ‖ K − ⟨ Q , Γ ( ξ ) ⟩ ‖
for N ≥ N 2 ( ω ) and ω ∈ Ω ∖ Ω 2 with P ( Ω 2 ) = 0 , where the last inequality follows from Theorem 2.
Finally, letting Ω 0 : = Ω 1 ∪ Ω 2 and N ^ ( δ ^ , ω ) : = max { N 1 ( δ ^ , ω ) , N 2 ( ω ) } completes the proof. □
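The convex mixture Q ¯ used in the proof is easy to form explicitly for discrete distributions: concatenate the atoms and rescale the weights. A minimal sketch (the atoms, weights, and the values of ρ Q and δ ^ are illustrative assumptions):

```python
import numpy as np

def mix_distributions(q_atoms, q_w, p_atoms, p_w, rho, delta_hat):
    """Form Q_bar = (1 - rho/(rho + delta_hat)) Q + (rho/(rho + delta_hat)) P,
    the convex mixture from the proof of Proposition 6, for two discrete
    distributions given as (atoms, weights) pairs."""
    lam = rho / (rho + delta_hat)
    atoms = np.concatenate([np.asarray(q_atoms, float), np.asarray(p_atoms, float)])
    weights = np.concatenate([(1.0 - lam) * np.asarray(q_w, float),
                              lam * np.asarray(p_w, float)])
    return atoms, weights

# Illustrative atoms/weights and parameter values (our own assumptions).
q_atoms, q_w = [0.0, 1.0], [0.5, 0.5]
p_atoms, p_w = [2.0, 3.0], [0.3, 0.7]
atoms, w = mix_distributions(q_atoms, q_w, p_atoms, p_w, rho=0.2, delta_hat=0.8)
print(atoms.size, round(float(w.sum()), 6))  # at most 2N atoms; weights sum to 1
```

The mixture keeps at most 2 N supporting atoms, which is exactly why the proof lands in P ^ 2 N .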
The following theorem states that the discrete approximation ambiguity set P N converges to P as N → ∞ in the sense of FM metrics.
Theorem 7.
Suppose that: (i) Assumption 3 holds; (ii) P ⊆ P p ( Ξ ) ; (iii)
sup P ∈ P ‖ E P [ Γ ( ξ ) ] ‖ < + ∞ and C P : = sup P , Q ∈ P ζ p ( P , Q ) < + ∞ .
Then,
lim N → ∞ ζ p ( P , P N ) = 0
with probability 1.
Proof. 
For any P ∈ P , by the triangle inequality, we have
ζ p ( P , P N ) ≤ ζ p ( P , P N ) + ζ p ( P N , P N ) ,
where P N is the empirical distribution of P with N samples. Since P N ∈ P ^ N , we know from Proposition 6 that
ζ p ( P N , P N ) ≤ ( ζ p ( P N , P ˜ ) + 1 ) ( 1 / δ ^ ) inf K ∈ K ‖ K − ⟨ P N , Γ ( ξ ) ⟩ ‖ ≤ ( C P + 1 ) ( 1 / δ ^ ) ‖ ⟨ P , Γ ( ξ ) ⟩ − ⟨ P N , Γ ( ξ ) ⟩ ‖
for N ≥ N ^ ( δ ^ , ω ) and almost every ω ∈ Ω . Thus, we have
ζ p ( P , P N ) ≤ ζ p ( P , P N ) + ( C P + 1 ) ( 1 / δ ^ ) ‖ ⟨ P , Γ ( ξ ) ⟩ − ⟨ P N , Γ ( ξ ) ⟩ ‖ .
Subsequently,
ζ p ( P , P N ) = sup P ∈ P ζ p ( P , P N ) ≤ sup P ∈ P ζ p ( P , P N ) + ( C P + 1 ) ( 1 / δ ^ ) sup P ∈ P ‖ ⟨ P , Γ ( ξ ) ⟩ − ⟨ P N , Γ ( ξ ) ⟩ ‖ .
For the first term on the right-hand side, the definition of supremum, the boundedness of P , and Theorem 2 give rise to
lim N → ∞ sup P ∈ P ζ p ( P , P N ) ≤ lim N → ∞ ( ζ p ( P k , P N k ) + ϵ k ) = ϵ k
with probability 1, where { P k } is a sequence included in P such that
sup P ∈ P ζ p ( P , P N ) ≤ ζ p ( P k , P N k ) + ϵ k
and { ϵ k } is a positive sequence with ϵ k → 0 as k → ∞ . Thus, we obtain
lim N → ∞ sup P ∈ P ζ p ( P , P N ) = 0
with probability 1.
Analogously, by the law of large numbers, we can derive that
lim N → ∞ sup P ∈ P ‖ ⟨ P , Γ ( ξ ) ⟩ − ⟨ P N , Γ ( ξ ) ⟩ ‖ = 0
with probability 1.
Then, we complete the proof. □
The following corollary justifies the approximation of Problem (17) by Problem (18).
Corollary 1.
Under the conditions of Proposition 5 and Theorem 7, we have
| v ( P ) − v ( P N ) | → 0 , d ( S ( P N ) , S ( P ) ) → 0
with probability 1, as N → ∞ .
Remark 5.
In this subsection, we investigated the discrete approximation of the DRO problem with the general moment information ambiguity set. Compared with the existing work [19], we have further weakened the necessary assumptions and extended the results to a more general case. Firstly, the Lipschitz continuity of the objective function is required in [19] (Theorem 14) due to the adoption of the Wasserstein metric, so that the upper bound between the discrete approximation of the DRO problem and the original DRO problem can be derived [19] (Proposition 7); we only require local Lipschitz continuity. More importantly, they restricted their discussion to the bounded support set case because the upper bound in [19] (Proposition 7) would be infinite, and hence not well defined, when the support set is unbounded. In contrast, our support set can be unbounded thanks to the convergence results in Section 3.

5. Concluding Remarks

In this study, we investigated different kinds of convergence assertions about data-driven FM metrics and their possible applications. In view of the rich results about Wasserstein metrics (Lemmas 2 and 3), we first established the relationship between the FM metric and the Wasserstein metric (Lemma 4). Based on these results, the non-asymptotic moment estimate (Theorem 1), asymptotic convergence estimate (Theorem 2), and non-asymptotic concentration estimate (Theorem 3) for FM metrics were presented. These convergence assertions were then applied to the asymptotic analysis of the empirical approximations of four kinds of stochastic optimization problems. These results demonstrate both the motivation and the significance of this study.
There are still some topics to settle in the future. For example, we leave the numerical tractability for the results in Section 4.3 and Section 4.4 for future work.

Author Contributions

Supervision, Z.C.; Writing—original draft, J.J.; Writing—review & editing, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by China Postdoctoral Science Foundation (Grant Number 2020M673117), the National Natural Science Foundation of China (Grant Numbers 11991023 and 11735011).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 4. 
According to the definition of FM metric, we have
ζ p ( P , Q ) = sup g ∈ G p ∫ Ξ g ( ξ ) ( P − Q ) ( d ξ ) .
Moreover, it is easy to verify that adding any constant to g ( ξ ) does not change the value of the integral. To simplify the following discussion, without loss of generality, we hereafter set g ( ξ 0 ) = 0 for a fixed ξ 0 ∈ arg min ξ ∈ Ξ ‖ ξ ‖ . We denote M R = { ξ ∈ Ξ : ‖ ξ ‖ ≤ R } and write M R c for the complement of M R . Since B ( 0 , R ) ∩ Ξ ≠ ∅ and thus ξ 0 ∈ M R , we have the following upper bound estimate for g ( ξ ) :
| g ( ξ ) | = | g ( ξ ) − g ( ξ 0 ) | ≤ max { 1 , ‖ ξ ‖ , ‖ ξ 0 ‖ } p − 1 ‖ ξ − ξ 0 ‖ ≤ max { 1 , ‖ ξ ‖ , ‖ ξ 0 ‖ } p − 1 ( ‖ ξ ‖ + ‖ ξ 0 ‖ ) .
If ξ ∈ M R c , then ‖ ξ ‖ > R ≥ ‖ ξ 0 ‖ , and we have the following upper bound of | g ( ξ ) | :
| g ( ξ ) | ≤ ‖ ξ ‖ p − 1 ( ‖ ξ ‖ + ‖ ξ 0 ‖ ) ≤ 2 ‖ ξ ‖ p .
Then, we continue
ζ p ( P , Q ) = sup g ∈ G p ( ∫ M R g ( ξ ) ( P − Q ) ( d ξ ) + ∫ M R c g ( ξ ) ( P − Q ) ( d ξ ) ) ≤ sup g ∈ G p R p − 1 ∫ M R h ( ξ ) ( P − Q ) ( d ξ ) + 2 ∫ M R c ‖ ξ ‖ p ( P + Q ) ( d ξ ) ,
where h : M R → R is defined by h ( · ) : = g ( · ) / R p − 1 . It is easy to see that h is Lipschitz continuous on M R with Lipschitz modulus 1. Based on Lemma 1, we can extend h to R s , and we denote its restriction to Ξ by h ˜ ( · ) . Then, h ˜ ( · ) is Lipschitz continuous on Ξ with Lipschitz modulus 1. Thus, we can continue
ζ p ( P , Q ) ≤ sup g ∈ G p ( R p − 1 ∫ Ξ h ˜ ( ξ ) ( P − Q ) ( d ξ ) − R p − 1 ∫ M R c h ˜ ( ξ ) ( P − Q ) ( d ξ ) + 2 ∫ M R c ‖ ξ ‖ p ( P + Q ) ( d ξ ) ) .
Moreover, since h ˜ has Lipschitz modulus 1, for any ξ ∈ Ξ we have
| h ˜ ( ξ ) − h ˜ ( ξ 0 ) | ≤ ‖ ξ − ξ 0 ‖ .
Note that h ˜ ( ξ 0 ) = g ( ξ 0 ) / R p − 1 = 0 , so
| h ˜ ( ξ ) | = | h ˜ ( ξ ) − h ˜ ( ξ 0 ) | ≤ ‖ ξ − ξ 0 ‖ ≤ ‖ ξ ‖ + ‖ ξ 0 ‖ .
In particular, this means that | h ˜ ( ξ ) | ≤ 2 ‖ ξ ‖ for any ξ ∈ M R c . Then, we continue
ζ p ( P , Q ) ≤ R p − 1 D W ( P , Q ) + 2 R p − 1 ∫ M R c ‖ ξ ‖ ( P + Q ) ( d ξ ) + 2 ∫ M R c ‖ ξ ‖ p ( P + Q ) ( d ξ ) ≤ R p − 1 D W ( P , Q ) + 4 ∫ M R c ‖ ξ ‖ p ( P + Q ) ( d ξ ) .
The proof is complete. □
Proof of Theorem 1. 
According to Lemma 4 with Q = P N , we have
E [ ζ p ( P , P N ) ] ≤ E [ R p − 1 D W ( P , P N ) + 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) ] = R p − 1 E [ D W ( P , P N ) ] + 4 ∫ M R c ‖ ξ ‖ p P ( d ξ ) + 4 E [ ∫ M R c ‖ ξ ‖ p P N ( d ξ ) ] .
Moreover, since
E [ ∫ M R c ‖ ξ ‖ p P N ( d ξ ) ] = E [ ( 1 / N ) ∑ i = 1 N 1 M R c ( ξ i ) · ‖ ξ i ‖ p ] = E [ 1 M R c ( ξ ) · ‖ ξ ‖ p ] ,
we obtain
E [ ζ p ( P , P N ) ] ≤ R p − 1 E [ D W ( P , P N ) ] + 8 E [ 1 M R c ( ξ ) · ‖ ξ ‖ p ] .
Meanwhile, we know from Lemma 2 that
E [ D W ( P N , P ) ] ≤ C E P [ ‖ ξ ‖ p ] 1 / p α ( N ) ,
where α ( N ) = N − 1 / max { 2 , s } log ( 1 + N ) + N − ( p − 1 ) / p → 0 as N → ∞ . Then, we take R = α ( N ) − 1 / p . Since R → + ∞ as N → ∞ , we have R ≥ 1 and B ( 0 , R ) ∩ Ξ ≠ ∅ for sufficiently large N. Therefore, we have
E [ ζ p ( P , P N ) ] ≤ R p − 1 E [ D W ( P , P N ) ] + 8 E [ 1 M R c ( ξ ) · ‖ ξ ‖ p ] ≤ C E P [ ‖ ξ ‖ p ] 1 / p α ( N ) 1 / p + 8 E [ 1 M R c ( ξ ) · ‖ ξ ‖ p ]
and
E [ 1 M R c ( ξ ) · ‖ ξ ‖ p ] → 0 , as N → ∞ .
Thus, letting
β N = C E P [ ‖ ξ ‖ p ] 1 / p α ( N ) 1 / p + 8 E [ 1 M R c ( ξ ) · ‖ ξ ‖ p ]
completes the proof. □
Proof of Theorem 2. 
To prove this assertion, we need to verify that for any ϵ > 0 , there exists a positive number N ( ϵ , ω ) such that
ζ p ( P , P N ) ≤ ϵ
as N ≥ N ( ϵ , ω ) for almost every ω ∈ Ω . Notice from Lemma 4 that
ζ p ( P , P N ) ≤ R p − 1 D W ( P , P N ) + 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ )
for sufficiently large R.
We can deduce from P ∈ P p ( Ξ ) and Lemma 2 that
∫ M R c ‖ ξ ‖ p P ( d ξ ) → 0 as R → + ∞
and
∫ M R c ‖ ξ ‖ p P N ( d ξ ) → ∫ M R c ‖ ξ ‖ p P ( d ξ ) as N → ∞
with probability 1. Thus, there always exists a sufficiently large positive number R ( ϵ ) such that
∫ M R c ‖ ξ ‖ p P ( d ξ ) ≤ ϵ / 32
as R ≥ R ( ϵ ) . Moreover, there exists a positive number N 1 : = N 1 ( ϵ , ω ) such that
| ∫ M R c ‖ ξ ‖ p P N ( d ξ ) − ∫ M R c ‖ ξ ‖ p P ( d ξ ) | ≤ ϵ / 16
as N ≥ N 1 with probability 1, which together with the triangle inequality implies that
∫ M R c ‖ ξ ‖ p P N ( d ξ ) ≤ 3 ϵ / 32
with probability 1. Combining (A2) with (A3), we have
4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) ≤ 4 ( ϵ / 32 + 3 ϵ / 32 ) = ϵ / 2
as R ≥ R ( ϵ ) and N ≥ N 1 , with probability 1.
On the other hand, we know from the Glivenko-Cantelli theorem [35] that
D W ( P , P N ) → 0 as N → ∞ with probability 1 ,
which implies that there exists a positive number N 2 : = N 2 ( ϵ , ω ) such that
R p − 1 D W ( P , P N ) ≤ ϵ / 2
when N ≥ N 2 .
Then, (A5) and (A4) imply (A1) with N ( ϵ , ω ) = max { N 1 , N 2 } . This completes the proof. □
Proof of Theorem 3. 
We know from Lemma 4 that
ζ p ( P , P N ) ≤ R p − 1 D W ( P , P N ) + 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) .
Then, we have
P ( ζ p ( P , P N ) ≥ ϵ ) ≤ P ( R p − 1 D W ( P , P N ) + 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) ≥ ϵ ) ≤ P ( R p − 1 D W ( P , P N ) ≥ ϵ / 2 ) + P ( 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) ≥ ϵ / 2 ) .
For the first term, we know from (3) that
P ( R p − 1 D W ( P , P N ) ≥ ϵ / 2 ) = P ( D W ( P , P N ) ≥ ϵ / ( 2 R p − 1 ) ) ≤ α exp ( − β N ( ϵ / ( 2 R p − 1 ) ) max { 4 , s } ) .
We, in what follows, consider the estimation of the second term:
P ( 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) ≥ ϵ / 2 ) .
Since P ∈ P p ( Ξ ) , we can choose a sufficiently large R = R ( ϵ ) such that
∫ M R c ‖ ξ ‖ p P ( d ξ ) ≤ ϵ / 32 .
Then, we have
P ( 4 ∫ M R c ‖ ξ ‖ p ( P + P N ) ( d ξ ) ≥ ϵ / 2 ) ≤ P ( ∫ M R c ‖ ξ ‖ p P N ( d ξ ) ≥ 3 ϵ / 32 ) = P ( ∫ Ξ 1 M R c ( ξ ) ‖ ξ ‖ p P N ( d ξ ) ≥ 3 ϵ / 32 ) = P ( ( 1 / N ) ∑ i = 1 N 1 M R c ( ξ i ) ‖ ξ i ‖ p ≥ 3 ϵ / 32 ) = P ( ( 1 / N ) ∑ i = 1 N 1 M R c ( ξ i ) ‖ ξ i ‖ p − E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ≥ 3 ϵ / 32 − E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ) ≤ P ( ( 1 / N ) ∑ i = 1 N 1 M R c ( ξ i ) ‖ ξ i ‖ p − E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ≥ ϵ / 16 ) .
Furthermore, according to Cramér’s large deviation theorem, we have
P ( ( 1 / N ) ∑ i = 1 N 1 M R c ( ξ i ) ‖ ξ i ‖ p − E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ≥ ϵ / 16 ) ≤ exp ( − N I ( ϵ / 16 ) ) ,
where I ( · ) is the so-called (large deviations) rate function defined as
I ( ϵ / 16 ) : = sup t ∈ R { ( ϵ / 16 ) t − log M ( t ) }
and
M ( t ) : = E [ exp ( t ( 1 M R c ( ξ ) ‖ ξ ‖ p − E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ) ) ] = E [ exp ( t 1 M R c ( ξ ) ‖ ξ ‖ p ) ] exp ( − t E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ) ≤ E [ exp ( t ‖ ξ ‖ p ) ] exp ( − t E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] ) ≤ E [ exp ( ‖ ξ ‖ p ) ] exp ( E [ ‖ ξ ‖ p ] ) < + ∞
for t ∈ [ − 1 , 1 ] , where the last inequality follows from Assumption 1 with b > p .
We know from [28] (Section 7.2.9) that M ( t ) is positive, convex, and infinitely differentiable in the interior of its domain. This means that log M ( t ) is also convex and infinitely differentiable in the interior of its domain, which coincides with that of M ( t ) . Since M ( t ) is finite on [ − 1 , 1 ] , M ( t ) is differentiable on ( − 1 , 1 ) . Note that
M ′ ( 0 ) = E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] − E [ 1 M R c ( ξ ) ‖ ξ ‖ p ] = 0 .
Then, the derivative of
( ϵ / 16 ) t − log M ( t ) ,
which is
ϵ / 16 − M ′ ( t ) / M ( t ) ,
is positive at t = 0 . By the continuity of this derivative, there exists a sufficiently small 0 < t ¯ ≤ 1 such that (A7) is positive for any t ∈ [ 0 , t ¯ ] . Then, for any t ∈ ( 0 , t ¯ ] , we have
( ϵ / 16 ) t − log M ( t ) > ( ϵ / 16 ) · 0 − log M ( 0 ) = 0 .
Therefore, we obtain that I ( ϵ / 16 ) is positive.
Finally, we obtain
P ( ζ p ( P , P N ) ≥ ϵ ) ≤ α exp ( − β N ( ϵ / ( 2 R p − 1 ) ) max { 4 , s } ) + exp ( − N I ( ϵ / 16 ) ) ≤ ( 1 + α ) exp ( − min { β ( ϵ / ( 2 R p − 1 ) ) max { 4 , s } , I ( ϵ / 16 ) } N ) .
Letting α ^ : = 1 + α and
β ^ : = min { β ( ϵ / ( 2 R p − 1 ) ) max { 4 , s } , I ( ϵ / 16 ) }
completes the proof. □
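The rate function I ( ϵ / 16 ) can be estimated numerically by replacing M ( t ) with a Monte Carlo average and maximizing over a grid of t values. The sketch below does this for an illustrative toy setup of our own choosing ( ξ ∼ N ( 0 , 1 ) , R = 2 , p = 1 ); the positivity of the estimate mirrors the argument in the proof:

```python
import numpy as np

rng = np.random.default_rng(7)

# Monte Carlo sketch of the rate function from the proof, for a toy setup
# of our own choosing: xi ~ N(0, 1), R = 2, p = 1, and
# Z = 1_{M_R^c}(xi) * |xi|, with M(t) = E exp(t * (Z - E[Z])).
n = 200_000
xi = rng.normal(size=n)
z = np.abs(xi) * (np.abs(xi) > 2.0)   # the truncated tail variable
zc = z - z.mean()                     # centered, as in the definition of M(t)

def rate_function(a, ts=np.linspace(0.0, 1.0, 201)):
    """Grid-search estimate of I(a) = sup_t { a * t - log M(t) }."""
    log_m = np.log(np.array([np.mean(np.exp(t * zc)) for t in ts]))
    return float(np.max(a * ts - log_m))

eps = 0.5
i_hat = rate_function(eps / 16.0)
# The proof shows I(eps/16) > 0, which yields the exponential decay rate.
print(i_hat > 0.0)
```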

References

1. Fournier, N.; Guillin, A. On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 2015, 162, 707–738.
2. Villani, C. Optimal Transport: Old and New; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; Volume 338.
3. Rachev, S.T.; Rüschendorf, L. Mass Transportation Problems: Volume I: Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998; Volume 1.
4. Horowitz, J.; Karandikar, R.L. Mean rates of convergence of empirical measures in the Wasserstein metric. J. Comput. Appl. Math. 1994, 55, 261–273.
5. Weed, J.; Bach, F. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. arXiv 2017, arXiv:1707.00087.
6. Dereich, S.; Scheutzow, M.; Schottstedt, R. Constructive quantization: Approximation by empirical measures. Ann. Inst. Henri Poincaré Probab. Stat. 2013, 49, 1183–1203.
7. Bolley, F.; Guillin, A.; Villani, C. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probab. Theory Relat. Fields 2007, 137, 541–593.
8. Boissard, E. Simple bounds for the convergence of empirical and occupation measures in 1-Wasserstein distance. Electron. J. Probab. 2011, 16, 2296–2333.
9. Zhao, C.; Guan, Y. Data-driven risk-averse two-stage stochastic program with ζ-structure probability metrics. Optim. Online 2015, 2, 1–40.
10. Römisch, W. Stability of Stochastic Programming Problems. Handb. Oper. Res. Manag. Sci. 2003, 10, 483–554.
11. Rachev, S.T.; Römisch, W. Quantitative stability in stochastic programming: The method of probability metrics. Math. Oper. Res. 2002, 27, 792–818.
12. Römisch, W.; Vigerske, S. Quantitative stability of fully random mixed-integer two-stage stochastic programs. Optim. Lett. 2008, 2, 377–388.
13. Han, Y.; Chen, Z. Quantitative stability of full random two-stage stochastic programs with recourse. Optim. Lett. 2015, 9, 1075–1090.
14. Strugarek, C. On the Fortet-Mourier Metric for The Stability of Stochastic Optimization Problems, An Example; Humboldt-Universität zu Berlin: Berlin, Germany, 2004.
15. Shapiro, A. Monte Carlo sampling methods. Handb. Oper. Res. Manag. Sci. 2003, 10, 353–425.
16. Shapiro, A.; Xu, H. Stochastic mathematical programs with equilibrium constraints, modelling and sample average approximation. Optimization 2008, 57, 395–418.
17. Dentcheva, D.; Römisch, W. Stability and sensitivity of stochastic dominance constrained optimization models. SIAM J. Optim. 2013, 23, 1672–1688.
18. Esfahani, P.M.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
19. Liu, Y.; Pichler, A.; Xu, H. Discrete approximation and quantification in distributionally robust optimization. Math. Oper. Res. 2018, 44, 19–37.
20. Kantorovich, L.V.; Rubinstein, G.S. On a space of completely additive functions. Vestn. Leningrad. Univ. 1958, 13, 52–59.
21. Valentine, F.A. A Lipschitz condition preserving extension for a vector function. Am. J. Math. 1945, 67, 83–93.
22. Dentcheva, D.; Henrion, R.; Ruszczyński, A. Stability and sensitivity of optimization problems with first order stochastic dominance constraints. SIAM J. Optim. 2007, 18, 322–337.
23. Dentcheva, D.; Ruszczyński, A. Robust stochastic dominance and its application to risk-averse optimization. Math. Program. 2010, 123, 85–100.
24. Chen, Z.; Jiang, J. Stability analysis of optimization problems with kth order stochastic and distributionally robust dominance constraints induced by full random recourse. SIAM J. Optim. 2018, 28, 1396–1419.
25. Sun, H.; Xu, H. Convergence analysis of stationary points in sample average approximation of stochastic programs with second order stochastic dominance constraints. Math. Program. 2014, 143, 31–59.
26. Liu, Y.; Xu, H. Stability analysis of stochastic programs with second order dominance constraints. Math. Program. 2013, 142, 435–460.
27. Dentcheva, D.; Ruszczyński, A. Optimization with stochastic dominance constraints. SIAM J. Optim. 2003, 14, 548–566.
28. Shapiro, A.; Dentcheva, D.; Ruszczyński, A. Lectures on Stochastic Programming: Modeling and Theory; SIAM: Philadelphia, PA, USA, 2014.
29. Bertsimas, D.; Gupta, V.; Kallus, N. Data-driven robust optimization. Math. Program. 2018, 167, 235–292.
30. Bertsimas, D.; Gupta, V.; Kallus, N. Robust sample average approximation. Math. Program. 2018, 171, 217–282.
31. Delage, E.; Ye, Y. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 2010, 58, 595–612.
32. Pichler, A.; Xu, H. Quantitative stability analysis for minimax distributionally robust risk optimization. Math. Program. 2022, 191, 47–77.
33. Zhang, J.; Xu, H.; Zhang, L. Quantitative stability analysis for distributionally robust optimization with moment constraints. SIAM J. Optim. 2016, 26, 1855–1882.
34. Jiang, J.; Chen, Z. Quantitative stability analysis of two-stage stochastic linear programs with full random recourse. Numer. Funct. Anal. Optim. 2019, 40, 1847–1876.
35. Varadarajan, V.S. On the convergence of sample probability distributions. Sankhyā Indian J. Stat. 1958, 19, 23–26.

Share and Cite

MDPI and ACS Style

Chen, Z.; Hu, H.; Jiang, J. Convergence Analysis on Data-Driven Fortet-Mourier Metrics with Applications in Stochastic Optimization. Sustainability 2022, 14, 4501. https://doi.org/10.3390/su14084501