Abstract
Fortet-Mourier (FM) metrics are probability metrics that have been widely adopted in the quantitative stability analysis of stochastic programming problems. In this study, we contribute several types of convergence assertions between a probability distribution and its empirical approximation when the deviation is measured by FM metrics, and we consider their applications in stochastic optimization. We first establish a quantitative relation between FM metrics and Wasserstein metrics. We then derive a non-asymptotic moment estimate, an asymptotic convergence result, and a non-asymptotic concentration estimate for FM metrics, which supplement the existing results. Finally, we apply the derived results to four kinds of stochastic optimization problems, either extending existing results to more general cases or providing alternative avenues. These discussions demonstrate both the motivation and the significance of our study.
Keywords:
Fortet-Mourier metric; discrete approximation; stochastic optimization; stochastic dominance; distributionally robust optimization
MSC:
90C15; 91B70
1. Introduction
The estimation of the distance between a distribution and its empirical approximation obtained from independent and identically distributed (iid) samples is an important subject in probability theory, mathematical statistics, and information theory. It has vast applications in many fields, such as quantization, optimal matching, density estimation, and clustering (see [1] and the references therein for more details). To quantify the distance between two probability distributions, several rules have been adopted to generate probability metrics, such as the commonly used ζ-structure metrics. By selecting different generators of the ζ-structure metric, we obtain a number of well-known probability metrics, such as the Wasserstein metric (in this study, the Wasserstein metric always means the 1-Wasserstein metric, which is also called the Kantorovich-Rubinstein metric or Kantorovich metric), the FM metric, and the total variation metric.
Among probability metrics with ζ-structure, the Wasserstein metric is the most popular one; it has been widely applied in statistics, probability, and machine learning [2]. It originates from the optimal transportation problem and can thus be interpreted as an optimal mass transportation plan. Beyond its practical meaning in transportation, the Wasserstein metric has several good properties. For example, convergence in the Wasserstein metric is equivalent to weak convergence plus convergence of the first order absolute moment [2].
Some of the literature concentrates on the convergence analysis, under Wasserstein metrics, between a distribution and its empirical approximation. From now on, we refer to this as the data-driven Wasserstein metric for simplicity, and we use the same convention for other probability metrics. These convergence analyses can be divided mainly into two parts: moment estimates, which aim at providing the rate of convergence for the expectation of the Wasserstein distance between a distribution and its empirical approximation, and concentration estimates, which focus on the violation probability under a given tolerance. As for moment estimates, some earlier results can be found in [3,4], which provide a relatively loose convergence rate. More recently, Weed and Bach [5] focused on the compact support set case and obtained a sharp convergence rate. Dereich et al. [6] conducted an almost optimal convergence analysis; however, they imposed some restrictions on the range of parameters. An interesting result was given in [1], which extends some results in [6] from a limited range of parameters to the general case. As for concentration estimates, only a few results are available. The corresponding results can be found in [7,8] under some strong assumptions; moreover, they require the violation parameter to be large enough. In [9], Zhao and Guan investigated the case with a discrete and bounded support set. In particular, an elaborate result on the rate of convergence of the data-driven Wasserstein distance was presented in [1].
As pointed out in [2] (p. 110), the Wasserstein metric is a rather strong probability metric. Intuitively, harsh conditions are needed to establish strong Wasserstein-type upper bound estimates. Indeed, we know from the definition of the Wasserstein metric that its generator is the set of Lipschitz continuous functions with Lipschitz modulus one.
Compared with the Wasserstein metric, FM metrics are more general; their generator is a class of locally Lipschitz continuous functions. Therefore, it is easier to obtain upper bound estimates by adopting FM metrics. In view of this, FM metrics have been widely used in the quantitative stability analysis of stochastic programming problems when the underlying probability distribution is perturbed or approximated; see, for example, [10,11,12,13]. Moreover, FM metrics have a close relationship with Wasserstein metrics through the dual representation (the Kantorovich-Rubinstein theorem). In particular, the pth order FM metric reduces to the Wasserstein metric when $p = 1$. From this point of view, the FM metric can be viewed as an extension of the Wasserstein metric. Nevertheless, there are few results concerning the convergence analysis of data-driven FM metrics. To the best of our knowledge, only Strugarek [14] examined the asymptotic convergence under the FM distance.
In view of the above situations, in this article we study the data-driven FM metric. The main contributions of this study can be summarized as follows:
- We establish the quantitative connection between the Wasserstein metric and the FM metric. Based on this connection, we investigate the non-asymptotic moment estimate, asymptotic convergence, and non-asymptotic concentration estimate for data-driven FM metrics.
- We provide an alternative avenue for the convergence analysis of discrete approximations for two-stage stochastic programming problems. Different from the convergence or exponential rate of convergence analysis in [15,16], where some complex conditions are required, our approach is straightforward and brief.
- We reestablish the quantitative stability results for stochastic optimization problems with stochastic dominance constraints through FM metrics. Compared with that in [17], our conditions are weaker and different probability metrics are adopted. More importantly, we can apply the convergence conclusion to examine the discrete approximation method which is crucial for numerical solution.
- We consider data-driven distributionally robust optimization (DRO) problems with FM ball, which extends the results in [18] from the ambiguity set constructed by Wasserstein ball to the FM ball case. We prove the finite sample guarantee and asymptotic consistency, which lay the theoretical foundation for the data-driven approach for the DRO model.
- We analyze the discrete approximation of the DRO problem whose ambiguity set is constructed with the general moment information. Compared with the existing work [19] under the bounded support set, we weaken their conditions and extend their results to the case with an unbounded support set.
The remainder of this study is organized as follows. In Section 2, we give some prerequisites for further discussion. In Section 3, we discuss different kinds of convergence results for data-driven FM metrics. We consider four applications to verify our convergence results and to further demonstrate the motivation and significance of this study in Section 4. Finally, we have some concluding remarks in Section 5.
2. Prerequisites
Let $\xi$ be a random vector defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with support set $\Xi \subset \mathbb{R}^s$. Then, its induced probability distribution (sometimes called a probability measure) on $\Xi$ is $P := \mathbb{P} \circ \xi^{-1}$. We use $\mathscr{P}(\Xi)$ to denote the set of all probability distributions on $\Xi$. The set of probability distributions having finite pth order absolute moments is denoted by $\mathscr{P}_p(\Xi)$.
Probability metrics measure the distance between two probability distributions. Generally, they do not satisfy all the axioms of a distance in a metric space. A commonly used class of probability metrics is the class with ζ-structure, whose definition is as follows.
Definition 1.
Let $\mathscr{F}$ be a set of measurable functions from Ξ to $\mathbb{R}$. Then, for any $P, Q \in \mathscr{P}(\Xi)$,
$$ d_{\mathscr{F}}(P, Q) := \sup_{f \in \mathscr{F}} \left| \int_{\Xi} f(\xi) \, P(d\xi) - \int_{\Xi} f(\xi) \, Q(d\xi) \right| $$
is called the ζ-structure probability metric induced by $\mathscr{F}$.
The set $\mathscr{F}$ in Definition 1 completely determines the resulting ζ-structure probability metric, so it is called the generator of the ζ-structure probability metric. FM metrics and Wasserstein metrics can be deduced from the ζ-structure probability metric by choosing specific generators. In particular, we have the following definitions.
Definition 2.
Let $P, Q \in \mathscr{P}_p(\Xi)$ for some $p \ge 1$ and let $\mathscr{F}_p(\Xi)$ denote the set of locally Lipschitz continuous functions given by
$$ \mathscr{F}_p(\Xi) := \left\{ f : \Xi \to \mathbb{R} : |f(\xi) - f(\tilde{\xi})| \le \max\{1, \|\xi\|^{p-1}, \|\tilde{\xi}\|^{p-1}\} \, \|\xi - \tilde{\xi}\| \ \text{for all } \xi, \tilde{\xi} \in \Xi \right\}. $$
Then, the pth order FM metric between P and Q is
$$ \zeta_p(P, Q) := \sup_{f \in \mathscr{F}_p(\Xi)} \left| \int_{\Xi} f(\xi) \, P(d\xi) - \int_{\Xi} f(\xi) \, Q(d\xi) \right|. $$
Definition 3.
Let $P, Q \in \mathscr{P}_1(\Xi)$ and
$$ \mathscr{F}_W(\Xi) := \left\{ f : \Xi \to \mathbb{R} : |f(\xi) - f(\tilde{\xi})| \le \|\xi - \tilde{\xi}\| \ \text{for all } \xi, \tilde{\xi} \in \Xi \right\}. $$
Then, the Wasserstein metric between P and Q is
$$ W(P, Q) := \sup_{f \in \mathscr{F}_W(\Xi)} \left| \int_{\Xi} f(\xi) \, P(d\xi) - \int_{\Xi} f(\xi) \, Q(d\xi) \right|. $$
It is easy to see from the above definitions that $\mathscr{F}_W(\Xi) \subset \mathscr{F}_p(\Xi)$ and hence $W(P, Q) \le \zeta_p(P, Q)$ for any $P, Q \in \mathscr{P}_p(\Xi)$. Moreover, if $f \in \mathscr{F}_p(\Xi)$, then $-f \in \mathscr{F}_p(\Xi)$ as well, and the same holds for $\mathscr{F}_W(\Xi)$. Therefore, we can ignore the absolute value operator in Definitions 2 and 3 when we take the supremum. Moreover, both FM metrics and Wasserstein metrics have a close relationship with weak convergence. One can refer to [10] (p. 490) and [2] (Theorem 6.9) for more details.
The Wasserstein metric has an alternative definition in terms of couplings of the marginal distributions. Specifically, the Wasserstein metric between P and Q is defined as (see [2], Definition 6.1)
$$ W(P, Q) = \inf_{\pi \in \Pi(P, Q)} \int_{\Xi \times \Xi} \|\xi - \tilde{\xi}\| \, \pi(d\xi, d\tilde{\xi}), \qquad (1) $$
where $\Pi(P, Q)$ is the collection of all joint distributions of $\xi$ and $\tilde{\xi}$ with marginal distributions P and Q, respectively. It is known from the Kantorovich-Rubinstein theorem [20] that Definition 3 is the dual representation of (1).
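As a quick numerical illustration of the equivalence between the coupling form (1) and the dual form in Definition 3, the one-dimensional case can be checked directly: the optimal coupling of two equally weighted samples simply matches order statistics. The following Python sketch is purely illustrative; the Gaussian samples and sample sizes are our own choices and not part of the original analysis.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)   # samples representing P
y = rng.normal(0.5, 1.2, 1000)   # samples representing Q

# Coupling form (1) in one dimension: the optimal transport plan matches
# sorted samples (quantile coupling), so W(P_N, Q_N) = mean_i |x_(i) - y_(i)|
# when both empirical measures carry N equally weighted atoms.
w_coupling = np.mean(np.abs(np.sort(x) - np.sort(y)))

# SciPy evaluates the same 1-Wasserstein distance from the empirical CDFs,
# which corresponds to the dual (Kantorovich-Rubinstein) representation.
w_dual = wasserstein_distance(x, y)

print(f"quantile coupling: {w_coupling:.4f}   scipy: {w_dual:.4f}")
```

Both values agree up to floating point error, which is a small sanity check of the duality rather than a proof.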
We have the following extension theorem for Lipschitz functions in Hilbert space (see [21], Theorems 4 and 5).
Lemma 1.
Let X and Y be Hilbert spaces, let $S \subset X$, and let $f : S \to Y$ be a Lipschitz function with Lipschitz modulus L. Then, there exists a Lipschitz function $\tilde{f} : X \to Y$ such that $\tilde{f}(x) = f(x)$ for any $x \in S$, and L is also the Lipschitz modulus of $\tilde{f}$.
Lemma 1 is important for the following discussion. In [1], the authors assumed that the support set is the whole space $\mathbb{R}^s$. They obtained the non-asymptotic moment estimate [1] (Theorem 1) and concentration estimate [1] (Theorem 2) for the Wasserstein metric. For any $P, Q \in \mathscr{P}_1(\Xi)$, we can view them as probability distributions $\tilde{P}, \tilde{Q}$ on $\mathbb{R}^s$ through the following correspondence:
$$ \tilde{P}(A) := P(A \cap \Xi) $$
for all Borel sets $A \subset \mathbb{R}^s$. That is, we set the probability of the region $\mathbb{R}^s \setminus \Xi$ to be zero. Generally, we have $W_{\Xi}(P, Q) = W_{\mathbb{R}^s}(\tilde{P}, \tilde{Q})$. The details are as follows:
$$ W_{\Xi}(P, Q) = \sup_{f \in \mathscr{F}_W(\Xi)} \left| \int_{\Xi} f \, dP - \int_{\Xi} f \, dQ \right| = \sup_{f \in \mathscr{F}_W(\Xi)} \left| \int_{\mathbb{R}^s} \tilde{f} \, d\tilde{P} - \int_{\mathbb{R}^s} \tilde{f} \, d\tilde{Q} \right|, $$
where $\mathscr{F}_W(\Xi)$ denotes the collection of all Lipschitz continuous functions with Lipschitz modulus 1 on $\Xi$, and $\tilde{f}$ is the extension of $f$ to $\mathbb{R}^s$ according to Lemma 1. Obviously, $\tilde{f} \in \mathscr{F}_W(\mathbb{R}^s)$, which is the set of Lipschitz continuous functions with Lipschitz modulus 1 over $\mathbb{R}^s$. Thus, we have the estimation
$$ \sup_{f \in \mathscr{F}_W(\Xi)} \left| \int_{\mathbb{R}^s} \tilde{f} \, d\tilde{P} - \int_{\mathbb{R}^s} \tilde{f} \, d\tilde{Q} \right| \le \sup_{g \in \mathscr{F}_W(\mathbb{R}^s)} \left| \int_{\mathbb{R}^s} g \, d\tilde{P} - \int_{\mathbb{R}^s} g \, d\tilde{Q} \right| = W_{\mathbb{R}^s}(\tilde{P}, \tilde{Q}). $$
That is, $W_{\Xi}(P, Q) \le W_{\mathbb{R}^s}(\tilde{P}, \tilde{Q})$.
On the other hand, for any $g \in \mathscr{F}_W(\mathbb{R}^s)$, its restriction to $\Xi$ is Lipschitz continuous with Lipschitz modulus 1. Thus,
$$ W_{\mathbb{R}^s}(\tilde{P}, \tilde{Q}) = \sup_{g \in \mathscr{F}_W(\mathbb{R}^s)} \left| \int_{\Xi} g \, dP - \int_{\Xi} g \, dQ \right| \le \sup_{f \in \mathscr{F}_W(\Xi)} \left| \int_{\Xi} f \, dP - \int_{\Xi} f \, dQ \right| = W_{\Xi}(P, Q). $$
Finally, we have $W_{\Xi}(P, Q) = W_{\mathbb{R}^s}(\tilde{P}, \tilde{Q})$. Therefore, although all the convergence results in [1] were derived under $\Xi = \mathbb{R}^s$, we can extend them to any support set $\Xi \subset \mathbb{R}^s$.
Lemma 2
([1], Theorem 1). Let for some . Then, there exists a constant C depending only on s (the dimension of Ξ) and p such that, for all ,
where log is the natural logarithm.
Lemma 2 cannot cover all the pairs , for example, or . However, we can always reset p such that Lemma 2 holds by the following procedures. If or 2 and , P must belong to for any . If and , we can select such that . If and , we can choose any . Then, we let and have that Lemma 2 holds with or 2 and or and . Therefore, Lemma 2 is applicable for any through carefully prepared p. In the following discussion, without loss of generality, we always assume that Lemma 2 holds for any pair . Further, we can, according to Lemma 2, obtain the following uniform upper bound:
for any and .
Assumption 1.
Let satisfy
for some constant b.
Lemma 3.
Suppose that Assumption 1 holds for some . Then, we have for that
for all , where α and β are two positive constants depending only on P, b, and s.
Proof.
Based on Assumption 1, we know that Condition (1) in [1] holds. Then, due to , Lemma 3 directly follows from [1] (Theorem 2). □
For a more comprehensive version of Lemma 3, one can refer to [1] (Theorem 2). Here, we focus on the case because it is more interesting for us to investigate a smaller violation rather than a bigger one. A simplified version can also be found in [18] (Theorem 3.4) where the assumption is imposed.
To simplify the following discussion, we derive a uniform upper bound for the right-hand side in Lemma 3. Note the fact that for any . We have
Letting gives us that
When , we have
Moreover, for ,
Therefore, we can obtain a loose but uniform upper bound estimation
for any and .
3. Convergence Analyses of Data-Driven FM Metrics
In this section, we will investigate different kinds of convergence for data-driven FM metrics. To this end, let $\xi^1, \xi^2, \ldots, \xi^N$ be N iid samples generated according to P. These samples are viewed here as the random sample $\xi^1(\omega), \ldots, \xi^N(\omega)$, $\omega \in \Omega$, on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Then, we obtain the empirical distribution defined as
$$ P_N(\cdot) := \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}_{\xi^i}(\cdot), $$
where $\mathbb{1}_{\xi^i}(\cdot)$ is the indicator function, that is, $\mathbb{1}_{\xi^i}(A) = 1$ for $\xi^i \in A$ and $\mathbb{1}_{\xi^i}(A) = 0$ otherwise.
We first give the following vital lemma.
Lemma 4.
Let for some . Then,
for any R satisfying and . Here, $\mathbf{0}$ is the origin of $\mathbb{R}^s$ and $B(\mathbf{0}, R)$ is the closed ball centered at $\mathbf{0}$ with radius R.
The proof of Lemma 4 can be found in Appendix A.
If we define
then, we can obtain a tighter upper bound estimation of , that is
The first convergence result is about the non-asymptotic moment estimate. It provides an upper bound for the expectation of the FM distance between P and its empirical approximation distribution.
Theorem 1
(Non-asymptotic moment estimates for FM metrics). Suppose that for some . Then, for sufficiently large N, we have
where is a sequence of positive numbers satisfying as .
The proof of Theorem 1 can be found in Appendix A.
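Since the first order FM metric coincides with the Wasserstein metric, the p = 1 case of the moment estimate can be examined by simulation. The sketch below estimates E[W(P, P_N)] for a one-dimensional standard normal P (an illustrative choice, not taken from the text) via the CDF representation of the one-dimensional Wasserstein distance, and reports sqrt(N) times the estimate, which should stay roughly constant if the rate is of order N^{-1/2} in this setting.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
grid = np.linspace(-6.0, 6.0, 4001)   # integration grid for the CDF distance
dx = grid[1] - grid[0]

def w1_empirical_vs_true(samples):
    """W(P, P_N) = integral of |F(t) - F_N(t)| dt in one dimension (Riemann sum)."""
    F_N = np.searchsorted(np.sort(samples), grid, side="right") / len(samples)
    return np.sum(np.abs(F_N - norm.cdf(grid))) * dx

reps = 200
for N in [50, 200, 800, 3200]:
    mean_w = np.mean([w1_empirical_vs_true(rng.normal(size=N)) for _ in range(reps)])
    print(f"N={N:5d}  E[W(P, P_N)] ~ {mean_w:.4f}  sqrt(N)*E ~ {np.sqrt(N) * mean_w:.3f}")
```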
Theorem 1 establishes the convergence in the sense of expectation. However, it fails to tell us the sample-wise convergence. The following theorem states the asymptotic convergence under FM metrics for almost every sample.
Theorem 2
(Asymptotic convergence of FM metrics). Suppose that . Then,
with probability 1, as . Here is defined at the beginning of this section.
The proof of Theorem 2 can be found in Appendix A.
Theorems 1 and 2 establish convergence. As we know, the rate of convergence is quite important for guiding the solution process in practice. The following theorem gives an estimate of the convergence rate under certain assumptions.
Theorem 3
(Non-asymptotic concentration estimates for FM metrics). Suppose that and Assumption 1 holds with . Then, for any , we have
for some constants depending on P, b, and s, and depending on P, b, s, and ϵ.
The proof of Theorem 3 can be found in Appendix A.
Remark 1.
Here we assume that . The main reason is that we want to give a relatively simple proof. Fortunately, it is more interesting for us to consider a small violation rather than a large one.
Under certain assumptions, we can obtain an estimation for . For example, if for , here σ is a positive constant, we have . Then, according to the properties of convex quadratic functions, the rate function has the lower bound
Thus, we can further obtain a concrete estimate for . For more details in this aspect, one can refer to [16].
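For the same reason (the first order FM metric equals the Wasserstein metric), the exponential decay asserted by Theorem 3 can be probed empirically in the p = 1 case by estimating the violation probability P(W(P, P_N) > ϵ) from repeated sample draws. The standard normal distribution, the tolerance 0.1, and the sample sizes below are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
grid = np.linspace(-6.0, 6.0, 2001)
dx = grid[1] - grid[0]

def w1_empirical_vs_true(samples):
    """One-dimensional W(P, P_N) via the CDF-difference integral."""
    F_N = np.searchsorted(np.sort(samples), grid, side="right") / len(samples)
    return np.sum(np.abs(F_N - norm.cdf(grid))) * dx

eps, reps = 0.1, 2000
for N in [50, 100, 200, 400]:
    hits = sum(w1_empirical_vs_true(rng.normal(size=N)) > eps for _ in range(reps))
    print(f"N={N:4d}  estimated P(W(P, P_N) > {eps}) = {hits / reps:.3f}")
```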
4. Applications
In this section, we consider four applications of the convergence conclusions about FM metrics obtained in Section 3. Specifically, we study the discrete approximation of two-stage stochastic programming problems, stochastic optimization problems with dominance constraints, data-driven distributionally robust optimization problems with an FM ball, and the discrete approximation of distributionally robust optimization problems with a general moment ambiguity set. These applications not only further illustrate the motivation of this study but also provide alternative avenues or extensions of current results.
4.1. Two-Stage Stochastic Linear Programming Problems
Discrete approximation is an important issue in stochastic optimization, which is crucial for its numerical solution. In this subsection, by employing the convergence results in Section 3, we give an alternative avenue for analyzing the discrete approximation of two-stage stochastic programming problems.
Consider the two-stage stochastic programming problem:
where ; is a polyhedron; the probability measure P is supported on , which is a polyhedron; and
Here , , , . and depend affine linearly on .
Denote , and let and denote the optimal value and optimal solution set of Problem (4). Moreover, we use to denote the set . Denote .
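Before turning to the stability machinery, the empirical approximation of Problem (4) can be made concrete on a toy instance: the sketch below solves the deterministic equivalent LP of the sample average approximation for a two-stage problem with simple (newsvendor-type) recourse. All cost coefficients, the gamma-distributed demand, and the bound on x are hypothetical and only serve to illustrate how the empirical problem is assembled and solved.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy two-stage problem with simple recourse (all numbers are hypothetical):
#   min_x  c*x + E[Q(x, d)],  Q(x, d) = min { q_s*y1 + q_h*y2 : y1 - y2 = d - x, y >= 0 },
# where y1 measures unmet demand and y2 measures surplus.
c, q_s, q_h, x_max = 1.0, 3.0, 0.5, 30.0

def solve_saa(demands):
    """Solve the deterministic equivalent LP of the empirical (SAA) approximation."""
    N = len(demands)
    # Decision vector: [x, y1_1, y2_1, ..., y1_N, y2_N]
    cost = np.concatenate(([c], np.tile([q_s / N, q_h / N], N)))
    A_eq = np.zeros((N, 1 + 2 * N))
    A_eq[:, 0] = 1.0                      # x ...
    for i in range(N):
        A_eq[i, 1 + 2 * i] = 1.0          # ... + y1_i
        A_eq[i, 2 + 2 * i] = -1.0         # ... - y2_i
    b_eq = demands                        # = d_i   (equivalent to y1_i - y2_i = d_i - x)
    bounds = [(0.0, x_max)] + [(0.0, None)] * (2 * N)
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[0], res.fun

for N in [10, 100, 1000]:
    d = rng.gamma(shape=4.0, scale=2.0, size=N)   # iid samples of the random demand
    x_N, v_N = solve_saa(d)
    print(f"N={N:5d}   x_N = {x_N:6.3f}   v_N = {v_N:7.3f}")
```

As N grows, the empirical optimal value and solution stabilize, which is the behavior quantified below through the FM metric.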
To quantify the upper semicontinuity or the deviation distance of the optimal solution set, we define the growth function as
Its inverse function is given by
Thus, we can define the associated conditioning function as
It is easy to verify that is nondecreasing and is increasing. Both and are lower semicontinuous on and vanish at 0. One can refer to [10] for more details.
Moreover, we have as . We illustrate this fact by contradiction. Suppose that there exists a sequence satisfying as , such that . Denote . The lower semicontinuity of means that is closed. Thus, . Due to the nondecreasing property of and as , must be bounded. Without loss of generality, we assume that as , where is a positive constant. According to the lower semicontinuity of , we have
which leads to a contradiction.
According to the definition of , we can immediately deduce that as .
To introduce the following discussion, we make some standard assumptions (see [11]).
Assumption 2.
Let the following assertions hold:
- (1)
- For each pair , and ;
- (2)
- .
Under the above assumptions, we have the following quantitative stability results about the optimal value and optimal solution set of Problem (4).
Lemma 5
([11], Theorem 3.3). Suppose that Assumption 2 holds and is nonempty and bounded. Then, there exist constants and such that
when and , where is the closed unit ball in .
Based on Lemma 5 and the convergence results in Section 3, we have the following convergence conclusions between the two-stage stochastic programming problem (4) and its empirical approximation.
Theorem 4.
Suppose that: (i) Assumption 2 holds; (ii) is nonempty and bounded. Then,
with probability 1, as .
Proof.
For the first assertion, we have from Theorem 2 that with probability 1. This means that: for the defined in Lemma 5, there exists a positive number such that for any , for almost every . Then, by Lemma 5, we have that
hold almost surely as , here L is defined in Lemma 5. According to Theorem 2 and the property of , we have
and thus
with probability 1, as . These facts imply that
with probability 1, as . □
Theorem 5.
Suppose that: (i) Assumption 1 holds with ; (ii) Assumption 2 holds; (iii) is nonempty and bounded. Then, for any , there exist depending on P and s, and depending on P, s and ϵ, such that
Proof.
If for L defined in Lemma 5, we have from Theorem 3 that
for any , where depends on P and s, and depends on P, s and . Here we use to stress its dependence on .
As shown in Theorem 4, a sufficient condition for
is , where is defined in Lemma 5. Without loss of generality, we assume that . Analogously, we have that
Then, we obtain
Similarly, we have
where the equality follows from the strictly increasing property of . By the same procedure, we can derive the second assertion. □
Remark 2.
The convergence analysis about two-stage stochastic programming problems can also be found in [11] (Section 4), where the covering and bracketing numbers are introduced. However, it seems difficult to verify the growth rate of the covering or bracketing number in the general case (see [11], Proposition 4.2). Our convergence results are more straightforward. Compared with [11] (Proposition 4.2), instead of the growth rate of the covering or bracketing number, we use the light-tailed distribution assumption. This assumption is commonly used in the literature, see for example [1,18].
4.2. Stochastic Optimization Problems with Stochastic Dominance Constraints
In this part, we consider stochastic optimization problems with stochastic dominance constraints. Stochastic dominance is an important ingredient in economics, decision theory, statistics, and, nowadays, modern optimization. It has been widely studied in the last two decades; see, for example, [17,22,23,24,25,26] and the references therein. Different from classical stochastic optimization models, which cope with random variables by taking expectations, stochastic dominance can better reflect the relationship between two random variables. Expected utility theory can also provide a comparison of two random variables; however, it is hardly possible to explicitly express the utility functions of decision makers [27]. From this point of view, stochastic dominance is more convenient in practice. In fact, stochastic dominance has a close relationship with expected utility theory. Generally, a random variable dominates another random variable in the kth order, denoted by , if for every nondecreasing function from a certain set of utility functions [17]. In particular, first order dominance holds if and only if for every nondecreasing utility function, and second order dominance holds if and only if for every nondecreasing and concave utility function [27].
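For the second order relation, a standard equivalent characterization is the shortfall criterion: X dominates Y in second order if and only if $\mathbb{E}[(\eta - X)_+] \le \mathbb{E}[(\eta - Y)_+]$ for all real $\eta$. This suggests a simple empirical test, sketched below with purely illustrative Gaussian samples (the random variables, sample sizes, and grid are our own assumptions).

```python
import numpy as np

def ssd_violation(x_samples, y_samples, n_grid=200):
    """Empirical check of second order dominance X >=_(2) Y via the shortfall
    criterion E[(eta - X)_+] <= E[(eta - Y)_+], evaluated on a finite grid of eta."""
    lo = min(x_samples.min(), y_samples.min())
    hi = max(x_samples.max(), y_samples.max())
    eta = np.linspace(lo, hi, n_grid)
    short_x = np.mean(np.maximum(eta[:, None] - x_samples[None, :], 0.0), axis=1)
    short_y = np.mean(np.maximum(eta[:, None] - y_samples[None, :], 0.0), axis=1)
    return np.max(short_x - short_y)       # <= 0 means the empirical relation holds

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, 5000)             # samples of a candidate G(x, xi)
y = rng.normal(0.0, 1.0, 5000)             # samples of the benchmark Y
print("largest empirical shortfall gap:", ssd_violation(x, y))
```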
The convex stochastic optimization model with the kth order stochastic dominance constraint can be described as (see [22,27]):
where D is a nonempty closed and convex subset of ; is a convex function; Y is a random variable supported on , which can be treated as the random benchmark; and . Moreover, we assume that G is locally Lipschitz continuous with respect to in the following sense:
for any , where and . G satisfies the linear growth condition:
for every and , where B is any bounded subset of , and depends on B.
Actually, we can impose a more general growth condition on G, for example,
for and the following discussion still holds. Here a linear growth condition simplifies the demonstration. The above requirements for can be met easily. For instance, the objective function of the two-stage stochastic programming problem with fixed recourse satisfies the above conditions (see [11], Proposition 3.2).
Due to its attractive modeling technique, the quantitative stability analysis of stochastic optimization models with dominance constraints has recently been investigated in several works. Dentcheva et al. first studied in [22] stochastic optimization problems with first order stochastic dominance constraints, and this was extended by Dentcheva and Römisch in [17] to problems with general kth () order stochastic dominance constraints. In [24], Chen and Jiang weakened the assumptions of the quantitative stability analysis in [17] by considering the case where is generated by the two-stage fully random stochastic programming problem.
To establish the convergence results, we first investigate the quantitative stability of model (6). By convention, we consider its relaxed problem (see also [17,24]):
where is a compact interval: .
In view of our focus in this study, we reestablish the quantitative stability conclusions of Problem (9) in what follows. We use and to denote the probability distributions of and Y, respectively. We denote the feasible solution set of Problem (9) by
and its perturbed feasible solution set under by
First we examine the quantitative stability of the feasible solution set.
Proposition 1.
Proof.
We know from the proof of [17] (Proposition 3.2) that
whenever the right-hand side is less than or equal to some positive scalar .
In view of this, we estimate
and
respectively.
Note the fact that (see [17], (3.9))
for some positive constant and any . Then, we have
This means that
where .
Similarly, we have
which means that
Taking , we have
whenever . □
The quantitative stability result in Proposition 1 differs from the corresponding results in [17] in two respects. One is the locally Lipschitz continuity of G; the other is the probability metric we choose. In [17], the authors assumed that G is Lipschitz continuous, and they adopted Rachev metrics and the th order Wasserstein metric. As far as we know, no data-driven results exist under Rachev metrics.
Let and denote the optimal value and optimal solution set of Problem (9), respectively. Similar to that in Section 4.1, we can define the growth function of Problem (9) as
Then, its inverse function and the associated conditioning function are
and
Proposition 2.
Under the conditions of Proposition 1, there exist constants and such that
whenever .
Proof.
Since f is convex, f is locally Lipschitz continuous. Since D is compact, f is in fact Lipschitz continuous over D. Then, the assertions follow from a similar proof as that for [17] (Theorem 3.3). □
Now we consider the iid samples of and Y. For convenience, we assume that the samples drawn from and Y have the same sample size N. The N iid samples of are and the N iid samples of Y are . Then, we have the following empirical distributions:
and
With these preparations, we can establish the following convergence results.
Theorem 6.
Let D be compact, , , and G satisfy the locally Lipschitz continuity condition (7) and the linear growth condition (8).
- (i)
- We have and with probability 1, as .
- (ii)
- If, moreover, and for some and , then, for , there exist positive scalars depending on , b and s; depending on , b, s and ϵ; depending on and c; and depending on , c and ϵ, such that and where L and are defined in Propositions 1 and 2, respectively.
Proof.
Part (i) can be similarly proved as that in Theorem 4 by utilizing Theorem 2 and Proposition 1.
For Part (ii), we have
where the last inequality follows from Theorem 3; depends on , b, and s; depends on , b, s, and ; depends on and c; and depends on , c, and .
The second and third probability inequalities can be analogously verified and thus we omit the proof here. □
4.3. Data-Driven DRO Problems with FM Ball
A general stochastic optimization model can be formulated as
where , , and is the support set of . The sample average approximation (SAA) method is usually used to solve Problem (10) numerically. The SAA method tacitly assumes that we can generate any number of samples based on P. To approximate Problem (10) well, a large sample size is needed [28]. In practice, however, the true probability distribution P is not known exactly, and acquiring additional samples is expensive, so we cannot generate a sufficiently large number of samples to make the SAA method well-founded. Nevertheless, it is possible to obtain a limited number of samples or scenarios, such as historical data. Under these settings, the data-driven DRO model has been proposed [18,29,30]. The natural idea is to use the partial information to construct an ambiguity set such that the true probability distribution is included in it. As pointed out in [18], under certain conditions, this approach offers powerful out-of-sample performance guarantees.
For further discussion, we denote the limited finite samples by and the corresponding empirical distribution by . Since the number of samples N is limited, we cannot adopt the classical SAA method, which requires that the sample size tends to infinity. However, we can use the limited information to construct a set of probability measures which contains the true one, that is, the ambiguity set. In this subsection, we consider the following FM ball-based ambiguity set:
where , the positive constant r stands for the confidence parameter determined by the decision maker. Then, we have the data-driven DRO problem with the FM ball-based ambiguity set of Problem (10) as follows:
It is common to see the Wasserstein ball used to build the ambiguity set; see, for example, [18]. To further explain the reasonableness of and motivation for adopting the FM metric, we make the following comments.
Remark 3.
As we know, a key issue for DRO problems is how to build the ambiguity set. Different kinds of ambiguity sets have been proposed, such as moment information [31], ζ-ball [32], and so on. Of course, the FM metric, as a specific case of the ζ-structure probability metric, can be employed to construct the ambiguity set.
More importantly, the decision maker can utilize the limited empirical distribution to obtain an approximate optimal value, say . From prior experience, the decision maker usually has some confidence, measured by a deviation constant , that the true optimal value, denoted by , lies in the interval . Frequently, is locally Lipschitz continuous in the following sense:
for some positive constant L. A typical example is that is the objective function of the two-stage stochastic programming problem, here (see [11], Proposition 3.2). Then, we have the quantitative relationship:
Therefore, it is reasonable for the decision maker to consider the ambiguity set
Moreover, since with probability 1, as (see Theorem 2), P must be included in for suitable N and r.
Finally, we have $W(P, Q) \le \zeta_p(P, Q)$, with equality when $p = 1$. Thus,
This tells us that the ambiguity set constructed by the FM ball is tighter than that constructed with the Wasserstein ball.
All these arguments motivate us to consider the data-driven DRO problem with the FM ball-based ambiguity set.
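Although the inner maximization in Problem (11) is infinite dimensional in general, it can be illustrated numerically by restricting the candidate distributions to a finite grid of support points. The sketch below does this for p = 1, where the FM ball coincides with the Wasserstein ball and the ball constraint becomes a transportation-cost constraint; the samples, the grid, and the piecewise linear loss are hypothetical choices, not part of the original model.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

# Inner problem of (11) for a fixed decision x:  sup { E_Q[f(x, xi)] : zeta_1(P_N, Q) <= r },
# with Q restricted to a finite grid of support points (a discretized illustration only).
xi_hat = rng.normal(0.0, 1.0, 20)                       # the N observed samples
p_hat = np.full(xi_hat.size, 1.0 / xi_hat.size)         # empirical weights
grid = np.concatenate([np.linspace(-5.0, 5.0, 101), xi_hat])   # candidate support of Q
f = lambda z: np.maximum(z, 0.3 * z)                    # hypothetical piecewise linear loss

def worst_case_expectation(radius):
    N, M = xi_hat.size, grid.size
    cost = np.abs(xi_hat[:, None] - grid[None, :])      # transport cost |xi_i - z_j|
    # Variables: a transport plan pi_ij >= 0; its column marginal is the worst-case Q.
    c_obj = -np.tile(f(grid), N)                        # maximize sum_ij pi_ij * f(z_j)
    A_eq = np.zeros((N, N * M))
    for i in range(N):
        A_eq[i, i * M:(i + 1) * M] = 1.0                # sum_j pi_ij = p_hat_i
    A_ub = cost.ravel()[None, :]                        # sum_ij cost_ij * pi_ij <= radius
    res = linprog(c_obj, A_ub=A_ub, b_ub=[radius], A_eq=A_eq, b_eq=p_hat,
                  bounds=[(0.0, None)] * (N * M), method="highs")
    return -res.fun

for r in [0.0, 0.1, 0.5, 1.0]:
    print(f"radius {r:4.1f}:  worst-case expectation = {worst_case_expectation(r):6.3f}")
```

At radius zero the program returns the empirical expectation, and the worst-case value grows with the radius, consistent with the monotonicity of the ambiguity set.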
To quantify the out-of-sample performance of the data-driven DRO problem (11), we examine the following probability
where is any optimal solution of Problem (11). Of course, we hope that, for sufficiently small , there exists a finite positive integer such that
for any .
If P satisfies , we have . Thus,
for any , which of course implies that
From Theorem 3, we have for any that
here we use the notation to stress the dependence of on r. Consequently,
Denote
where $\lceil \cdot \rceil$ stands for rounding up to the nearest integer. Sometimes, we use the notation to stress the dependence of on and r.
Summarizing the above discussions, we obtain the following so-called finite sample guarantee property (see also [18,30]).
Proposition 3
Proposition 3 tells us, for the fixed confidence parameter r, at least how large the sample size should be to ensure the significance level . Now we slightly modify model (11) and consider the following data-driven DRO problem:
where and as . It reflects the natural fact that the decision maker becomes more confident with more information. Meanwhile, the model (11) emphasizes the fixed limited information. We use to denote the optimal value of Problem (14). In what follows, we investigate the asymptotic consistency whenever N tends to infinity. To this end, we need the following lemma.
Lemma 6.
Let and be two sequences of random variables defined on the probability space . If converges almost surely and
for , where with , we have
Proof.
We prove by contradiction. That is, we assume that
This implies that there exists a subset with such that
for every . Define the sequence as
for . Obviously, according to (15), we have , which implies that
Then, we can always choose a sufficiently large , such that
Choose
and we have
for all , which implies that
This contradicts the definition of . We complete the proof. □
The following proposition states that the optimal value and optimal solution set of the data-driven DRO problem (14) converge to those of the original Problem (10), which verifies the reasonability of our data-driven DRO model (14).
Proposition 4
(Asymptotic consistency). Suppose that is locally Lipschitz continuous in the following sense:
for every , where and . Let be any optimal solution of Problem (14). Then, the following assertions hold:
- (i)
- with probability 1, as ;
- (ii)
- If, moreover, X is closed, is lower semicontinuous for every and dominates some P-integrable function uniformly with respect to , then, any accumulation point of is an optimizer of Problem (10) almost surely.
Proof.
Part (i): Notice that
where and are any optimal solutions of Problems (10) and (14), respectively. For the first term on the right-hand side, we have
almost surely, as , where the first inequality is due to the definition of supremum for some with almost surely and ; the second inequality follows from the definition of FM metric. Similarly, we can derive
almost surely, as . Thus, we obtain that
Part (ii): Without loss of generality, in the following discussion, we assume with probability 1 as . Moreover, we select a sequence with and . According to (12), for each pair with defined in (14), we can select an ( is defined in (13)) such that
We know from Lemma 6 and assertion (i) that
Then, the following inequalities hold almost surely:
where (a) follows from due to the closedness of X; (b) follows from the lower semicontinuity of for every ; (c) is due to Fatou’s lemma; (d) follows from (16). □
Remark 4.
Propositions 3 and 4 establish the finite sample guarantee and the asymptotic consistency, which are two desirable properties of the data-driven DRO problem [18,30]. Different from the existing results in [18], where the Wasserstein ball is used to construct the ambiguity set, we adopt the FM ball. Due to the features of the Wasserstein metric, to ensure the existence of the significance parameter ϵ, the authors of [18] explicitly derived the radius depending on ϵ and N and the finite sample size depending only on ϵ. In Proposition 3, we view both r and ϵ as parameters because couples with ϵ implicitly in Theorem 3. Moreover, the assumptions for the asymptotic consistency (Proposition 4) differ from those in [18] (Theorem 3.6), where upper semicontinuity and linear growth were employed. Here we use locally Lipschitz continuity together with a weaker lower-bound assumption. Specifically, Ref. [18] (Theorem 3.6) employs the Borel–Cantelli lemma to obtain
This is not applicable for our case, so we need Lemma 6.
4.4. Discrete Approximation for DRO Problems with General Moment Information
We consider the following general DRO problem:
where is a compact set, , , is a closed and convex set in the Cartesian product of some finite dimensional vector and/or matrix spaces, and is a general mapping on . We implicitly assume that, for each , for all .
The above ambiguity set is very general, and it covers almost all the available ambiguity sets with moment information (see, e.g., [19], Examples 3–5). Zhang et al. discussed in [33] the quantitative stability of the DRO problem with a general moment information ambiguity set. There are usually two ways to solve Problem (17) numerically: one is to use some kind of duality argument to reformulate Problem (17) as a tractable problem [18,31]; the other is to discretize the ambiguity set, which leads to a saddle point problem in a finite dimensional space [19]. For instance, the discrete approximation in [19] is conducted under a bounded support set. In this part, by employing our results in Section 3, we consider the discrete approximation for Problem (17) under weaker conditions.
Denote by the collection of all discrete distributions which have at most N supporting elements, that is,
We define the discrete approximation of as
Obviously, . Then, the discrete approximation of Problem (17) can be written as
We use and to denote the optimal value and optimal solution set of Problem (17). and are the optimal value and optimal solution set of Problem (18). To make sense of the discrete approximation, we hope that Problem (18) can approximately solve Problem (17) when N is sufficiently large.
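For a fixed first-stage decision, the inner supremum in the discretized Problem (18) is a finite dimensional linear program over the weights placed on the N support points. The sketch below solves such a program for a hypothetical moment set (a mean interval and a second-moment bound, in the spirit of the moment systems in [19]); all support points, moment bounds, and loss values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

# Discretized inner problem:  sup { sum_i q_i f(x, xi_i) : q in the moment set, q >= 0, sum_i q_i = 1 },
# with the hypothetical moment set  |E_q[xi] - mu0| <= d1  and  E_q[xi^2] <= s0.
xi = rng.normal(0.0, 1.0, 200)          # fixed support points xi_1, ..., xi_N
mu0, d1, s0 = 0.0, 0.1, 1.5             # assumed moment bounds (not from the text)
f_vals = np.maximum(xi, 0.0)            # f(x, xi_i) for one fixed decision x (illustrative)

c = -f_vals                             # maximize  =>  minimize the negative
A_ub = np.vstack([xi, -xi, xi ** 2])    # E_q[xi] <= mu0 + d1, -E_q[xi] <= d1 - mu0, E_q[xi^2] <= s0
b_ub = np.array([mu0 + d1, d1 - mu0, s0])
A_eq = np.ones((1, xi.size))
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0.0, None)] * xi.size, method="highs")
print("discretized worst-case expectation:", -res.fun)
```

Refining the support (increasing N) enlarges the discrete ambiguity set toward the original one, which is the convergence studied in Theorem 7 below.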
To continue the following discussion, we define the growth function of Problem (17) as
and its inverse function is
Thus, the associated conditioning function is defined as
Immediately, we have the following quantitative stability results:
Proposition 5.
Suppose that: (i) ; (ii) for each and a measurable function with for any ; (iii) is lower semicontinuous for each ; (iv)
for each . Then, and
Proof.
Since and is lower semicontinuous for each , we have from Fatou’s lemma that
holds for any such that . This implies that is lower semicontinuous. According to [34] (Lemma 4.1), is lower semicontinuous too. This, together with the compactness of X, ensures that . Similarly, we can prove that .
Note that
where (a) follows from the fact ; (b) is due to the definition of the pth order FM metric.
Finally, based on the first assertion, the inclusion for the optimal solution sets can be analogously derived as that in [11]. □
For simplicity as well as to show the linear relationship more clearly, we write as in what follows. We need the following technical assumption to proceed.
Assumption 3
(see [19]). The system satisfies the following Slater condition:
for some and .
Proposition 6.
Suppose that Assumption 3 holds and . Then, there exists an with , such that for any and , we have
for any and , where is a positive integer depending on and ω.
Proof.
Let the empirical approximation of be . Then, we have from the law of large numbers that
with probability 1, as . Equivalently, there exists an with , such that for any and , we have
for . This implies that
or equivalently,
for , where is the unit closed ball in the space of .
Notice that , and hence, for , the Slater condition holds with respect to for the system
Now we define, for any , and
Obviously, we have . Similar to that proof of [19] (Theorem 2), we again obtain . Then, we have
for and with , where the last inequality follows from Theorem 2.
Finally, letting and completes the proof. □
The following theorem states that the discrete approximation ambiguity set converges to as in the sense of FM metrics.
Theorem 7.
Suppose that: (i) Assumption 3 holds; (ii) ; (iii)
Then,
with probability 1.
Proof.
For any , by the triangle inequality, we have
where is the empirical distribution of P with N samples. Since , we know from Proposition 6 that
for and almost every . Thus, we have
Subsequently,
For the first term on the right-hand side, the definition of supremum, the boundedness of , and Theorem 2 give rise to
with probability 1, where is a sequence included in such that
and is a positive sequence with as . Thus, we obtain
with probability 1.
Analogously, by the law of large numbers, we can derive that
with probability 1.
Then, we complete the proof. □
The following corollary shows that the approximation of Problem (17) by Problem (18) is reasonable.
Corollary 1.
Under the conditions of Proposition 5 and Theorem 7, we have
with probability 1, as .
Remark 5.
In this subsection, we investigated the discrete approximation of the DRO problem with a general moment information ambiguity set. Compared with the existing work [19], we have weakened the necessary assumptions and extended the results to a more general case. Firstly, Lipschitz continuity of the objective function is required in [19] (Theorem 14), due to the adoption of the Wasserstein metric, so that the upper bound between the discrete approximation of the DRO problem and the original DRO problem can be derived [19] (Proposition 7); we only require locally Lipschitz continuity. More importantly, the discussion in [19] is restricted to the bounded support set case because the upper bound in [19] (Proposition 7) would be infinite, and hence not well defined, when the support set is unbounded. By employing our convergence results in Section 3, our support set can be unbounded.
5. Concluding Remarks
In this study, we investigated different kinds of convergence assertions about data-driven FM metrics and their possible applications. In view of the rich results about Wasserstein metrics (Lemmas 2 and 3), we first established the relationship between the FM metric and the Wasserstein metric (Lemma 4). Based on these results, the non-asymptotic moment estimate (Theorem 1), asymptotic convergence estimate (Theorem 2), and non-asymptotic concentration estimate (Theorem 3) for FM metrics were presented. These convergence assertions for FM metrics were applied to the asymptotic analyses of the empirical approximations of four kinds of stochastic optimization problems. The results demonstrate the motivation of this study and its importance.
There are still some topics to settle in the future. For example, we leave the numerical tractability for the results in Section 4.3 and Section 4.4 for future work.
Author Contributions
Supervision, Z.C.; Writing—original draft, J.J.; Writing—review & editing, H.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by China Postdoctoral Science Foundation (Grant Number 2020M673117), the National Natural Science Foundation of China (Grant Numbers 11991023 and 11735011).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Proof of Lemma 4.
According to the definition of FM metric, we have
Moreover, it is easy to verify that adding any constant will not change the value of the integral. For simplification of the following discussion, without loss of generality, we hereafter set for any fixed . We denote and is the complementary set of . Since and thus , we have an upper bound estimation of as follows:
If , then, , and we have the following upper bound of :
Then, we continue
where is defined by . It is easy to see that h is Lipschitz continuous on with Lipschitz modulus 1. Based on Lemma 1, we can extend h to , and its restriction on is denoted by . Then, is Lipschitz continuous on with Lipschitz modulus 1. Thus, we can continue
So, for any , we have
Note that , so
Similarly, this means that for any . Then, we continue
The proof is complete. □
Proof of Theorem 1.
According to Lemma 4 with , we have
Moreover, since
we obtain
Meanwhile, we know from Lemma 2 that
where as . Then, we take . Since as , we have and for sufficiently large N. Therefore, we have
and
Thus, letting
completes the proof. □
Proof of Theorem 2.
To prove this assertion, we need to verify that: for any , there exists a positive number such that
as for almost every . Notice from Lemma 4 that
for sufficiently large R.
We can deduce from and Lemma 2 that
and
with probability 1. Thus, there always exists a sufficiently large positive number such that
as . Moreover, there exists a positive number such that
as with probability 1, which implies from the triangle inequality that
with probability 1. Combining (A2) with (A3), we have
as and , with probability 1.
On the other hand, we know from the Glivenko-Cantelli theorem [35] that
which implies that there exists a positive number such that
when .
Proof of Theorem 3.
We know from Lemma 4 that
Then, we have
For the first term, we know from (3) that
We, in what follows, consider the estimation of the second term:
Since , we can choose a sufficiently large such that
Then, we have
Furthermore, according to Cramér’s large deviation theorem, we have
where is the so-called (large deviations) rate function defined as
and
for , where the last inequality follows from Assumption 1 with .
We know from [28] (Section 7.2.9) that is positive, convex, and infinitely differentiable at the interior of its domain. This means that is also convex and infinitely differentiable at the interior of its domain, which is consistent with the domain of . Since is finite on , is differentiable on . Note that
Then, the derivative of
which is
is larger than 0 at . Due to its differentiability, which implies the continuity, there exists a sufficiently small such that (A7) is larger than 0 for any . Then, for any , we have
Therefore, we obtain that is positive.
Finally, we obtain
Letting and
completes the proof. □
References
- Fournier, N.; Guillin, A. On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 2015, 162, 707–738.
- Villani, C. Optimal Transport: Old and New; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; Volume 338.
- Rachev, S.T.; Rüschendorf, L. Mass Transportation Problems: Volume I: Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998; Volume 1.
- Horowitz, J.; Karandikar, R.L. Mean rates of convergence of empirical measures in the Wasserstein metric. J. Comput. Appl. Math. 1994, 55, 261–273.
- Weed, J.; Bach, F. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. arXiv 2017, arXiv:1707.00087.
- Dereich, S.; Scheutzow, M.; Schottstedt, R. Constructive quantization: Approximation by empirical measures. Ann. Inst. Henri Poincaré Probab. Stat. 2013, 49, 1183–1203.
- Bolley, F.; Guillin, A.; Villani, C. Quantitative concentration inequalities for empirical measures on non-compact spaces. Probab. Theory Relat. Fields 2007, 137, 541–593.
- Boissard, E. Simple bounds for the convergence of empirical and occupation measures in 1-Wasserstein distance. Electron. J. Probab. 2011, 16, 2296–2333.
- Zhao, C.; Guan, Y. Data-driven risk-averse two-stage stochastic program with ζ-structure probability metrics. Optim. Online 2015, 2, 1–40.
- Römisch, W. Stability of stochastic programming problems. Handb. Oper. Res. Manag. Sci. 2003, 10, 483–554.
- Rachev, S.T.; Römisch, W. Quantitative stability in stochastic programming: The method of probability metrics. Math. Oper. Res. 2002, 27, 792–818.
- Römisch, W.; Vigerske, S. Quantitative stability of fully random mixed-integer two-stage stochastic programs. Optim. Lett. 2008, 2, 377–388.
- Han, Y.; Chen, Z. Quantitative stability of full random two-stage stochastic programs with recourse. Optim. Lett. 2015, 9, 1075–1090.
- Strugarek, C. On the Fortet-Mourier Metric for the Stability of Stochastic Optimization Problems, an Example; Humboldt-Universität zu Berlin: Berlin, Germany, 2004.
- Shapiro, A. Monte Carlo sampling methods. Handb. Oper. Res. Manag. Sci. 2003, 10, 353–425.
- Shapiro, A.; Xu, H. Stochastic mathematical programs with equilibrium constraints, modelling and sample average approximation. Optimization 2008, 57, 395–418.
- Dentcheva, D.; Römisch, W. Stability and sensitivity of stochastic dominance constrained optimization models. SIAM J. Optim. 2013, 23, 1672–1688.
- Esfahani, P.M.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
- Liu, Y.; Pichler, A.; Xu, H. Discrete approximation and quantification in distributionally robust optimization. Math. Oper. Res. 2018, 44, 19–37.
- Kantorovich, L.V.; Rubinstein, G.S. On a space of completely additive functions. Vestn. Leningrad. Univ. 1958, 13, 52–59.
- Valentine, F.A. A Lipschitz condition preserving extension for a vector function. Am. J. Math. 1945, 67, 83–93.
- Dentcheva, D.; Henrion, R.; Ruszczyński, A. Stability and sensitivity of optimization problems with first order stochastic dominance constraints. SIAM J. Optim. 2007, 18, 322–337.
- Dentcheva, D.; Ruszczyński, A. Robust stochastic dominance and its application to risk-averse optimization. Math. Program. 2010, 123, 85–100.
- Chen, Z.; Jiang, J. Stability analysis of optimization problems with kth order stochastic and distributionally robust dominance constraints induced by full random recourse. SIAM J. Optim. 2018, 28, 1396–1419.
- Sun, H.; Xu, H. Convergence analysis of stationary points in sample average approximation of stochastic programs with second order stochastic dominance constraints. Math. Program. 2014, 143, 31–59.
- Liu, Y.; Xu, H. Stability analysis of stochastic programs with second order dominance constraints. Math. Program. 2013, 142, 435–460.
- Dentcheva, D.; Ruszczyński, A. Optimization with stochastic dominance constraints. SIAM J. Optim. 2003, 14, 548–566.
- Shapiro, A.; Dentcheva, D.; Ruszczyński, A. Lectures on Stochastic Programming: Modeling and Theory; SIAM: Philadelphia, PA, USA, 2014.
- Bertsimas, D.; Gupta, V.; Kallus, N. Data-driven robust optimization. Math. Program. 2018, 167, 235–292.
- Bertsimas, D.; Gupta, V.; Kallus, N. Robust sample average approximation. Math. Program. 2018, 171, 217–282.
- Delage, E.; Ye, Y. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 2010, 58, 595–612.
- Pichler, A.; Xu, H. Quantitative stability analysis for minimax distributionally robust risk optimization. Math. Program. 2022, 191, 47–77.
- Zhang, J.; Xu, H.; Zhang, L. Quantitative stability analysis for distributionally robust optimization with moment constraints. SIAM J. Optim. 2016, 26, 1855–1882.
- Jiang, J.; Chen, Z. Quantitative stability analysis of two-stage stochastic linear programs with full random recourse. Numer. Funct. Anal. Optim. 2019, 40, 1847–1876.
- Varadarajan, V.S. On the convergence of sample probability distributions. Sankhyā Indian J. Stat. 1958, 19, 23–26.