Article

Computable Reformulation of Data-Driven Distributionally Robust Chance Constraints: Validated by Solution of Capacitated Lot-Sizing Problems

1 School of Mathematics and Statistics, Central South University, Changsha 410083, China
2 School of Mathematics and Finance, Hunan University of Humanities, Science and Technology, Loudi 417000, China
3 International Business School, Hunan University of Information Technology, Changsha 410151, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(2), 331; https://doi.org/10.3390/math14020331
Submission received: 8 December 2025 / Revised: 10 January 2026 / Accepted: 16 January 2026 / Published: 19 January 2026
(This article belongs to the Section D: Statistics and Operational Research)

Abstract

Uncertainty in optimization models often leads to deterministic equivalent formulations (DEFs) with awkward properties, even for simple linear models. Chance-constrained programming is a reasonable tool for handling optimization problems with random parameters in objective functions and constraints, but it assumes that the distribution of these random parameters is known, and its DEF is often associated with the complicated computation of multiple integrals, which impedes its wide application. In this paper, for optimization models with chance constraints, the historical data of random model parameters are first exploited to construct an adaptive approximate density function by incorporating piecewise linear interpolation into the well-known histogram method, so as to remove the assumption of a known distribution. Then, based on this estimate, a novel confidence set involving only finitely many variables is constructed to depict all the potential distributions of the random parameters, and a computable reformulation of data-driven distributionally robust chance constraints is proposed. By virtue of such a confidence set, it is proven that the deterministic equivalent constraints can be reformulated as several ordinary constraints in line with the principles of the distributionally robust optimization approach, without the need to solve complicated semi-definite programming problems, compute multiple integrals, or solve additional auxiliary optimization problems, as done in existing works. The proposed method is further validated by the solution of the stochastic multiperiod capacitated lot-sizing problem, and the numerical results demonstrate the following: (1) the proposed method can significantly reduce the computational time needed to find a robust optimal production strategy compared with similar methods in the literature; (2) the optimal production strategy provided by our method maintains moderate conservatism, i.e., it achieves a better trade-off between cost-effectiveness and robustness than existing methods.

1. Introduction

Chance-constrained optimization is a popular way to handle optimization models with random parameters, possessing significant advantages over the classical expectation-based approaches [1]. In this paper, we consider the following chance-constrained optimization problem:
$$\min\ C(x)\tag{1a}$$
$$\text{s.t.}\quad\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha,\tag{1b}$$
$$x\in X\subseteq\mathbb{R}^n,\tag{1c}$$
where $x\in\mathbb{R}^n$ is the decision vector, $C:\mathbb{R}^n\to\mathbb{R}$ is a real-valued convex continuous objective function, $X\subseteq\mathbb{R}^n$ denotes a computable bounded closed convex set, $\xi$ is a random model parameter causing uncertainty in the coefficient matrix $A(\xi)\in\mathbb{R}^{m\times n}$ and the capacity vector $b(\xi)\in\mathbb{R}^m$, $\mathbb{P}\{\cdot\}$ denotes the probability of a random event when the probability distribution function of $\xi$ is $\mathbb{P}$, and $\alpha\in(0,1)$ represents a given risk level, i.e., the tolerance of constraint violation permitted by the decision-maker.
Owing to the practical difficulties in finding the true distribution of $\xi$, data-driven distributionally robust chance-constrained programming (DRCCP) has been developed, which replaces the single distribution in the chance constraint (1b) with a group of potential distributions in a confidence set $\mathcal{D}$ [2,3,4,5,6]. Specifically, by collecting sufficient observed historical data of $\xi$, the so-called confidence set $\mathcal{D}$ is first constructed to depict the domain of all approximate probability distribution functions of $\xi$. Then, in the realm of distributionally robust optimization, Problem (1) is approximated by the following DRCCP model:
$$\min\ C(x)\tag{2a}$$
$$\text{s.t.}\quad\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha,\tag{2b}$$
$$x\in X\subseteq\mathbb{R}^n.\tag{2c}$$
Clearly, in order to guarantee the applicability of the above data-driven DRCCP approach, it is crucial to choose an appropriate confidence set, paying particular attention to the computational tractability of the constraint (2b) [7]. For example, moment-based confidence sets were defined by using the first and second moments of the data [2,3]. Such sets, confined by estimates of the mean and covariance, often lead to tractable reformulations but may overlook crucial distributional shapes, potentially resulting in overly conservative decisions. In order to incorporate higher-order distributional information of $\xi$, divergence-based confidence sets were constructed [6,8,9]. These confidence sets typically comprise all potential probability distributions within a specified tolerance of a reference distribution (e.g., the empirical distribution), where the tolerance is quantified by a statistical divergence such as the Kullback–Leibler (KL) divergence [10,11] or the Wasserstein metric [6,12]. However, the reformulation of the constraint (2b) in existing works typically involves the solution of complicated semi-definite programming (SDP) problems based on the theory of functional extrema and duality [6]. For instance, Nguyen et al. derived an equivalent SDP reformulation of the problem via an analytical formula for the Wasserstein distance between two normal distributions [13,14]. In addition, distributionally robust chance constraints can be reformulated into mixed-integer programs (MIPs) [15,16], but solving this MIP reformulation is also difficult, even with the most advanced solvers [17]. For large-scale or complex instances, data-driven methods can be beneficial for improving solution efficiency [18].
Motivated by the need to build a more appropriate confidence set from the collected data with a complex distribution, we intend to first construct an adaptive approximate density function by incorporating piecewise linear interpolation into the well-known histogram method, so as to remove the assumption of known distributions. Then, using this estimated distribution, we define a confidence set only involving finitely many variables, and, by the principles of distributionally robust optimization, we reformulate the original DRCCP into a computable standard optimization problem. We summarize the main contributions of this research as follows.
1. This is the first time that an adaptive probability density estimate is constructed by optimizing the bin widths in line with the distributional structure of the data, rather than using a uniform width as in the classic estimation method. In particular, wider bins are chosen where the data fluctuate less, and narrower bins where they fluctuate more. An advantage of doing so is that it reduces the number of nodes used when defining confidence sets, which is beneficial for the dimensionality reduction of the derived robust optimization models.
2. Through the variation distance, a confidence set depicting all the approximate probability distributions is constructed, which strictly restricts the global difference between the distributions in the confidence set and the estimated discrete nominal distribution, avoiding the loss of high-order statistical information caused by moment constraints. Owing to the advantages of such a confidence set, it is proven that the complicated chance constraint (2b) can be reformulated as an easily computable ordinary inequality constraint, which is beneficial for the development of efficient algorithms to solve the original problem (2).
3. The proposed reformulation technique is applied to solve the stochastic multiperiod capacitated lot-sizing problem (SMP-CLSP), and its advantages are validated, especially in comparison with similar ones in the literature.
The remainder of this paper is structured as follows. Section 2 introduces data-driven adaptive confidence sets with finitely many parameters, including the adaptive estimation of probability density functions and the construction of the corresponding confidence sets. Section 3 presents the reformulation of data-driven distributionally robust chance constraints. Section 4 provides numerical validation when applying the proposed computational technique to the SMP-CLSP. This research is concluded in Section 5.

2. Data-Driven Adaptive Confidence Sets with Finite Parameters

2.1. Adaptive Estimation of Probability Density Functions

Instead of fixing the bin widths when obtaining the reference distribution from the data, we first propose a numerical method to adaptively estimate the probability density function directly from the collected data by incorporating a piecewise linear interpolation technique into the well-known histogram method, where the distributional structure of the raw sample data is employed to optimize the choice of bin widths.
Let $\Xi=\{\xi^1,\xi^2,\ldots,\xi^{n_s}\}$ represent a sample set of size $n_s$, where all the samples are independently and identically distributed (i.i.d.) realizations of a random variable $\xi$, governed by an unknown probability density function $f$. We define the sample range by the smallest and largest realizations in the sample set, denoted as $\xi_{\min}=\min_{1\le j\le n_s}\xi^j$ and $\xi_{\max}=\max_{1\le j\le n_s}\xi^j$, respectively. As in the classical histogram method [19], we partition $[\xi_{\min},\xi_{\max}]$ into $N$ subintervals $[z_i,z_{i+1})$, referred to as $N$ bins $B_1,B_2,\ldots,B_N$, and $h_i=z_{i+1}-z_i$ is called the width of the $i$-th bin. For each bin $B_i$, let $v_i$ denote the number of samples that fall into $B_i$. By Sturges' number-of-bins rule, the initial number of bins is chosen to be $N=\lceil\log_2 n_s\rceil+1$, where $\lceil\cdot\rceil$ denotes rounding up. Then, by the classical histogram method, the reference density function [20] $\hat f_n:\mathbb{R}\to[0,+\infty)$ is defined to be
$$\hat f_n(z)=\begin{cases}\dfrac{v_i}{n_s h_i}=\dfrac{1}{n_s h_i}\displaystyle\sum_{j=1}^{n_s} I_{[z_i,z_{i+1})}(\xi^j),& z\in B_i,\ i=1,\ldots,N-1;\\[3mm] \dfrac{v_N}{n_s h_N}=\dfrac{1}{n_s h_N}\displaystyle\sum_{j=1}^{n_s} I_{[z_N,z_{N+1}]}(\xi^j),& z\in B_N,\end{cases}\tag{3}$$
where
$$I_{B_i}(z)=\begin{cases}1,&\text{if }z\in B_i;\\ 0,&\text{otherwise}.\end{cases}$$
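For concreteness, the following is a minimal Python sketch of this classical reference density, applying Sturges' rule and returning the piecewise-constant estimate in (3). The synthetic normal sample is a hypothetical stand-in for the observed realizations of $\xi$.

```python
# A minimal sketch of the reference density (3): a frequency histogram with
# Sturges' rule for the initial number of bins.
import numpy as np

def histogram_density(samples):
    ns = len(samples)
    N = int(np.ceil(np.log2(ns))) + 1                         # Sturges' rule
    edges = np.linspace(samples.min(), samples.max(), N + 1)  # z_1,...,z_{N+1}
    counts, _ = np.histogram(samples, bins=edges)             # v_i per bin
    widths = np.diff(edges)                                   # h_i
    return edges, counts / (ns * widths)                      # f_hat_n per bin

rng = np.random.default_rng(0)
edges, density = histogram_density(rng.normal(size=1000))
# the piecewise-constant estimate integrates to one by construction
assert np.isclose(np.sum(density * np.diff(edges)), 1.0)
```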
Numerous studies have shown that the selection of the bin width plays a crucial role in accurately estimating the probability density [21]. In particular, for sample data with a complex distributional structure, the fixed bin width in the classical histogram method may greatly restrict the ability of the estimated density function to depict the true distribution [20]. In order to overcome this difficulty, we now present a numerical method to adaptively estimate the probability density function directly from the collected data by incorporating the piecewise linear interpolation technique into the well-known histogram method [19]. Specifically, we first define a new function $\hat f:[z_1,z_{N+1}]\to[0,+\infty)$ by
$$\hat f(z)=\begin{cases}\dfrac{2\big(\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)\big)}{h_i+h_{i+1}}(z-\eta_i)+\hat f_n(\eta_i),& z_i\le z<z_{i+1},\ i=1,2,\ldots,N-1;\\[3mm] \dfrac{h_N+\eta_N-z}{h_N}\,\hat f_n(\eta_N),& z_N\le z\le z_{N+1},\end{cases}\tag{4}$$
where $\eta_i=\frac{z_i+z_{i+1}}{2}$, $i=1,2,\ldots,N$. Then, in order to ensure the nonnegativity of $\hat f$ and achieve a better approximation, we compute
$$\begin{aligned}
&h_i=z_{i+1}-z_i,\qquad \eta_i=\frac{z_i+z_{i+1}}{2},\qquad i=1,2,\ldots,N,\\
&slp_i(\hat f)=\frac{2\big(\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)\big)}{h_i+h_{i+1}},\qquad i=1,2,\ldots,N-1,\\
&\hat f[\eta_i,\eta_{i+1},\eta_{i+2}]=\frac{4}{h_{i+1}+h_i}\left(\left|\frac{\hat f_n(\eta_{i+2})-\hat f_n(\eta_{i+1})}{h_{i+1}+h_{i+2}}\right|-\left|\frac{\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)}{h_{i+1}+h_i}\right|\right),\qquad i=1,2,\ldots,N-2,\\
&\hat f[\eta_{N-1},\eta_N,z_{N+1}]=\frac{2}{h_{N-1}+h_N}\left(\left|\frac{\hat f_n(\eta_N)}{h_N}\right|-\left|\frac{2\big(\hat f_n(\eta_N)-\hat f_n(\eta_{N-1})\big)}{h_N+h_{N-1}}\right|\right),\\
&\nabla^2_{i_{\max}}=\max\Big\{\hat f[\eta_i,\eta_{i+1},\eta_{i+2}],\ \hat f[\eta_{N-1},\eta_N,z_{N+1}]\ :\ i=1,2,\ldots,N-2\Big\}.
\end{aligned}$$
It can be seen that $\hat f[\eta_i,\eta_{i+1},\eta_{i+2}]$ and $\nabla^2_{i_{\max}}$ reflect the information of the second-order forward difference quotient and the maximum second-order difference quotient, respectively. With this first- and second-order information, we further optimize the bin widths by inserting more nodes into the bins in line with the following rules.
(I)
When $slp_i(\hat f)>\max\big\{0,\ \frac{2\hat f_n(\eta_i)}{h_i}\big\}$, a new node $z'_{i+2}=\frac{z_{i+1}+z_{i+2}}{2}$ is inserted into the $(i+1)$-th bin. The original $(i+1)$-th bin $B_{i+1}$ is split into two new sub-bins: $B'_{i+1}=[z_{i+1},z'_{i+2})$, $B'_{i+2}=[z'_{i+2},z_{i+2})$. The midpoints of the new sub-bins $B'_{i+1}$ and $B'_{i+2}$ are $\eta'_{i+1}=\frac{z_{i+1}+z'_{i+2}}{2}$ and $\eta'_{i+2}=\frac{z'_{i+2}+z_{i+2}}{2}$, respectively. Apply (3) to obtain the corresponding $\hat f_n(\eta'_{i+1})$ and $\hat f_n(\eta'_{i+2})$ for the two new sub-bins. Consequently, the updated midpoints are given by $\big(\eta'_{i+1},\hat f_n(\eta'_{i+1})\big)$ and $\big(\eta'_{i+2},\hat f_n(\eta'_{i+2})\big)$, respectively. After rearranging the indices of all obtained bins, we keep inserting nodes until the inequality $slp_i(\hat f)\le\frac{2\hat f_n(\eta_i)}{h_i}$ holds.
(II)
When $slp_i(\hat f)<\min\big\{0,\ -\frac{2\hat f_n(\eta_i)}{h_i}\big\}$, a new node $z'_{i+1}=\frac{z_i+z_{i+1}}{2}$ is inserted into the $i$-th bin. Split $B_i$ into two sub-bins: $B'_i=[z_i,z'_{i+1})$, $B'_{i+1}=[z'_{i+1},z_{i+1})$. The midpoints of the new sub-bins $B'_i$ and $B'_{i+1}$ are $\eta'_i=\frac{z_i+z'_{i+1}}{2}$ and $\eta'_{i+1}=\frac{z'_{i+1}+z_{i+1}}{2}$, respectively. Apply (3) to obtain the corresponding $\hat f_n(\eta'_i)$ and $\hat f_n(\eta'_{i+1})$ for the two new sub-bins. Consequently, the updated midpoints are given by $\big(\eta'_i,\hat f_n(\eta'_i)\big)$ and $\big(\eta'_{i+1},\hat f_n(\eta'_{i+1})\big)$, respectively. After rearranging the indices of all bins, we keep inserting nodes until $slp_i(\hat f)\ge-\frac{2\hat f_n(\eta_i)}{h_i}$ holds.
(III)
Denote by $B_{i_{\max}}$, $B_{i_{\max}+1}$, and $B_{i_{\max}+2}$ the three adjacent bins corresponding to this maximum second-order difference quotient. Let $TH$ be the given interpolation tolerance. If the maximum second-order difference quotient satisfies $\nabla^2_{i_{\max}}\ge TH$, then we select the bin with the maximum frequency among $B_{i_{\max}}$, $B_{i_{\max}+1}$, and $B_{i_{\max}+2}$, denoted by $B_{i_m}$, and add its midpoint as the newly inserted interpolation node, i.e., $z'_{i_m+1}=\frac{z_{i_m}+z_{i_m+1}}{2}$. Split $B_{i_m}$ into two sub-bins: $B'_{i_m}=[z_{i_m},z'_{i_m+1})$, $B'_{i_m+1}=[z'_{i_m+1},z_{i_m+1})$. The midpoints of the new sub-bins $B'_{i_m}$ and $B'_{i_m+1}$ are $\eta'_{i_m}=\frac{z_{i_m}+z'_{i_m+1}}{2}$ and $\eta'_{i_m+1}=\frac{z'_{i_m+1}+z_{i_m+1}}{2}$, respectively, and the updated midpoints are given by $\big(\eta'_{i_m},\hat f_n(\eta'_{i_m})\big)$ and $\big(\eta'_{i_m+1},\hat f_n(\eta'_{i_m+1})\big)$, respectively. Consequently, among the three bins involved in the maximum second-order difference quotient, only the bin with the largest frequency needs to be further interpolated. With this new node, we recompute the largest second-order difference quotient and repeat the above interpolation until the inequality $\nabla^2_{i_{\max}}<TH$ is satisfied.
The consecutive steps of the proposed approach to adaptively estimating the probability density based on Rules (I)–(III) are illustrated in Figure 1.
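Beyond the figure, the refinement idea can be compressed into a short sketch. The following Python code is a deliberate simplification of Rules (I) and (II) only: it splits a bin at its midpoint whenever the interpolant's slope violates the nonnegativity bound $2\hat f_n(\eta_i)/h_i$, recounting the per-bin frequencies after every split; the full method additionally applies the curvature-driven Rule (III) and the node bookkeeping described above.

```python
# A simplified sketch of the bin-refinement idea behind Rules (I)-(II); the
# slope test and the split position follow the rules, while recounting all
# frequencies after each split is a simplification of the full scheme.
import numpy as np

def refine_bins(samples, edges, max_rounds=10):
    ns = len(samples)
    for _ in range(max_rounds):
        counts, _ = np.histogram(samples, bins=edges)
        widths = np.diff(edges)
        dens = counts / (ns * widths)            # f_hat_n at the bin midpoints
        mids = (edges[:-1] + edges[1:]) / 2      # eta_i
        # slopes of the piecewise linear interpolant between midpoints
        slopes = 2 * np.diff(dens) / (widths[:-1] + widths[1:])
        # Rule (I)/(II) style test: |slope| must not exceed 2*dens_i/width_i
        bad = np.where(np.abs(slopes) > 2 * dens[:-1] / widths[:-1])[0]
        if bad.size == 0:
            break
        # insert the midpoint of the first offending bin and re-estimate
        edges = np.sort(np.append(edges, mids[bad[0]]))
    return edges
```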
Remark 1.
As demonstrated intuitively in Figure 1, the above three rules aim to optimize the bin widths by adaptively inserting new knots into each bin in line with the desired properties of $\hat f$. Essentially, new knots are inserted into bins with larger slope variations, and the rate of slope change between bins is characterized by the second-order difference quotient.
After applying Rules (I)–(III) to optimize the bin widths, the obtained estimate of the density function can depict the distributional nature of the sample data. Without loss of generality, the final node set is still denoted by $z_{final}=\{z_1,z_2,\ldots,z_{N+1}\}$, where $N$ is the optimized number of bins. Consequently, by Rules (I), (II), and (III), $\hat f$ in (4) is modified into a piecewise linear function with variable steps, still denoted by $\hat f_n$. We can prove that $\hat f_n$ is an improved estimate of the true density function with adaptive bin widths, and the method is called improved density estimation with piecewise linear interpolation (IDE-PLI).
Proposition 1.
Let $\hat f$ be defined by (4) together with Rules (I), (II), and (III). Then, (1) it holds that $\hat f\ge 0$; (2) $\int_{z_1}^{z_{N+1}}\hat f(z)\,dz=1$.
Proof. 
(1) In the case of $slp_i(\hat f)\ge 0$, it follows from the first expression in (4) that, for $z_i\le z<z_{i+1}$,
$$\hat f(z_i)=\frac{2\big(\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)\big)}{h_i+h_{i+1}}(z_i-\eta_i)+\hat f_n(\eta_i)=-slp_i(\hat f)\,(\eta_i-z_i)+\hat f_n(\eta_i)\ \ge\ -\frac{2\hat f_n(\eta_i)}{h_i}(\eta_i-z_i)+\hat f_n(\eta_i)=0,$$
where the inequality uses $slp_i(\hat f)\le\frac{2\hat f_n(\eta_i)}{h_i}$, which is guaranteed by Rule (I), and the last equality uses $\eta_i-z_i=\frac{h_i}{2}$; since $\hat f$ is linear on the bin, its minimum over $[z_i,z_{i+1})$ is attained at $z_i$. A similar proof can be obtained in the case of $slp_i(\hat f)<0$.
(2) For the bin $[z_i,z_{i+1})$, $i=1,2,\ldots,N-1$, we have
$$\begin{aligned}\int_{z_i}^{z_{i+1}}\hat f(z)\,dz&=\int_{z_i}^{z_{i+1}}\frac{2\big(\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)\big)}{h_i+h_{i+1}}(z-\eta_i)\,dz+\int_{z_i}^{z_{i+1}}\hat f_n(\eta_i)\,dz\\
&=\frac{\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)}{h_i+h_{i+1}}\Big[(z_{i+1}-\eta_i)^2-(z_i-\eta_i)^2\Big]+\hat f_n(\eta_i)\,(z_{i+1}-z_i)\\
&=\frac{\hat f_n(\eta_{i+1})-\hat f_n(\eta_i)}{h_i+h_{i+1}}\,h_i\,(z_{i+1}+z_i-2\eta_i)+\hat f_n(\eta_i)h_i=\hat f_n(\eta_i)h_i,\end{aligned}$$
since $z_i+z_{i+1}-2\eta_i=0$. For the bin $[z_N,z_{N+1}]$, we obtain
$$\int_{z_N}^{z_{N+1}}\hat f(z)\,dz=\int_{z_N}^{z_{N+1}}\frac{h_N+\eta_N-z}{h_N}\,\hat f_n(\eta_N)\,dz=\hat f_n(\eta_N)h_N.$$
By the classical histogram method, we know that $\sum_{i=1}^{N}\hat f_n(\eta_i)h_i=1$. Thus,
$$\int_{z_1}^{z_{N+1}}\hat f(z)\,dz=1.$$
The proof is completed. □
Remark 2.
By Proposition 1, $\hat f_n$ can also be viewed as an estimate of the true density function, but its adaptive bin widths enable it to depict the distributional nature of the sample data more clearly than the fixed bin width of the classical frequency histogram method.

2.2. Construction of Confidence Sets Only with Finitely Many Parameters

Using the obtained $\hat f_n(z)$ with $N+1$ nodes $z_1,z_2,\ldots,z_{N+1}$, we calculate the probability mass $\hat{\mathbb{P}}$ of $\hat f_n(z)$ on each bin $B_i$:
$$\hat p_i(z_i,z_{i+1})=\hat{\mathbb{P}}\big(\xi\in[z_i,z_{i+1})\big)=\int_{z_i}^{z_{i+1}}\hat f_n(z)\,dz,\quad i=1,\ldots,N.\tag{5}$$
With $\hat p_i(z_i,z_{i+1})$, $i=1,\ldots,N$, we define a confidence set for the random variable $\xi$ and a given divergence tolerance $\gamma>0$ by
$$\mathcal{D}=\left\{\mathbb{P}\ \middle|\ \begin{aligned}&p_i=\mathbb{P}\Big\{\xi=\eta_i=\tfrac{z_i+z_{i+1}}{2}\Big\},\ i=1,2,\ldots,N,\\ &\sum_{i=1}^{N}p_i=1,\qquad\sum_{i=1}^{N}\big|p_i-\hat p_i(z_i,z_{i+1})\big|\le\gamma,\\ &0\le p_i\le 1,\ i=1,2,\ldots,N\end{aligned}\right\}.\tag{6}$$
Clearly, the divergence tolerance $\gamma>0$ controls the size of the confidence set $\mathcal{D}$. Notably, different from the existing confidence sets in the literature, the set $\mathcal{D}$ in (6) only involves finitely many parameters, rather than a space of probability density functions.
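As a small illustration, checking whether a candidate mass vector belongs to $\mathcal{D}$ in (6) involves only three finite-dimensional conditions; by Proposition 1, the nominal masses satisfy $\hat p_i=\hat f_n(\eta_i)h_i$. The numbers used below are hypothetical placeholders for the $N$ bin masses.

```python
# A small sketch of membership in the confidence set (6).
import numpy as np

def in_confidence_set(p, p_hat, gamma, tol=1e-9):
    return (np.all(p >= -tol) and np.all(p <= 1 + tol)      # 0 <= p_i <= 1
            and abs(p.sum() - 1.0) <= tol                   # sum_i p_i = 1
            and np.sum(np.abs(p - p_hat)) <= gamma + tol)   # variation bound

p_hat = np.array([0.1, 0.4, 0.3, 0.2])    # nominal masses p_hat_i
p     = np.array([0.15, 0.35, 0.3, 0.2])  # a perturbed candidate distribution
print(in_confidence_set(p, p_hat, gamma=0.2))  # True: total deviation is 0.1
```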
Theorem 1.
Let $\xi$ be a random variable with distribution $\mathbb{P}$ and probability density $f$. Partition the domain of $\xi$ into $N$ disjoint bins $\{B_i\}_{i=1}^N$. The sample size is denoted by $n_s$. Let $\hat p_i(z_i,z_{i+1})$ be defined by (5), and let $\mathbb{P}^{n_s}$ denote the sampling distribution induced by samples of size $n_s$. Then, for any confidence level $\beta\in(0,1)$, it holds that
$$\mathbb{P}^{n_s}\left\{\sum_{i=1}^{N}\big|p_i-\hat p_i(z_i,z_{i+1})\big|\le\sqrt{\frac{N^2\ln(2N/\beta)}{2n_s}}\right\}\ge 1-\beta.\tag{7}$$
Proof. 
By (5), for each bin $i$,
$$\hat p_i(z_i,z_{i+1})=\int_{B_i}\frac{v_i}{n_s h_i}\,dz=\frac{v_i}{n_s}=\frac{1}{n_s}\sum_{j=1}^{n_s}I_{B_i}(\xi^j),\quad i=1,\ldots,N,$$
where $I_{B_i}(\xi^j)$ is a Bernoulli variable with $0\le I_{B_i}(\xi^j)\le 1$. Due to sample independence, the indicators $I_{B_i}(\xi^j)$ are independent across $j$, satisfying the independence and boundedness conditions required by Hoeffding's inequality [22]. For any $\epsilon>0$,
$$\mathbb{P}^{n_s}\Big\{\big|\hat p_i(z_i,z_{i+1})-p_i\big|\ge\epsilon\Big\}\le 2\exp(-2n_s\epsilon^2).\tag{8}$$
To control the total variation divergence across all $N$ bins, we apply the union bound to the events $\{|\hat p_i(z_i,z_{i+1})-p_i|\ge\epsilon\}$, i.e.,
$$\mathbb{P}^{n_s}\left\{\bigcup_{i=1}^{N}\big\{|\hat p_i(z_i,z_{i+1})-p_i|\ge\epsilon\big\}\right\}\le\sum_{i=1}^{N}\mathbb{P}^{n_s}\Big\{\big|\hat p_i(z_i,z_{i+1})-p_i\big|\ge\epsilon\Big\}.\tag{9}$$
Substituting the Hoeffding bound (8) into (9), we get
$$\mathbb{P}^{n_s}\left\{\max_{1\le i\le N}\big|\hat p_i(z_i,z_{i+1})-p_i\big|\ge\epsilon\right\}\le 2N\exp(-2n_s\epsilon^2).$$
For a given confidence level $\beta\in(0,1)$, the inequality $2N\exp(-2n_s\epsilon^2)\le\beta$ yields
$$\epsilon\ge\sqrt{\frac{\ln(2N/\beta)}{2n_s}}.$$
Hence, taking $\epsilon=\sqrt{\ln(2N/\beta)/(2n_s)}$, with probability at least $1-\beta$ every bin satisfies $|\hat p_i(z_i,z_{i+1})-p_i|\le\epsilon$, so that $\sum_{i=1}^{N}|p_i-\hat p_i(z_i,z_{i+1})|\le N\epsilon=\sqrt{N^2\ln(2N/\beta)/(2n_s)}$. Consequently, the desired inequality (7) holds. □
Remark 3.
By Theorem 1, we establish a rigorous statistical guarantee for the confidence set $\mathcal{D}$: the total deviation between the estimated probabilities $\hat p_i$ (from the adaptive-bandwidth density estimation) and the true probabilities $p_i$ is bounded by
$$\gamma_{\beta,n_s}=\sqrt{\frac{N^2\ln(2N/\beta)}{2n_s}}$$
with confidence level $1-\beta$.
Notably, $\mathcal{D}$ is constructed based on direct deviations of probability mass functions, rather than the traditional moment information (e.g., mean or covariance), making it inherently suitable for scenarios where higher-order distributional properties are unknown or difficult to estimate.
By Theorem 1, we further prove the following result, which gives a distributionally robust guarantee for the chance constraint $\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha$.
Theorem 2.
Let $\xi$ be a random variable with unknown distribution $\mathbb{P}$. Suppose that the domain of $\xi$ is partitioned into $N$ disjoint bins $\{B_i\}_{i=1}^N$. Let $n_s$ be the number of independent samples used to construct the reference distribution $\hat{\mathbb{P}}$, with $\hat p_i$ denoting the reference probability of bin $B_i$. Consider the confidence set $\mathcal{D}$ defined in (6), where $\gamma=\sqrt{N^2\ln(2N/\beta)/(2n_s)}$ for some $\beta\in(0,1)$. For the chance constraint $\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha$, let $\hat V(x)=\hat{\mathbb{P}}\{A(\xi)x>b(\xi)\}$ be the reference violation probability. Then, for any $\alpha,\beta\in(0,1)$, when
$$\hat V(x)\le\alpha-\frac{\gamma}{2},$$
it holds with probability at least $1-\beta$ that $\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha$.
Moreover, the required sample size $n_s$ to ensure $\hat V(x)\le\alpha-\frac{\gamma}{2}$ satisfies
$$n_s\ge\frac{N^2\ln(2N/\beta)}{8\big(\alpha-\hat V(x)\big)^2}.\tag{10}$$
Proof. 
By Theorem 1, with probability at least $1-\beta$, we have $\sum_{i=1}^{N}|p_i-\hat p_i|\le\gamma$. The total variation distance [23] between $\mathbb{P}$ and $\hat{\mathbb{P}}$ is then
$$\delta(\mathbb{P},\hat{\mathbb{P}})=\frac{1}{2}\sum_{i=1}^{N}|p_i-\hat p_i|\le\frac{\gamma}{2}.$$
Thus, for any random event $E$, we know that
$$\big|\mathbb{P}(E)-\hat{\mathbb{P}}(E)\big|\le\delta(\mathbb{P},\hat{\mathbb{P}})\le\frac{\gamma}{2}.$$
In particular, taking $E=\{A(\xi)x>b(\xi)\}$, we get
$$\mathbb{P}(E)\le\hat{\mathbb{P}}(E)+\frac{\gamma}{2}=\hat V(x)+\frac{\gamma}{2}.$$
Consequently,
$$\mathbb{P}\{A(\xi)x\le b(\xi)\}=1-\mathbb{P}(E)\ge 1-\hat V(x)-\frac{\gamma}{2}.$$
In the case that $\hat V(x)\le\alpha-\frac{\gamma}{2}$, it is concluded that $\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha$. Since every $\mathbb{P}\in\mathcal{D}$ satisfies $\sum_{i=1}^{N}|p_i-\hat p_i|\le\gamma$, the same argument applies to each $\mathbb{P}\in\mathcal{D}$, and it follows that
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha.$$
Since
$$\hat V(x)\le\alpha-\frac{\gamma}{2},$$
it follows from $\gamma=\sqrt{N^2\ln(2N/\beta)/(2n_s)}$ that
$$n_s\ge\frac{N^2\ln(2N/\beta)}{8\big(\alpha-\hat V(x)\big)^2}.$$
This completes the proof. □
Remark 4.
In Theorem 2, a sample size condition (10) is provided such that the distributionally robust chance constraint holds with high probability, without requiring knowledge of the mean or covariance. Note that this condition depends on the reference violation probability V ^ ( x ) , which can be directly estimated from the collected data.
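Both bounds can be evaluated directly. The following hedged numeric sketch computes the tolerance $\gamma_{\beta,n_s}$ of Theorem 1 and the sample-size requirement of Theorem 2; the concrete values of $N$, $\beta$, $\alpha$, and $\hat V(x)$ are assumptions chosen only for illustration.

```python
# Numeric illustration of the bounds in Theorems 1 and 2.
import math

def gamma_bound(N, beta, ns):
    # Theorem 1: sum_i |p_i - p_hat_i| <= sqrt(N^2 ln(2N/beta) / (2 ns))
    return math.sqrt(N**2 * math.log(2 * N / beta) / (2 * ns))

def required_samples(N, beta, alpha, V_hat):
    # Theorem 2: ns >= N^2 ln(2N/beta) / (8 (alpha - V_hat)^2)
    return math.ceil(N**2 * math.log(2 * N / beta) / (8 * (alpha - V_hat)**2))

print(gamma_bound(N=20, beta=0.05, ns=50_000))                   # about 0.16
print(required_samples(N=20, beta=0.05, alpha=0.1, V_hat=0.05))  # about 133,700
```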

3. Reformulation of Data-Driven Distributionally Robust Chance Constraints

3.1. Reformulation of Distributionally Robust Chance Constraints with a Random Variable

As in [8], it is supposed that all the model parameters in (2b) are linearly correlated with the random parameter $\xi$, which yields
$$A(\xi)=A^0+A\xi,\qquad b(\xi)=b^0+b\xi.$$
Thus, for each $k\in K=\{1,2,\ldots,m\}$, it holds that
$$A(\xi)x\le b(\xi)\ \Longleftrightarrow\ (Ax-b)\,\xi\le b^0-A^0x\ \Longleftrightarrow\ (A_kx-b_k)\,\xi\le b_k^0-A_k^0x,\quad k\in K,$$
where the $k$-th row of the matrix $A$ is denoted by $A_k$ (and similarly for $A^0$, $b$, and $b^0$).
Remark 5.
When $A_kx-b_k=0$, the inequality degenerates to $0\le b_k^0-A_k^0x$, which is a deterministic constraint independent of $\xi$. When $\xi$ is a continuous random parameter, the degenerate case $A_kx-b_k=0$ can be ignored in the analysis of the chance constraint.
For any given $x$, define
$$c_k(x)=\left\{z\in\mathbb{R}:\ \begin{aligned}&z\le Top_k^+(x)=\frac{b_k^0-A_k^0x}{A_kx-b_k},&&\text{if }A_kx-b_k>0,\\[1mm] &z\ge Top_k^-(x)=\frac{b_k^0-A_k^0x}{A_kx-b_k},&&\text{if }A_kx-b_k<0\end{aligned}\right\}.$$
Let $C(x)=\bigcap_{k=1}^{m}c_k(x)$. Then, we can define the worst-case probability bound by
$$z_D=\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{A(\xi)x\le b(\xi)\}=\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{\xi\in C(x)\}.\tag{11}$$
Denote
$$S^+=\{k\in K:\ A_kx-b_k>0\},\qquad U(x)=\min_{k\in S^+}Top_k^+(x),\qquad S^-=\{k\in K:\ A_kx-b_k<0\},\qquad L(x)=\max_{k\in S^-}Top_k^-(x).\tag{12}$$
We now prove that $z_D$ in (11) has the following property.
Proposition 2.
Let $S^+$, $S^-$, $L(x)$, and $U(x)$ be defined by (12). The worst-case probability bound $z_D$ in (11) is specified by one of the following three formulas.

(1) When $S^-=\emptyset$,
$$z_D=\begin{cases}0,&\text{if }U(x)<z_1,\\ \inf_{\mathbb{P}\in\mathcal{D}}\sum_{i=1}^{j}p_i,&\text{if }z_j\le U(x)<z_{j+1},\ j=1,2,\ldots,N,\\ 1,&\text{if }U(x)\ge z_N.\end{cases}\tag{13}$$

(2) When $S^+=\emptyset$,
$$z_D=\begin{cases}1,&\text{if }L(x)<z_1,\\ \inf_{\mathbb{P}\in\mathcal{D}}\sum_{i=j}^{N}p_i,&\text{if }z_j\le L(x)<z_{j+1},\ j=1,2,\ldots,N,\\ 0,&\text{if }L(x)\ge z_{N+1}.\end{cases}\tag{14}$$

(3) When $S^+\ne\emptyset$ and $S^-\ne\emptyset$,
$$z_D=\begin{cases}0,&\text{if }L(x)>U(x),\text{ or }U(x)<z_1,\text{ or }L(x)\ge z_{N+1},\\ \inf_{\mathbb{P}\in\mathcal{D}}\sum_{i=j}^{j'}p_i,&\text{if }L(x)\in[z_j,z_{j+1}),\ U(x)\in[z_{j'},z_{j'+1})\text{ with }j,j'\in\{1,2,\ldots,N\},\\ 1,&\text{if }L(x)\le z_1\text{ and }U(x)\ge z_{N+1}.\end{cases}$$
Proof. 
The probability $\mathbb{P}\{\xi\in C(x)\}$ in (11) reduces to three distinct cases.

Case 1: $S^-=\emptyset$. Then, $\mathbb{P}\{\xi\in C(x)\}=\mathbb{P}\{\xi\le\min_{k\in K}Top_k^+(x)\}$, matching the original formulation when $A_kx-b_k>0$ for all $k$. From (12), we know that
$$\mathbb{P}(\xi\in C(x))=\mathbb{P}\{\xi\le U(x)\}=\begin{cases}0,&\text{if }U(x)<z_1,\\ \sum_{i=1}^{j}p_i,&\text{if }z_j\le U(x)<z_{j+1},\ j=1,2,\ldots,N,\\ 1,&\text{if }U(x)\ge z_N.\end{cases}$$
Thus,
$$z_D=\begin{cases}0,&\text{if }U(x)<z_1,\\ \inf_{\mathbb{P}\in\mathcal{D}}\sum_{i=1}^{j}p_i,&\text{if }z_j\le U(x)<z_{j+1},\ j=1,2,\ldots,N,\\ 1,&\text{if }U(x)\ge z_N.\end{cases}$$

Case 2: $S^+=\emptyset$. Then, $\mathbb{P}\{\xi\in C(x)\}=\mathbb{P}\{\xi\ge\max_{k\in K}Top_k^-(x)\}$, matching the original formulation when $A_kx-b_k<0$ for all $k$. By (12), we get
$$\mathbb{P}(\xi\in C(x))=\mathbb{P}\{\xi\ge L(x)\}=\begin{cases}1,&\text{if }L(x)<z_1,\\ \sum_{i=j}^{N}p_i,&\text{if }z_j\le L(x)<z_{j+1},\ j=1,2,\ldots,N,\\ 0,&\text{if }L(x)\ge z_{N+1}.\end{cases}$$
Then,
$$z_D=\begin{cases}1,&\text{if }L(x)<z_1,\\ \inf_{\mathbb{P}\in\mathcal{D}}\sum_{i=j}^{N}p_i,&\text{if }z_j\le L(x)<z_{j+1},\ j=1,2,\ldots,N,\\ 0,&\text{if }L(x)\ge z_{N+1}.\end{cases}$$

Case 3: $S^+\ne\emptyset$ and $S^-\ne\emptyset$. Then,
$$\mathbb{P}\{\xi\in C(x)\}=\mathbb{P}\Big\{\max_{k\in S^-}Top_k^-(x)\le\xi\le\min_{k\in S^+}Top_k^+(x)\Big\},$$
which is nonzero only if $\max_{k\in S^-}Top_k^-(x)\le\min_{k\in S^+}Top_k^+(x)$. Conversely, if $\max_{k\in S^-}Top_k^-(x)>\min_{k\in S^+}Top_k^+(x)$, then $\mathbb{P}(\xi\in C(x))=0$.
From (12), it follows that the probability $\mathbb{P}\{\xi\in C(x)\}$ is expressed as follows:
$$\mathbb{P}(\xi\in C(x))=\mathbb{P}\{L(x)\le\xi\le U(x)\}=\begin{cases}0,&\text{if }L(x)>U(x),\text{ or }U(x)<z_1,\text{ or }L(x)\ge z_{N+1},\\ \sum_{i=j}^{j'}p_i,&\text{if }L(x)\in[z_j,z_{j+1}),\ U(x)\in[z_{j'},z_{j'+1})\text{ with }j,j'\in\{1,2,\ldots,N\},\\ 1,&\text{if }L(x)\le z_1\text{ and }U(x)\ge z_{N+1}.\end{cases}$$
Then,
$$z_D=\begin{cases}0,&\text{if }L(x)>U(x),\text{ or }U(x)<z_1,\text{ or }L(x)\ge z_{N+1},\\ \inf_{\mathbb{P}\in\mathcal{D}}\sum_{i=j}^{j'}p_i,&\text{if }L(x)\in[z_j,z_{j+1}),\ U(x)\in[z_{j'},z_{j'+1})\text{ with }j,j'\in\{1,2,\ldots,N\},\\ 1,&\text{if }L(x)\le z_1\text{ and }U(x)\ge z_{N+1}.\end{cases}$$
With the above argument, we have completed the proof of the desired result.□
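Proposition 2 can be made concrete with a short sketch: over the set $\mathcal{D}$ in (6), the worst case of the mass that $C(x)$ captures has a closed form, because any $\mathbb{P}\in\mathcal{D}$ is within total variation distance $\gamma/2$ of the nominal distribution and so can shift at most $\gamma/2$ of mass out of $C(x)$. The masses below are illustrative, and the formula assumes $C(x)$ does not cover all bins.

```python
# Closed-form evaluation of inf over D of P(xi in C(x)) for the set (6).
import numpy as np

def worst_case_prob(p_hat, inside, gamma):
    """inf over D of the total mass on bins whose midpoint lies in C(x)."""
    nominal = p_hat[inside].sum()              # nominal mass z_hat_D
    return max(nominal - gamma / 2.0, 0.0)     # adversary removes <= gamma/2

p_hat  = np.array([0.1, 0.4, 0.3, 0.2])        # nominal bin masses
inside = np.array([False, True, True, False])  # bins between L(x) and U(x)
print(worst_case_prob(p_hat, inside, gamma=0.1))  # 0.7 - 0.05 = 0.65
```

This closed form is consistent with Theorem 3 below: the worst case stays at or above $1-\alpha$ exactly when the nominal mass is at least $1-\alpha+\gamma/2$.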
By Proposition 2, we now prove that the chance constraints (2b) can be reformulated as a number of ordinary ones.
Theorem 3.
Let $A$ and $b$ be a given matrix and a given vector. Let $\hat{\mathbb{P}}$ be the estimated distribution obtained through the proposed adaptive data-driven method, and let $\gamma$ be a given divergence tolerance of the ambiguous distributions in the confidence set $\mathcal{D}$ defined by (6), satisfying $2\alpha-\gamma>0$. Then, the feasible set defined by the chance constraint, i.e., $F_{cc}=\big\{x\in\mathbb{R}^n:\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha\big\}$, can be reformulated in the following forms.

1. The set $F_{cc}=\emptyset$ if one of the following conditions holds:
(i) for all $k\in K$, $A_kx-b_k\ge 0$ and $U(x)<z_1$;
(ii) for all $k\in K$, $A_kx-b_k\le 0$ and $L(x)\ge z_{N+1}$;
(iii) there exist $k_1,k_2\in K$ such that $A_{k_1}x-b_{k_1}\ge 0$, $A_{k_2}x-b_{k_2}\le 0$, and $L(x)>U(x)$;
(iv) there exist $k_1,k_2\in K$ such that $A_{k_1}x-b_{k_1}\ge 0$, $A_{k_2}x-b_{k_2}\le 0$, and $U(x)<z_1$;
(v) there exist $k_1,k_2\in K$ such that $A_{k_1}x-b_{k_1}\ge 0$, $A_{k_2}x-b_{k_2}\le 0$, and $L(x)\ge z_{N+1}$.

2. The set $F_{cc}=C_1^1\cup C_2^1\cup C_3^1$, where
$$C_1^1=\big\{x\in\mathbb{R}^n:\ \forall k\in K,\ A_kx-b_k\ge 0,\ U(x)\ge z_N\big\},$$
$$C_2^1=\big\{x\in\mathbb{R}^n:\ \forall k\in K,\ A_kx-b_k\le 0,\ L(x)<z_1\big\},$$
$$C_3^1=\big\{x\in\mathbb{R}^n:\ \exists k_1,k_2\in K,\ A_{k_1}x-b_{k_1}\ge 0,\ A_{k_2}x-b_{k_2}\le 0,\ L(x)\le z_1\big\}\cap\big\{x\in\mathbb{R}^n:\ \exists k_1,k_2\in K,\ A_{k_1}x-b_{k_1}\ge 0,\ A_{k_2}x-b_{k_2}\le 0,\ U(x)\ge z_{N+1}\big\}.$$

3. The set $F_{cc}=C_1\cup C_2\cup C_3$, where
$$C_1=\Big\{x\in\mathbb{R}^n:\ \sum_{i=1}^{j}\hat p_i(z_i,z_{i+1})\ge 1-\alpha+\frac{\gamma}{2},\ A_kx-b_k\ge 0\ \forall k\in K,\ z_j\le U(x)<z_{j+1}\text{ for some }j\Big\},$$
$$C_2=\Big\{x\in\mathbb{R}^n:\ \sum_{i=j}^{N}\hat p_i(z_i,z_{i+1})\ge 1-\alpha+\frac{\gamma}{2},\ A_kx-b_k\le 0\ \forall k\in K,\ z_j\le L(x)<z_{j+1}\text{ for some }j\Big\},$$
$$C_3=\Big\{x\in\mathbb{R}^n:\ \sum_{i=j}^{j'}\hat p_i(z_i,z_{i+1})\ge 1-\alpha+\frac{\gamma}{2},\ \exists k_1,k_2\in K,\ A_{k_1}x-b_{k_1}\ge 0,\ A_{k_2}x-b_{k_2}\le 0,\ L(x)\in[z_j,z_{j+1}),\ U(x)\in[z_{j'},z_{j'+1}),\text{ with }j,j'\in\{1,2,\ldots,N\}\Big\}.$$
Proof. 
We prove the result in the following three cases.
Case 1: $S^-=\emptyset$, i.e., $A_kx-b_k>0$ for all $k\in K$.

(i) When $U(x)<z_1$, it is deduced from (13) that $z_D=0$. Consequently, the inequality $z_D\ge 1-\alpha$ does not hold.

(ii) When $U(x)\ge z_N$, it follows from (13) that $z_D=1$, and thus $z_D\ge 1-\alpha$ always holds.

(iii) When $z_j\le U(x)<z_{j+1}$, $j=1,2,\ldots,N$, the confidence set $\mathcal{D}$ is defined by (6). For the given $x$ and the set $C(x)$, the worst-case probability bound (11) is transformed into the following programming problem:
$$[\text{Primal}]\qquad z_D=\inf_{\mathbb{P}\in\mathcal{D}}\ \mathbb{P}\{\xi\in C(x)\}\tag{15a}$$
$$=\min_{\mathbb{P}}\ \mathbb{P}\{\xi\in C(x)\}\tag{15b}$$
$$\text{s.t.}\quad\sum_{i=1}^{N}\hat p_i(z_i,z_{i+1})\left|\frac{p_i}{\hat p_i(z_i,z_{i+1})}-1\right|\le\gamma,\tag{15c}$$
$$\sum_{i=1}^{N}p_i=1,\tag{15d}$$
$$p_i\ge 0,\quad i=1,2,\ldots,N,\tag{15e}$$
where the constraint (15c) bounds the divergence from above by $\gamma$ (note that (15c) is exactly $\sum_{i=1}^{N}|p_i-\hat p_i(z_i,z_{i+1})|\le\gamma$), and constraints (15d) and (15e) guarantee that $\mathbb{P}$ is a probability distribution. This transformation is justified by the structural properties of the confidence set $\mathcal{D}$, which collectively constrain the feasible region of $\mathbb{P}$. Consequently, we get
$$z_D=\min_{\mathbb{P}}\ \sum_{i=1}^{j}p_i\quad\text{s.t.}\quad\sum_{i=1}^{N}\hat p_i(z_i,z_{i+1})\left|\frac{p_i}{\hat p_i(z_i,z_{i+1})}-1\right|\le\gamma,\quad\sum_{i=1}^{N}p_i=1,\quad p_i\ge 0,\ i=1,2,\ldots,N.\tag{16}$$
For the constraints in Problem (16), let $\lambda_{eq}$ be the Lagrange multiplier corresponding to the equality constraint and $\lambda_{ne}$ the multiplier corresponding to the inequality constraint. Noting that $\sum_{i=1}^{j}p_i=\sum_{i=1}^{N}I_{C(x)}(\eta_i)\,p_i$ here, the Lagrangian dual reads
$$\begin{aligned}
L(\mathbb{P},\lambda_{eq},\lambda_{ne})&=\max_{\lambda_{eq},\lambda_{ne}\ge 0}\ \min_{\mathbb{P}\ge 0}\ \left\{\sum_{i=1}^{N}I_{C(x)}(\eta_i)\,p_i+\lambda_{eq}\Big(1-\sum_{i=1}^{N}p_i\Big)+\lambda_{ne}\Big(\sum_{i=1}^{N}\hat p_i(z_i,z_{i+1})\Big|\frac{p_i}{\hat p_i(z_i,z_{i+1})}-1\Big|-\gamma\Big)\right\}\\
&=\max_{\lambda_{eq},\lambda_{ne}\ge 0}\ \left\{\lambda_{eq}-\lambda_{ne}\gamma+\min_{\mathbb{P}\ge 0}\sum_{i=1}^{N}\Big[\big(I_{C(x)}(\eta_i)-\lambda_{eq}\big)p_i+\lambda_{ne}\hat p_i(z_i,z_{i+1})\Big|\frac{p_i}{\hat p_i(z_i,z_{i+1})}-1\Big|\Big]\right\}\\
&=\max_{\lambda_{eq},\lambda_{ne}\ge 0}\ \left\{\lambda_{eq}-\lambda_{ne}\gamma-\max_{\mathbb{P}\ge 0}\ \lambda_{ne}\sum_{i=1}^{N}\hat p_i(z_i,z_{i+1})\Big[\frac{\lambda_{eq}-I_{C(x)}(\eta_i)}{\lambda_{ne}}\cdot\frac{p_i}{\hat p_i(z_i,z_{i+1})}-\Big|\frac{p_i}{\hat p_i(z_i,z_{i+1})}-1\Big|\Big]\right\},
\end{aligned}$$
where $I_{C(x)}(z)$ is the indicator function, which takes the value 1 when $z\in C(x)$ and 0 otherwise. Note that the conjugate function of $\varphi(t)=|t-1|$ is
$$\varphi^*(s)=\sup_{t\ge 0}\{st-\varphi(t)\}=\begin{cases}-1,&\text{if }s<-1,\\ s,&\text{if }-1\le s\le 1,\\ +\infty,&\text{if }s>1.\end{cases}\tag{17}$$
Then,
$$\begin{aligned}
z_D'&=\max_{\lambda_{eq},\lambda_{ne}>0}\ \lambda_{eq}-\lambda_{ne}\gamma-\lambda_{ne}\sum_{i=1}^{N}\hat p_i(z_i,z_{i+1})\,\varphi^*\!\left(\frac{\lambda_{eq}-I_{C(x)}(\eta_i)}{\lambda_{ne}}\right)\\
&=\max_{\lambda_{eq},\lambda_{ne}>0}\ \lambda_{eq}-\lambda_{ne}\gamma-\lambda_{ne}\sum_{i:\,I_{C(x)}=1}\hat p_i(z_i,z_{i+1})\,\varphi^*\!\left(\frac{\lambda_{eq}-1}{\lambda_{ne}}\right)-\lambda_{ne}\sum_{i:\,I_{C(x)}=0}\hat p_i(z_i,z_{i+1})\,\varphi^*\!\left(\frac{\lambda_{eq}}{\lambda_{ne}}\right)\\
&=\max_{\lambda_{eq},\lambda_{ne}>0}\ \lambda_{eq}-\lambda_{ne}\gamma-\lambda_{ne}\hat z_D\,\varphi^*\!\left(\frac{\lambda_{eq}-1}{\lambda_{ne}}\right)-\lambda_{ne}\big(1-\hat z_D\big)\,\varphi^*\!\left(\frac{\lambda_{eq}}{\lambda_{ne}}\right),
\end{aligned}\tag{18}$$
where $\hat z_D=\sum_{i:\,I_{C(x)}=1}\hat p_i(z_i,z_{i+1})=\hat{\mathbb{P}}(\xi\in C(x))$.
By strong duality in the theory of linear programming, it holds that $z_D=z_D'$ for Problems (15) and (18). Thus, $z_D\ge 1-\alpha$ if and only if there exist constants $\lambda_{eq}>0$ and $\lambda_{ne}>0$ such that
$$\lambda_{eq}-\lambda_{ne}\gamma-\lambda_{ne}\hat z_D\,\varphi^*\!\left(\frac{\lambda_{eq}-1}{\lambda_{ne}}\right)-\lambda_{ne}(1-\hat z_D)\,\varphi^*\!\left(\frac{\lambda_{eq}}{\lambda_{ne}}\right)\ge 1-\alpha,\tag{19a}$$
which can be rearranged as
$$\hat z_D\,\lambda_{ne}\left[\varphi^*\!\left(\frac{\lambda_{eq}}{\lambda_{ne}}\right)-\varphi^*\!\left(\frac{\lambda_{eq}-1}{\lambda_{ne}}\right)\right]\ \ge\ 1-\alpha-\lambda_{eq}+\lambda_{ne}\gamma+\lambda_{ne}\varphi^*\!\left(\frac{\lambda_{eq}}{\lambda_{ne}}\right).\tag{19b}$$
Denote $\lambda_1=\frac{\lambda_{eq}}{\lambda_{ne}}\ge 0$ and $\lambda_0=\frac{\lambda_{eq}-1}{\lambda_{ne}}$. Then, $\lambda_0=\lambda_1-\frac{1}{\lambda_{ne}}\le\lambda_1$.
We first prove that $\lambda_1>1$ is impossible. Indeed, from (17) and $\lambda_{ne}>0$, when $\lambda_1>1$ we have $\varphi^*(\lambda_1)=+\infty$, so the left side of (19a) tends towards negative infinity, violating the distributionally robust chance constraint. Therefore, $0\le\lambda_1\le 1$ and $\lambda_0\le 1$.
Because $\varphi^*$ is a conjugate function, it follows from its properties that (19b) is equivalent to
$$\hat z_D\ \ge\ \inf_{\lambda_0\le 1,\ 0\le\lambda_1\le 1}\ \frac{1-\alpha-\lambda_{ne}\lambda_1+\lambda_{ne}\gamma+\lambda_{ne}\varphi^*(\lambda_1)}{\lambda_{ne}\big[\varphi^*(\lambda_1)-\varphi^*(\lambda_0)\big]}.\tag{20}$$
We now conduct a case-by-case analysis of $\lambda_0$ as follows.
When $-1\le\lambda_0\le 1$, we have $\varphi^*(\lambda_0)=\lambda_0$ and $\varphi^*(\lambda_1)=\lambda_1$, so that $\lambda_{ne}[\varphi^*(\lambda_1)-\varphi^*(\lambda_0)]=1$ and the right-hand side of inequality (20) is transformed into
$$1-\sup_{-1\le\lambda_0\le 1,\ 0\le\lambda_1\le 1}\{\alpha-\lambda_{ne}\gamma\}=1-\alpha+\frac{\gamma}{2}.\tag{21}$$
Indeed, $-1\le\lambda_0=\lambda_1-\frac{1}{\lambda_{ne}}$ and $\lambda_1\le 1$ give $\frac{1}{\lambda_{ne}}\le\lambda_1+1\le 2$; thus, $\lambda_{ne}\ge\frac{1}{2}$, and the supremum is attained at $\lambda_{ne}=\frac{1}{2}$.
When $\lambda_0<-1$, we have $\varphi^*(\lambda_0)=-1$, and the right-hand side of inequality (20) is rewritten as
$$1-\sup_{\lambda_0<-1,\ 0\le\lambda_1\le 1}\left\{\frac{\alpha-1+\lambda_{ne}(\lambda_1-\gamma+1)}{\lambda_{ne}(\lambda_1+1)}\right\}=1-\sup_{\lambda_0<-1,\ 0\le\lambda_1\le 1}\left\{\frac{\alpha-1}{\lambda_{ne}(\lambda_1+1)}+\frac{\lambda_1-\gamma+1}{\lambda_1+1}\right\}.\tag{22}$$
Since $0\le\lambda_1\le 1$ and $0\le\alpha\le 1$, when $\lambda_{ne}\to 0^+$ the required bound in (22) tends towards positive infinity, so no multipliers in this regime can certify the constraint; moreover, $\lambda_0<-1$ forces $\lambda_{ne}(\lambda_1+1)<1$, under which the bound in (22) always exceeds $1-\alpha+\frac{\gamma}{2}$. Therefore, from (20) and (21), we get
$$\hat z_D\ge 1-\alpha+\frac{\gamma}{2}.$$
Furthermore, by (13), we have
$$\hat z_D=\sum_{i=1}^{j}\hat p_i(z_i,z_{i+1}),\quad\text{when }S^-=\emptyset\text{ and }z_j\le U(x)<z_{j+1},\ j=1,2,\ldots,N.$$
A similar proof can be obtained for Cases 2 and 3.
Case 2: $S^+=\emptyset$, which means that $A_kx-b_k<0$ for all $k\in K$.
(i) When $L(x)\ge z_{N+1}$, (14) implies that $z_D=0$.
(ii) When $L(x)<z_1$, (14) leads to $z_D=1$. As a result, the chance constraint is always satisfied.
(iii) When $z_j\le L(x)<z_{j+1}$, $j=1,2,\ldots,N$, according to (14), it yields
$$\hat z_D=\sum_{i=j}^{N}\hat p_i(z_i,z_{i+1}),\quad\text{when }S^+=\emptyset\text{ and }z_j\le L(x)<z_{j+1},\ j=1,2,\ldots,N.$$
Case 3: $S^+\ne\emptyset$ and $S^-\ne\emptyset$—that is, there exist $k_1,k_2\in K$ with $A_{k_1}x-b_{k_1}>0$ and $A_{k_2}x-b_{k_2}<0$.
(i) When $L(x)>U(x)$, or $U(x)<z_1$, or $L(x)\ge z_{N+1}$, these lead to $z_D=0$.
(ii) When $L(x)\le z_1$ and $U(x)\ge z_{N+1}$, then $z_D=1$. Thus, the chance constraint is always satisfied.
(iii) When $L(x)\in[z_j,z_{j+1})$ and $U(x)\in[z_{j'},z_{j'+1})$ with $j,j'\in\{1,2,\ldots,N\}$, it yields
$$\hat z_D=\sum_{i=j}^{j'}\hat p_i(z_i,z_{i+1}).$$
The proof has been completed. □
Remark 6.
By Theorem 3, constraint (2b) is reformulated as a number of ordinary inequality constraints. Notably, different from the existing results, the derived deterministic equivalent formulation (DEF) of Model (1) is a linear programming model: it does not involve solving any complicated auxiliary optimization problems [8], semi-definite programming problems [3], or mixed-integer second-order cone programming problems.

3.2. Extension to Multistage Chance-Constrained Programming

We further extend the results in Proposition 2 and Theorem 3 to multistage DRCCPs.
Different from Model (2), a multistage DRCCP can be given by
$$\min\ C(x)\quad\text{s.t.}\quad\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\{A(\xi)x\le b(\xi)\}\ge 1-\alpha,\quad x\in X\subseteq\mathbb{R}^n,\tag{23}$$
where $\xi=(\xi_1,\ldots,\xi_T)$ is a random vector, $T$ is the number of stages, $\xi_t$ represents the random parameter at the $t$-th stage, $A(\xi)\in\mathbb{R}^{m\times n}$ is a random coefficient matrix, and $b(\xi)\in\mathbb{R}^m$ is a random capacity vector. Suppose that $\xi_1,\ldots,\xi_T$ are mutually independent. The constraint $A(\xi)x\le b(\xi)$ is equivalent to the intersection of $m$ scalar constraints
$$\bigcap_{s=1}^{m}\big\{A_s(\xi)x\le b_s(\xi)\big\},$$
where $A_s(\xi)\in\mathbb{R}^n$ is the $s$-th row of $A(\xi)$, and $b_s(\xi)\in\mathbb{R}$ is the $s$-th component of $b(\xi)$.
Let $\alpha\in(0,1)$ represent the total risk level, i.e., the required confidence level is $1-\alpha$. If the distribution of each $\xi_t$ in Model (23) is not known precisely but belongs to a confidence set $\mathcal{D}_t$ constructed from historical data based on (6), then $\mathcal{D}_t$ is defined by
$$\mathcal{D}_t=\left\{\mathbb{P}^{(t)}\ \middle|\ \begin{aligned}&p_{i_t}^{(t)}=\mathbb{P}^{(t)}\Big\{\xi_t=\tfrac{z_{i_t}^{(t)}+z_{i_t+1}^{(t)}}{2}\Big\},\ i_t=1,2,\ldots,N_t,\\ &\sum_{i_t=1}^{N_t}p_{i_t}^{(t)}=1,\qquad\sum_{i_t=1}^{N_t}\big|p_{i_t}^{(t)}-\hat p_{i_t}^{(t)}(z_{i_t}^{(t)},z_{i_t+1}^{(t)})\big|\le\gamma_t,\\ &0\le p_{i_t}^{(t)}\le 1,\ i_t=1,2,\ldots,N_t\end{aligned}\right\},\quad t=1,\ldots,T,\tag{24}$$
where $\hat p_{i_t}^{(t)}(z_{i_t}^{(t)},z_{i_t+1}^{(t)})$ is the estimated probability of $\xi_t$ falling in $[z_{i_t}^{(t)},z_{i_t+1}^{(t)})$, and $\gamma_t\ge 0$ controls the confidence level of $\mathcal{D}_t$. Clearly, the overall confidence set for $\xi$ is the product of these marginal sets, i.e., $\mathcal{D}=\mathcal{D}_1\times\cdots\times\mathcal{D}_T$.
For the multistage DRCCP (23), we can prove the following result.
Theorem 4.
Let $\xi=(\xi_1,\ldots,\xi_T)$ be a random vector with independent components, and let the confidence set be $\mathcal{D}=\mathcal{D}_1\times\cdots\times\mathcal{D}_T$, with $\mathcal{D}_t$ defined by (24). If there exist $\alpha_s\in[0,1]$ $(s=1,\ldots,m)$ with $\sum_{s=1}^{m}\alpha_s\le\alpha$ such that
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\big(A_s(\xi)x\le b_s(\xi)\big)\ge 1-\alpha_s,\quad s\in\{1,\ldots,m\},$$
then
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\left(\bigcap_{s=1}^{m}\big\{A_s(\xi)x\le b_s(\xi)\big\}\right)\ge 1-\alpha.$$
Proof. 
Let $E_s=\{A_s(\xi)x\le b_s(\xi)\}$ $(s=1,\ldots,m)$ denote events that may be dependent through the random vector $\xi$. For any joint distribution $\mathbb{P}\in\mathcal{D}$, the Bonferroni inequality (union bound) gives
$$\mathbb{P}\left(\bigcup_{s=1}^{m}E_s^c\right)\le\sum_{s=1}^{m}\mathbb{P}(E_s^c),$$
where $E_s^c=\{A_s(\xi)x>b_s(\xi)\}$ is the complement of $E_s$. Therefore,
$$\mathbb{P}\left(\bigcap_{s=1}^{m}E_s\right)=1-\mathbb{P}\left(\bigcup_{s=1}^{m}E_s^c\right)\ge 1-\sum_{s=1}^{m}\mathbb{P}(E_s^c).$$
Taking the infimum over $\mathbb{P}\in\mathcal{D}$ on both sides,
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\left(\bigcap_{s=1}^{m}E_s\right)\ge 1-\sup_{\mathbb{P}\in\mathcal{D}}\sum_{s=1}^{m}\mathbb{P}(E_s^c).$$
For each $s$, the individual chance constraint $\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}(E_s)\ge 1-\alpha_s$ implies $\sup_{\mathbb{P}\in\mathcal{D}}\mathbb{P}(E_s^c)=\sup_{\mathbb{P}\in\mathcal{D}}[1-\mathbb{P}(E_s)]=1-\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}(E_s)\le\alpha_s$. Thus,
$$\sup_{\mathbb{P}\in\mathcal{D}}\sum_{s=1}^{m}\mathbb{P}(E_s^c)\le\sum_{s=1}^{m}\sup_{\mathbb{P}\in\mathcal{D}}\mathbb{P}(E_s^c)\le\sum_{s=1}^{m}\alpha_s.$$
Therefore,
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\left(\bigcap_{s=1}^{m}E_s\right)\ge 1-\sum_{s=1}^{m}\alpha_s\ge 1-\alpha.$$
We have completed the proof. □
Remark 7.
In the proof of Theorem 4, the Bonferroni inequality is only employed to derive a sufficient condition by which one can easily transform a joint chance constraint into a group of individual chance constraints, so as to obtain a computationally tractable reformulation of the original model for the given confidence level of the joint chance constraint. Although this inequality may seem overly conservative, Theorem 4 enables a conservative approximation of the joint chance constraint by a group of individual chance constraints with easily allocated confidence levels, such as taking $\alpha_s=\alpha/m$ for all $s$. This also leaves room for future research in which a tighter inequality is used to further optimize the confidence levels of the individual chance constraints.
Remark 8.
Note that Theorem 4 is proven under the assumption that the random variables are independent and the confidence set is a product of marginal sets, but, by this theorem, the results in Proposition 2 and Theorem 3 can be directly applied to handle the multistage DRCCP studied in the subsequent section.
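As a quick sanity check of this allocation, the following toy Monte Carlo sketch verifies the union-bound logic on two independent demands and a candidate production plan; the Poisson distributions and the plan are assumptions used only for illustration.

```python
# A toy Monte Carlo illustration of Theorem 4's risk allocation: if each of
# the m individual constraints holds with probability at least 1 - alpha/m,
# the joint constraint holds with probability at least 1 - alpha.
import numpy as np

rng = np.random.default_rng(1)
alpha, m = 0.1, 2
xi1 = rng.poisson(15, size=200_000)     # stage-1 demand (assumed)
xi2 = rng.poisson(12, size=200_000)     # stage-2 demand (assumed)
x1, x2 = 22, 14                         # a hypothetical production plan

ok1 = x1 >= xi1                         # first scalar constraint
ok2 = x1 + x2 >= xi1 + xi2              # second scalar constraint
print(ok1.mean() >= 1 - alpha / m)      # individual level 1 - alpha/m
print(ok2.mean() >= 1 - alpha / m)
print((ok1 & ok2).mean() >= 1 - alpha)  # joint level 1 - alpha
```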

4. Numerical Tests

4.1. Stochastic Multiperiod Capacitated Lot-Sizing Problems

To assess the performance of the proposed DRCCP approach, we carry out numerical tests on our method by solving the SMP-CLSP, one of the most fundamental inventory management problems [8,24]. Mathematically, the model of this problem reads as follows:
$$\min_{x,y}\ \sum_{t=1}^{T}\big[\bar c_t y_t+(c_t+H_t)x_t\big]\tag{25}$$
$$\text{s.t.}\quad\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\left\{\begin{aligned}&x_1\ge\xi_1\\&x_1+x_2\ge\xi_1+\xi_2\\&\qquad\vdots\\&x_1+x_2+\cdots+x_T\ge\xi_1+\xi_2+\cdots+\xi_T\end{aligned}\right\}\ge 1-\alpha,\tag{25a}$$
$$x_t\le M_t y_t,\quad t=1,\ldots,T,\tag{25b}$$
$$x_t\in\mathbb{Z}_+,\quad t=1,\ldots,T,\tag{25c}$$
$$y_t\in\{0,1\},\quad t=1,\ldots,T,\tag{25d}$$
where the initial inventory level is set to 0, $T$ represents the time horizon, $M_t$ denotes an upper bound on the quantity of units manufactured during period $t$, $c_t$ stands for the unit production cost in period $t$, $H_t$ is the unit holding cost for period $t$, $\bar c_t$ refers to the fixed setup cost per production run, $\xi_t$ represents the demand during period $t$, $x_t$ is the number of units to be produced in period $t$, $\mathbb{Z}_+$ represents the set of nonnegative integers, and $y_t$ serves as a binary setup indicator: $y_t=1$ when a setup is performed during period $t$ (more precisely, $y_t=1$ when $x_t>0$ and $y_t=0$ when $x_t=0$). Here, $x_t$ and $y_t$, $t=1,\ldots,T$, are the decision variables.
The constraint (25a) ensures that the cumulative production up to each period $t$ meets the cumulative demand with probability at least $1-\alpha$ under the worst-case distribution from $\mathcal{D}$. Here, $\mathcal{D}$ is the product of the marginal confidence sets $\mathcal{D}_t$ for each $\xi_t$, as defined in Equation (24), and the random variables $\xi_t$ are assumed independent. Thus, (25a) is equivalent to
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\left\{\bigcap_{t=1}^{T}\Big\{\sum_{s=1}^{t}x_s\ge\sum_{s=1}^{t}\xi_s\Big\}\right\}\ge 1-\alpha.\tag{26}$$
By Theorem 4, if we allocate risk levels $\alpha_t$ with $\sum_{t=1}^{T}\alpha_t\le\alpha$ satisfying
$$\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\left\{\sum_{s=1}^{t}x_s\ge\sum_{s=1}^{t}\xi_s\right\}\ge 1-\alpha_t,\quad t=1,\ldots,T,$$
then the inequality (25a) holds. For example, we can take the equal risk allocation $\alpha_t=\alpha/T$ for all $t$.

4.2. Reformulation of Models and Development of Algorithms

We next reformulate Problem (25) as an ordinary optimization problem.
Let the random variable $\xi_s$ be approximated via the confidence set $\mathcal{D}_s$ defined in (24), $s=1,\ldots,t$. That is to say, $\xi_s$ is discretized as $\xi_s=\eta_{i_s}^{(s)}$ with probabilities $p_{i_s}^{(s)}$, $i_s=1,\ldots,N_s$.
Define the random variable
$$\zeta_t=\sum_{s=1}^{t}\xi_s.$$
Then, $\zeta_t$ takes the value $\sum_{s=1}^{t}\eta_{i_s}^{(s)}$ with probability $\prod_{s=1}^{t}p_{i_s}^{(s)}$. Let $N_\zeta=\prod_{s=1}^{t}N_s$ denote the number of all values taken by $\zeta_t$. Sorting all sums $\sum_{s=1}^{t}\eta_{i_s}^{(s)}$ in ascending order yields $N_\zeta$ nodes $\sum_{s=1}^{t}\eta_{i_s^{(l)}}^{(s)}$ with probabilities $\prod_{s=1}^{t}p_{i_s^{(l)}}^{(s)}$, $l=1,2,\ldots,N_\zeta$. Denote all nodes by $\zeta_t^{(l)}$, $l=1,2,\ldots,N_\zeta$. Then, the cumulative distribution function (CDF) of $\zeta_t$ is specified by the following step function:
$$F_t(z)=\begin{cases}0,&\text{if }z<\zeta_t^{(1)};\\ \displaystyle\sum_{q=1}^{l}\prod_{s=1}^{t}p_{i_s^{(q)}}^{(s)},&\text{if }\zeta_t^{(l)}\le z<\zeta_t^{(l+1)},\ l=1,2,\ldots,N_\zeta-1;\\ 1,&\text{if }z\ge\zeta_t^{(N_\zeta)}.\end{cases}$$
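A small sketch of this construction with hypothetical two-stage data: enumerate the product support, sort it, and read quantiles off the step CDF $F_t$.

```python
# Discrete distribution of zeta_t = xi_1 + ... + xi_t from per-stage supports
# eta^{(s)} and masses p^{(s)}, with its step CDF and quantile.
import numpy as np
from itertools import product

def cumulative_demand_dist(supports, masses):
    vals, probs = [], []
    for combo in product(*[range(len(s)) for s in supports]):
        vals.append(sum(supports[s][i] for s, i in enumerate(combo)))
        probs.append(np.prod([masses[s][i] for s, i in enumerate(combo)]))
    order = np.argsort(vals)
    return np.asarray(vals)[order], np.asarray(probs)[order]

def quantile(vals, probs, level):
    cdf = np.cumsum(probs)                    # the step CDF F_t
    return vals[np.searchsorted(cdf, level)]  # smallest node with F_t >= level

sup1, m1 = [10.0, 15.0, 20.0], [0.3, 0.5, 0.2]  # stage-1 nodes (assumed)
sup2, m2 = [12.0, 18.0],       [0.6, 0.4]       # stage-2 nodes (assumed)
vals, probs = cumulative_demand_dist([sup1, sup2], [m1, m2])
print(quantile(vals, probs, 0.95))  # F_2^{-1}(0.95) -> 38.0 here
```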
Consequently, for any given $x_s$, $s=1,2,\ldots,t$, the value of $\inf_{\mathbb{P}\in\mathcal{D}}\mathbb{P}\big\{\sum_{s=1}^{t}x_s\ge\sum_{s=1}^{t}\xi_s\big\}$ is computed by solving the following auxiliary problem:
$$\min\ F\Big(\sum_{s=1}^{t}x_s\Big)=\sum_{q=1}^{l}\prod_{s=1}^{t}p_{i_s^{(q)}}^{(s)}\qquad\text{s.t.}\quad p_{i_s}^{(s)}\in\mathcal{D}_s,\ s=1,2,\ldots,t,\qquad\sum_{s=1}^{t}\eta_{i_s^{(l)}}^{(s)}\le\sum_{s=1}^{t}x_s.\tag{27}$$
Indeed, if $p_{i_s}^{(s)*}$ is the optimal solution of Problem (27), then its corresponding objective value is the minimal value of $\mathbb{P}\big\{\sum_{s=1}^{t}x_s\ge\sum_{s=1}^{t}\xi_s\big\}$ over $\mathcal{D}$.
With a given distribution $F^{rb}$, such as the one induced by the optimal solution $p_{i_s}^{(s)*}$ of Problem (27), we construct a master problem to solve the SMP-CLSP (25) as follows:
$$\begin{aligned}\min_{x,y}\ &\sum_{t=1}^{T}\big(\bar c_t y_t+(c_t+H_t)x_t\big)\\ \text{s.t.}\ &\sum_{s=1}^{t}x_s\ge F_t^{*-1}(1-\alpha_t),\quad t=1,2,\ldots,T,\\ &x_t\le M_t y_t,\quad t=1,\ldots,T,\\ &x_t\in\mathbb{Z}_+,\quad t=1,\ldots,T,\\ &y_t\in\{0,1\},\quad t=1,\ldots,T.\end{aligned}\tag{28}$$
Model (28) effectively reformulates the original problem (25) as a deterministic programming model.
With the above preparations, we now develop an algorithm to solve the original DRCCP (25): the master problem, a mixed-integer linear programming problem, is first solved for an initial approximate solution; then, the auxiliary problem is solved at the obtained approximate solution of the master problem, and the two steps are alternated. The computational procedure is specified in Algorithm 1.
However, by Theorem 3, we can further reformulate Problem (25) as another form of standard constrained optimization problem.
Indeed, for each period $t=1,\ldots,T$, using the reference distribution $\hat{\mathbb{P}}_t$ at the $t$-th stage, Problem (26) corresponds to Case (1) in Proposition 2 under the condition $A_kx-b_k\ge 0$, obtained by taking $A_k=0$ and $b_k=-1$, so that the coefficient of the random term is $A_kx-b_k=1>0$ and $U(x)=\sum_{s=1}^{t}x_s$. The structure of the feasible set $F_{cc}$, as characterized by Theorem 3, depends on the value of $U(x)$:
(1) If $U(x)<z_1$, part 1(i) implies that $F_{cc}=\emptyset$.
(2) If $U(x)\ge z_N$, the set $C_1^1$ in part 2 applies, yielding $F_{cc}=\{x\in\mathbb{R}^n:U(x)\ge z_N\}$.
(3) If $z_j\le U(x)<z_{j+1}$ for some $j\in\{1,\ldots,N\}$, the set $C_1$ in part 3 applies, and the feasible set is given by
$$\left\{x\in\mathbb{R}^n:\ z_j\le U(x)<z_{j+1},\ \sum_{i=1}^{j}\hat p_i(z_i,z_{i+1})\ge 1-\alpha_t+\frac{\gamma_t}{2T}\right\}.$$
The reformulation of the chance constraint is explicitly given by the condition for the set $C_1$ in the third case above. Therefore, for any $x$ satisfying $z_j\le U(x)<z_{j+1}$, the data-driven chance constraint (26) is equivalent to
$$\hat{\mathbb{P}}_t\left\{\sum_{s=1}^{t}x_s\ge\sum_{s=1}^{t}\xi_s\right\}\ge 1-\alpha_t+\frac{\gamma_t}{2T}.\tag{29}$$
Let $\hat F_t$ be the estimated cumulative distribution function (CDF) of $\zeta_t=\sum_{s=1}^{t}\xi_s$. Then, the constraint (29) becomes
$$\hat F_t\Big(\sum_{s=1}^{t}x_s\Big)\ge 1-\alpha_t+\frac{\gamma_t}{2T}.$$
Thus, the deterministic reformulation of (25) is
$$\begin{aligned}\min_{x,y}\ &\sum_{t=1}^{T}\big[\bar c_t y_t+(c_t+H_t)x_t\big]\\ \text{s.t.}\ &\sum_{s=1}^{t}x_s\ge\hat F_t^{-1}\Big(1-\alpha_t+\frac{\gamma_t}{2T}\Big),\quad t=1,\ldots,T,\\ &x_t\le M_t y_t,\quad t=1,\ldots,T,\\ &x_t\in\mathbb{Z}_+,\quad t=1,\ldots,T,\\ &y_t\in\{0,1\},\quad t=1,\ldots,T.\end{aligned}\tag{30}$$
Algorithm 1 Alternating DRCCP-LS algorithm
Input: Time horizon $T$; sample sizes $n_s$ for $s=1,\ldots,T$; sample data $\{\xi_s^j\}_{j=1}^{n_s}$ for $s=1,\ldots,T$; interpolation tolerance $TH$; divergence tolerances $\gamma_t$, $t=1,\ldots,T$; risk level $\alpha$ with $\alpha_t=\alpha/T$, $t=1,\ldots,T$; cost parameters $\bar c_t$, $c_t$, $H_t$, $t=1,\ldots,T$; capacity parameters $M_t$, $t=1,\ldots,T$; tolerance $\epsilon$
Output: Optimal production plan $(x^*,y^*)$ and the minimum total cost $TC^*$
1: Apply Rules (I)-(III) to determine the bin parameters $N_s$, $\eta_{i_s}^{(s)}$, and the reference distributions $\hat p_{i_s}^{(s)}$, $i_s=1,\ldots,N_s$, $s=1,\ldots,T$
2: Initialize a production plan $x^{(0)}\leftarrow\mathbf{1}$; the total cost $TC^{(0)}$; the iteration counter $k\leftarrow 1$; the convergence flag converged $\leftarrow$ false
3: while not converged do
4:  Initialize robust quantiles $Q^{(k)}\leftarrow[0,\ldots,0]$ of length $T$
5:  for $t=1$ to $T$ do
6:   Compute the cumulative production $z_t\leftarrow\sum_{s=1}^{t}x_s^{(k-1)}$
7:   Solve the auxiliary problem (27) with $x_s=x_s^{(k-1)}$. Denote its optimal solution by $P_t^{(s)*}$ for $s=1,\ldots,t$
8:   With $P_t^{(s)*}$, $s=1,\ldots,t$, define a distribution function $F_t^*$, and compute $Q^{(k)}[t]\leftarrow F_t^{*-1}(1-\alpha_t)$
9:  end for
10:  With $Q^{(k)}$, solve the master problem (28). Denote the optimal solution by $(x^{(k)},y^{(k)})$, and set $TC^{(k)}\leftarrow\sum_{t=1}^{T}\big(\bar c_t y_t^{(k)}+(c_t+H_t)x_t^{(k)}\big)$
11:  if $|TC^{(k)}-TC^{(k-1)}|<\epsilon$ then
12:    converged $\leftarrow$ true
13:  else
14:    $k\leftarrow k+1$
15:  end if
16: end while
17: $x^*\leftarrow x^{(k)}$, $y^*\leftarrow y^{(k)}$, $TC^*\leftarrow TC^{(k)}$
18: return $(x^*,y^*)$ and $TC^*$
Problem (30) is also a mixed-integer linear programming (MILP) model, which can be solved using standard optimization solvers. By virtue of this reformulation defined by Model (30), we can present a more efficient algorithm to find a robust optimal production plan. The computational procedure is specified in Algorithm 2.
Algorithm 2 Reformulated DRCCP-LS algorithm
Input: Time horizon $T$; sample sizes $n_s$ for $s=1,\ldots,T$; sample data $\{\xi_s^j\}_{j=1}^{n_s}$ for $s=1,\ldots,T$; interpolation tolerance $TH$; divergence tolerances $\gamma_t$, $t=1,\ldots,T$; risk parameters $\alpha$, $\alpha_t=\alpha/T$, $t=1,\ldots,T$; cost parameters $\bar c_t$, $c_t$, $H_t$, $t=1,\ldots,T$; capacity parameters $M_t$, $t=1,\ldots,T$
Output: Optimal production plan $(x^*,y^*)$ and the minimum total cost $TC^*$
1: Apply Rules (I)-(III) to determine the bin parameters $N_s$, $\eta_{i_s}^{(s)}$, and the reference distributions $\hat p_{i_s}^{(s)}$, $s=1,\ldots,T$
2: Compute the estimated cumulative distribution functions $\hat F_t$ for $\zeta_t=\sum_{s=1}^{t}\xi_s$, $t=1,\ldots,T$, using the reference distributions $\hat p_{i_s}^{(s)}$
3: for $t=1$ to $T$ do
4:   Compute the quantile $\hat Q_t=\hat F_t^{-1}\big(1-\alpha_t+\frac{\gamma_t}{2T}\big)$
5: end for
6: Solve the MILP problem (30) to obtain $(x^*,y^*)$ and $TC^*$
7: return $(x^*,y^*)$ and $TC^*$
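To make Algorithm 2 concrete, the sketch below implements its final step under stated assumptions: once the robust quantiles are available, the MILP (30) for $T=2$ with a per-period capacity of 50 units is small enough to solve by plain enumeration rather than an MILP solver. The quantiles $Q$ are hypothetical, while the cost and capacity figures mirror the test setting of Section 4.3.

```python
# Brute-force solution of the small MILP (30) for T = 2.
import itertools

def solve_lot_sizing(Q, M, setup, prod_cost, hold):
    T = len(Q)
    best, best_plan = float("inf"), None
    for x in itertools.product(*(range(M[t] + 1) for t in range(T))):
        # feasibility: cumulative production must reach the robust quantiles
        if any(sum(x[: t + 1]) < Q[t] for t in range(T)):
            continue
        y = [1 if x[t] > 0 else 0 for t in range(T)]
        cost = sum(setup[t] * y[t] + (prod_cost[t] + hold[t]) * x[t]
                   for t in range(T))
        if cost < best:
            best, best_plan = cost, (x, y)
    return best, best_plan

# hypothetical robust quantiles F_t^{-1}(1 - alpha_t + gamma_t/(2T))
Q = [48, 65]
cost, (x, y) = solve_lot_sizing(Q, M=[50, 50], setup=[48, 48],
                                prod_cost=[28, 15], hold=[0.5, 0.5])
print(cost, x, y)   # -> 1727.5 (48, 17) [1, 1]
```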
When only the traditional histogram, instead of the approach proposed in this paper, is employed to define the confidence set, Algorithm 1 is used in place of Algorithm 2.
Remark 9.
Clearly, compared with Algorithm 2, Algorithm 1 involves alternately solving a mixed-integer linear master problem and an auxiliary nonlinear programming problem. In contrast, owing to the reformulation technique used in Model (30), Algorithm 2 directly solves a single mixed-integer linear program, which makes any off-the-shelf MILP solver applicable. In the subsequent section, we further validate its efficiency advantages through numerical tests.

4.3. Numerical Tests

In our computational experiments, we consider a planning horizon of $T=2$ periods, which is sufficient to capture the essential dynamics of the multiperiod DRCCP. The initial inventory is set to zero. Both the holding cost and the setup cost are held constant throughout the experiment, at 0.50 and 48, respectively. The production capacity is fixed at 50 units per period. The variable production costs depend on the period: 28 for $t=1$ and 15 for $t=2$.
To simulate realistic and challenging demand patterns, we construct nonstandard demand distributions. The demand in each period is a nonnegative random variable, and demands across periods are mutually independent. For t = 1 , the demand follows a mixture of two negative binomial distributions, 0.3 NB ( 18 , 0.8 ) + 0.7 NB ( 56 , 0.8116 ) , where the parameters r and p in NB ( r , p ) represent the number of successes and the success probability in each trial, respectively. This mixture distribution can capture the multimodal characteristics that are representative of an uncertain market. For t = 2 , the demand is generated from a Poisson distribution with mean λ = 15 , incorporating outliers that occur with a probability of 6 % .
In Figure 2 and Figure 3, we show the application of the proposed IDE-PLI method and the traditional histogram-based approach to estimating the probability density functions for the demand distributions at t = 1 and t = 2 under different sample sizes.
From the results in Figure 2 and Figure 3, the following can be observed.
(1) When the sample size is 5000 and the data contain more outliers, the fixed binning strategy of the traditional histogram method results in the substantial deviation of the histogram shape from the true distribution (see the last sub-figure in Figure 2).
(2) Although the traditional histogram method can better capture the distribution characteristics when the sample size increases to 20,000 (see the last sub-figure in Figure 3), the number of bins in this method grows significantly, thereby increasing the model complexity. In contrast, the IDE-PLI method can effectively capture the underlying multimodal structure and tail behavior of the empirical data at both sample sizes without overfitting to sampling noise.
(3) The proposed IDE-PLI method always requires fewer bins compared to the traditional histogram method. This significant reduction can directly improve the computational efficiency in solving subsequent robust DEFs.
In order to assess the applicability of the proposed IDE-PLI method, we further apply this method to solve the SMP-CLSP, especially in comparison with other methods, when all of them are employed to construct distinct confidence sets. Besides Algorithm 1 (denoted by DRCCP-A) and Algorithm 2 (denoted by DRCCP-E), the other three compared methods are stated as follows.
CCC: The classical chance-constrained programming approach without dependence on constructing any confidence set, which assumes that the true demand distribution is known.
DRCCP-AT: An algorithm with the reference distribution in Algorithm 1 being replaced by the traditional histogram.
DRCCP-KL: The method in [25], where the classic empirical distribution is used as the reference distribution in the construction of confidence sets, and the confidence set is defined by the Kullback–Leibler divergence.
In Table 1, we present a comparison of the computational running times of various methods with the sample sizes of 5000 and 50,000 under different risk aversion levels.
From the numerical results in Table 1, the following conclusions can be drawn.
(1) The proposed DRCCP-E method exhibits remarkable computational efficiency and is in some settings even superior to the CCC method. The CCC model generally runs the fastest since it does not consider the robustness of optimal solutions, but its assumption of a known true distribution is often impractical.
(2) As the sample size increases from 5000 to 50,000, DRCCP-A exhibits a significantly shorter running time compared with DRCCP-AT under the same setting of α . This result highlights the computational advantage of the adaptive binning strategy employed in DRCCP-A over the traditional histogram approach used in DRCCP-AT.
(3) The DRCCP-KL method consistently records the highest computational costs owing to the solution of complicated reformulated models therein. Indeed, its running time increases as the sample size grows, which underscores the numerical challenges associated with the Kullback–Leibler divergence constraint.
Considering changing risk levels ( α ) and changing divergence tolerances ( γ ), we conduct a sensitivity analysis of the minimized total cost and the optimal solutions derived from different methods, so as to validate the ability of the proposed data-driven robust approach in terms of achieving a trade-off between cost and robustness, as well as between optimality and robustness.
For this purpose, we change the value of the divergence tolerance from 0.03 to 0.09 with a step size of 0.02, and we change the value of α from 0.05 to 0.15 with a step size of 0.05. The minimized total cost and the corresponding optimal solutions across different combinations of α and γ are listed in Table 2 and Table 3, respectively. In Table 3, each optimal solution is represented as ( x 1 , x 2 ; y 1 , y 2 ) . All experimental results are derived from a training sample size of 50,000, ensuring statistical significance.
From the results in Table 2, the following observations can be made.
(1) For the distributionally robust models (DRCCP-E, DRCCP-A, DRCCP-AT), the total cost increases as γ increases, reflecting the price of robustness against a larger confidence set. In contrast, the DRCCP-KL method displays nonmonotonic cost behavior, which can be attributed to the nonlinear nature of KL constraints, which often introduce numerical instabilities during optimization and lead to inconsistent convergence. The total cost of the CCC model remains constant since it is not influenced by γ . Among them, the performance of the DRCCP-A and DRCCP-AT methods is comparable, although the former outperforms the latter when γ = 0.05 .
(2) For each of the fixed α values, the proposed DRCCP-E model consistently yields the lowest costs for the majority of the given values of γ among all compared robust methods. This advantage in terms of the total cost underscores its effectiveness in striking a balance between economic efficiency and robustness.
Furthermore, from the results in Table 3, the following observations are derived.
(1) For all DRCCP methods except DRCCP-KL, the optimal single-period production quantity generally increases with $\gamma$, and the total two-period production quantity also displays a consistent increasing trend, in accordance with theoretical expectations. Indeed, a larger value of $\gamma$ expands the confidence set, enforcing robustness over a wider range of distributions but easily leading to an overconservative solution.
(2) The classical CCC method, which assumes a known distribution, always yields the smallest quantity of production among the compared methods since it does not account for distributional ambiguity. However, this advantage may disappear if the nominal distribution is misspecified.
Since any chance-constrained programming approach aims to minimize the constraint violation degree of the obtained optimal strategy when applied to unrealized scenarios of random model parameters in the future, we investigate the performance of the compared five methods in this context, i.e., the percentage of test samples that causes the demand to exceed the production capacity and the risk of constraint violation. In Table 4, we list the violation probabilities of the constraints under different risk levels and divergence tolerances, where 5000 samples are generated for the learning of the distribution and 500 samples are generated for an out-of-sample test.
From the listed results in Table 4, we derive the following observations.
(1) Owing to the assumption of knowing the true demand distribution, the violation probability of the CCC method remains stable across different γ and is generally below the corresponding targeted level of α , which is attributed to finite-sample estimation errors in this method.
(2) All DRCCP methods exhibit consistently and significantly lower violation probabilities than those in the CCC method, further validating their characteristic superiority in view of robustness. By considering the ambiguity of the demand distribution, all DRCCP methods can provide a more conservative production strategy, so as to hedge against the worst-case scenario.
(3) As γ increases, the violation probabilities of all DRCCP methods (DRCCP-E, DRCCP-A, DRCCP-AT, and most cases of DRCCP-KL) decrease monotonically. This inverse relationship between γ and the violation probability shows that γ can be tuned to achieve a desired level of risk protection, albeit at the potential cost of higher production expenses; a sketch of such a tuning loop is given after this list.
(4) The primary practical advantage of the proposed DRCCP-E method lies in its balanced performance. Although it does not always achieve the lowest violation probability, it consistently keeps the violation probability below the required risk level α. Moreover, the cost analysis above showed that DRCCP-E generally achieves the minimum total cost among the compared robust methods. It thus successfully trades off reliability against cost: it provides a robust optimal solution with an acceptable degree of constraint violation while avoiding the excessive costs of overconservative strategies.
(5) The violation probabilities of DRCCP-A and DRCCP-AT typically lie between those of CCC and DRCCP-E. This similar performance indicates that the two histogram-based confidence sets (adaptive and traditional) offer comparable robustness; the minor variations between them arise from differences in bin selection and probability-mass allocation.
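As noted in observation (3), the monotone relationship between γ and the empirical violation probability suggests a simple calibration procedure. The sketch below illustrates one way such tuning could be automated; the callbacks solve_drccp and estimate_violation are hypothetical placeholders for a model solver and an out-of-sample estimator, and this loop is not part of the algorithms proposed in this paper.

```python
def tune_gamma(solve_drccp, estimate_violation, target,
               lo=0.01, hi=0.10, tol=1e-3):
    """Bisection on the divergence tolerance, exploiting the empirically
    monotone decrease of the violation probability in gamma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        x = solve_drccp(mid)                # robust plan for this tolerance
        if estimate_violation(x) > target:
            lo = mid                        # still too risky: widen the set
        else:
            hi = mid                        # safe: try a cheaper tolerance
    return hi                               # smallest safe tolerance found
```

In practice, each bisection step requires resolving the robust model, so this calibration trades extra computation for a violation probability close to, but not above, the target.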
In summary, all DRCCP methods reduce the violation probabilities relative to the CCC approach, and the proposed DRCCP-E method provides the best balance between reliable risk control and low production cost. The CCC method offers no protection against distributional misspecification. The divergence tolerance γ thus serves as a critical model parameter for controlling the degree of protection against constraint violation.
To conclude this section, we evaluate the performance of the compared distributionally robust optimization models as the confidence level and the sample size vary. Figure 4 presents the numerical results obtained for values of 1 − α from 0.85 to 0.95 under sample sizes of 1000, 5000, 10,000, 20,000, and 50,000.
Looking at Figure 4, the following conclusions can be drawn.
(1) The total cost obtained by each method, except DRCCP-KL, consistently decreases as the confidence level decreases. This reflects the fundamental trade-off between risk and cost and the inherent conservatism of robust optimization: a higher confidence level demands a lower probability of stockout, so the model adopts a more conservative strategy, typically producing more to build safety stock against demand uncertainty and thereby incurring a higher total cost. DRCCP-KL occasionally deviates from this trend, which can again be attributed to the numerical instabilities introduced by its nonlinear KL-divergence constraints during optimization, as noted previously.
(2) The proposed DRCCP-E method displays remarkable stability across varying sample sizes, whereas DRCCP-A and DRCCP-AT are more sensitive to the sample size. Furthermore, as the confidence level decreases and the sample size increases, the minimal costs obtained by the DRCCP-A and DRCCP-AT methods converge. This convergence occurs because both methods are built on the same Algorithm 1 and, with a sufficiently large sample size, the densities estimated by the traditional histogram approach and by the proposed IDE-PLI method both approach the true density (a minimal sketch contrasting the two estimators follows this list). This result further validates the effectiveness of the proposed density estimation method.
(3) Across all parameter settings, the proposed DRCCP-E method displays competitive performance, with total costs lying between those of the more conservative methods (DRCCP-AT, DRCCP-A, DRCCP-KL) and those of the nonrobust CCC benchmark.
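For intuition on the convergence discussed in observation (2), the following minimal sketch contrasts a piecewise-constant histogram density with its piecewise linear interpolation, in the spirit of the IDE-PLI estimator. The adaptive bin rules of Algorithm 1 are omitted, and the gamma-distributed demand and all numerical settings are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def histogram_density(samples, bins=30):
    """Traditional estimator: piecewise-constant density over the bins."""
    heights, edges = np.histogram(samples, bins=bins, density=True)
    return heights, edges

def pli_density(samples, bins=30):
    """Piecewise linear interpolation between bin midpoints, a simplified
    stand-in for IDE-PLI (the adaptive bin rules are omitted)."""
    heights, edges = np.histogram(samples, bins=bins, density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return lambda t: np.interp(t, mids, heights, left=0.0, right=0.0)

rng = np.random.default_rng(1)
for n in (1_000, 5_000, 50_000):
    data = rng.gamma(shape=9.0, scale=2.0, size=n)   # hypothetical demand
    f = pli_density(data)
    grid = np.linspace(data.min(), data.max(), 400)
    mass = np.sum(f(grid)) * (grid[1] - grid[0])     # Riemann sum, close to 1
    print(f"n = {n:6d}: integrated mass = {mass:.4f}")
```

As the sample size grows, the bin heights of both estimators stabilize and the interpolated density differs from the piecewise-constant one only inside each bin, which is consistent with the convergent costs observed for DRCCP-A and DRCCP-AT in Figure 4.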
In conclusion, the DRCCP-E method has the advantage of achieving a more appropriate balance between cost-effectiveness and robustness.

5. Conclusions

In this study, we propose a novel approach to constructing a data-driven adaptive confidence set for optimization problems with chance constraints, involving only finitely many unknown parameters. By leveraging this confidence set, the original complicated chance constraint is reformulated into finitely many tractable ordinary constraints. Remarkably, compared with existing works, the proposed reformulation requires solving neither additional auxiliary optimization problems, nor semi-definite programming problems, nor mixed-integer second-order cone programming problems.
We further expand the scope of this work to address stochastic multistage DRCCPs. A complete alternating solution method and a reformulation-based solution strategy are presented for the SMP-CLSP under demand uncertainty, along with detailed algorithmic frameworks established for each approach. In the alternating solution strategy, the auxiliary problem computes robust quantiles using nonlinear programming, while the master problem solves a deterministic mixed-integer programming problem. The resulting alternating DRCCP-LS algorithm efficiently handles distributional uncertainty while maintaining computational tractability. In the reformulation-based strategy, the DRCCP is transformed into a standard constrained optimization form, leading to the reformulated DRCCP-LS algorithm. Numerical experiments demonstrate that the proposed DRCCP-E method achieves a more appropriate balance between cost-effectiveness and robustness.
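As a rough illustration of the alternating strategy's control flow only, the following skeleton uses hypothetical callbacks: robust_quantile(x) stands in for the nonlinear auxiliary problem that computes a robust demand quantile over the confidence set, and solve_master(q) for the deterministic mixed-integer lot-sizing problem with that quantile on the right-hand side. This is a sketch of the iteration logic under those assumptions, not the paper's DRCCP-LS implementation.

```python
def alternating_drccp_ls(solve_master, robust_quantile, q0,
                         max_iter=50, tol=1e-6):
    """Alternate between the master MIP and the auxiliary robust-quantile
    problem until the quantile stabilizes (hypothetical callbacks)."""
    q = q0                                  # initial quantile, e.g., empirical
    for _ in range(max_iter):
        x = solve_master(q)                 # master: deterministic MIP
        q_new = robust_quantile(x)          # auxiliary: nonlinear program
        if abs(q_new - q) < tol:            # quantile has stabilized
            return x, q_new
        q = q_new
    return x, q                             # best plan after max_iter rounds
```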
In future research, the proposed approach can be extended to handle chance constraints associated with multiple random factors, and efficient algorithms can be developed to solve the derived DEFs based on an analysis of the model properties. A straightforward yet computationally costly generalization would construct a grid over the multidimensional sample space, so investigating computational strategies such as adaptive partitioning and dimension-reduction techniques will be essential. Additional directions include extending the proposed methodology to more complicated scenarios with time-varying confidence sets and applying it to practical operational decision-making problems in engineering and management under uncertainty, such as robust supply chain management, portfolio optimization, production planning, and the robust utilization of renewable energy sources like wind and solar power.

Author Contributions

Conceptualization, Z.W.; validation, H.D.; formal analysis, H.D. and Z.W.; investigation, H.D.; writing—original draft, H.D.; writing—review and editing, Z.W.; supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Social Science Foundation of China (Grant No. 21BGL122).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest regarding the submission and publication of this paper.

References

  1. Nemirovski, A.; Shapiro, A. Convex approximations of chance constrained programs. SIAM J. Optim. 2007, 17, 969–996.
  2. Calafiore, G.C.; Ghaoui, L.E. On distributionally robust chance-constrained linear programs. J. Optim. Theory Appl. 2006, 130, 1–22.
  3. Delage, E.; Ye, Y. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 2010, 58, 595–612.
  4. Jiang, N.; Xie, W. ALSO-X#: Better convex approximations for distributionally robust chance constrained programs. Math. Program. 2025, 213, 575–638.
  5. Meng, Q.; Jin, X.; Luo, F.; Wang, Z.; Hussain, S. Distributionally robust scheduling for benefit allocation in regional integrated energy system with multiple stakeholders. J. Mod. Power Syst. Clean Energy 2024, 12, 1631–1642.
  6. Mohajerin Esfahani, P.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
  7. Ning, C.; You, F. Optimization under uncertainty in the era of big data and deep learning: When machine learning meets mathematical programming. Comput. Chem. Eng. 2019, 125, 434–448.
  8. Jiang, R.; Guan, Y. Data-driven chance constrained stochastic program. Math. Program. 2016, 158, 291–327.
  9. Zymler, S.; Kuhn, D.; Rustem, B. Distributionally robust joint chance constraints with second-order moment information. Math. Program. 2013, 137, 167–198.
  10. Shapiro, A.; Zhou, E.; Lin, Y. Bayesian distributionally robust optimization. SIAM J. Optim. 2023, 33, 1279–1304.
  11. Jiang, Y.; Ren, Z.; Li, W. Committed carbon emission operation region for integrated energy systems: Concepts and analyses. IEEE Trans. Sustain. Energy 2023, 15, 1194–1209.
  12. Kuhn, D.; Shafiee, S.; Wiesemann, W. Distributionally robust optimization. Acta Numer. 2025, 34, 579–804.
  13. Nguyen, V.A.; Kuhn, D.; Mohajerin Esfahani, P. Distributionally robust inverse covariance estimation: The Wasserstein shrinkage estimator. Oper. Res. 2022, 70, 490–515.
  14. Rahimian, H.; Mehrotra, S. Frameworks and results in distributionally robust optimization. Open J. Math. Optim. 2022, 3, 4.
  15. Chen, Z.; Kuhn, D.; Wiesemann, W. Data-driven chance constrained programs over Wasserstein balls. Oper. Res. 2024, 72, 410–424.
  16. Zhong, J.; Zhao, Y.; Li, Y.; Yan, M.; Peng, Y.; Cai, Y.; Cao, Y. Synergistic operation framework for the energy hub merging stochastic distributionally robust chance-constrained optimization and Stackelberg game. IEEE Trans. Smart Grid 2024, 16, 1037–1050.
  17. Küçükyavuz, S.; Jiang, R. Chance-constrained optimization under limited distributional information: A review of reformulations based on sampling and distributional robustness. EURO J. Comput. Optim. 2022, 10, 100030.
  18. Zhang, B.; Meng, L.L.; Lu, C.; Han, Y.Y.; Sang, H.Y. Automatic design of constructive heuristics for a reconfigurable distributed flowshop group scheduling problem. Comput. Oper. Res. 2024, 161, 106432.
  19. Pearson, K. Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. A 1894, 185, 71–110.
  20. Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  21. Van Kerm, P. Adaptive kernel density estimation. Stata J. 2003, 3, 148–156.
  22. Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 1963, 58, 13–30.
  23. Levin, D.A.; Peres, Y. Markov Chains and Mixing Times; American Mathematical Society: Providence, RI, USA, 2017.
  24. Beraldi, P.; Ruszczyński, A. A branch and bound method for stochastic integer problems under probabilistic constraints. Optim. Methods Softw. 2002, 17, 359–382.
  25. Chen, Y.; Guo, Q.; Sun, H.; Li, Z.; Wu, W.; Li, Z. A distributionally robust optimization model for unit commitment based on Kullback–Leibler divergence. IEEE Trans. Power Syst. 2018, 33, 5147–5160.
Figure 1. The structure of the proposed approach to adaptively estimating the probability density based on Rules (I)–(III).
Figure 2. Distribution estimations with 5000 samples by the IDE-PLI method and the traditional histogram-based method.
Figure 3. Distribution estimations with 20,000 samples by the IDE-PLI method and the traditional histogram-based method.
Figure 4. Cost values corresponding to different methods at various confidence levels when γ = 0.05.
Table 1. Running time comparison.

Sample Size   α      Time (s)
                     CCC       DRCCP-E   DRCCP-A   DRCCP-AT   DRCCP-KL
5000          0.05   0.42688   0.29209   13.371    17.849     51.000
              0.10   0.41052   0.28372   13.093    17.635     50.675
              0.15   0.38937   0.27614   13.261    17.658     51.035
50,000        0.05   0.40226   0.25176   13.819    32.574     60.924
              0.10   0.40113   0.23571   3.7973    32.282     59.093
              0.15   0.38458   0.25093   3.8887    32.151     59.934
Table 2. Comparison of the total costs for different methods.

α          Method     Minimized Total Cost
                      γ = 0.03   γ = 0.05   γ = 0.07   γ = 0.09
α = 0.05   CCC        1048       1048       1048       1048
           DRCCP-E    1336.5     1367.5     1398.5     1514.5
           DRCCP-A    1367.5     1472       1615       1619
           DRCCP-AT   1383       1526       1545.5     1657.5
           DRCCP-KL   1472       1418       1485       1471.5
α = 0.10   CCC        972.5      972.5      972.5      972.5
           DRCCP-E    1201       1216.5     1232       1263
           DRCCP-A    1232       1278.5     1367.5     1441
           DRCCP-AT   1218.5     1278.5     1338.5     1412
           DRCCP-KL   1338.5     1383       1317       1303.5
α = 0.15   CCC        912.5      912.5      912.5      912.5
           DRCCP-E    988        1019       1050       1112
           DRCCP-A    1096.5     1158.5     1216.5     1263
           DRCCP-AT   1112       1158.5     1218.5     1249.5
           DRCCP-KL   1247.5     1425       1350       1230
Table 3. Comparison of optimal solutions under different methods.

α          Method     Solution (x1, x2; y1, y2)
                      γ = 0.03      γ = 0.05      γ = 0.07      γ = 0.09
α = 0.05   CCC        (20,24;1,1)   (20,24;1,1)   (20,24;1,1)   (20,24;1,1)
           DRCCP-E    (23,37;1,1)   (23,39;1,1)   (23,41;1,1)   (27,41;1,1)
           DRCCP-A    (23,39;1,1)   (25,42;1,1)   (31,40;1,1)   (29,44;1,1)
           DRCCP-AT   (23,40;1,1)   (29,38;1,1)   (27,43;1,1)   (33,39;1,1)
           DRCCP-KL   (25,42;1,1)   (21,46;1,1)   (34,26;1,1)   (33,27;1,1)
α = 0.10   CCC        (19,21;1,1)   (19,21;1,1)   (19,21;1,1)   (19,21;1,1)
           DRCCP-E    (21,32;1,1)   (21,33;1,1)   (21,34;1,1)   (21,36;1,1)
           DRCCP-A    (21,34;1,1)   (21,37;1,1)   (23,39;1,1)   (25,40;1,1)
           DRCCP-AT   (20,35;1,1)   (21,37;1,1)   (22,39;1,1)   (24,40;1,1)
           DRCCP-KL   (22,39;1,1)   (23,40;1,1)   (25,32;1,1)   (24,33;1,1)
α = 0.15   CCC        (18,19;1,1)   (18,19;1,1)   (18,19;1,1)   (18,19;1,1)
           DRCCP-E    (19,22;1,1)   (19,24;1,1)   (19,26;1,1)   (19,30;1,1)
           DRCCP-A    (19,29;1,1)   (19,33;1,1)   (21,33;1,1)   (21,36;1,1)
           DRCCP-AT   (19,30;1,1)   (19,33;1,1)   (20,35;1,1)   (20,37;1,1)
           DRCCP-KL   (21,35;1,1)   (33,24;1,1)   (24,36;1,1)   (22,32;1,1)
Table 4. Violation probabilities of different methods under varying risk levels and divergence tolerances.

α          Method     Violation Probability
                      γ = 0.03   γ = 0.05   γ = 0.07   γ = 0.09
α = 0.05   CCC        0.047      0.047      0.047      0.047
           DRCCP-E    0.043      0.040      0.020      0.010
           DRCCP-A    0.040      0.030      0.020      0.017
           DRCCP-AT   0.040      0.030      0.017      0.017
           DRCCP-KL   0.043      0.043      0.043      0.043
α = 0.10   CCC        0.067      0.067      0.067      0.067
           DRCCP-E    0.047      0.047      0.047      0.043
           DRCCP-A    0.047      0.043      0.043      0.040
           DRCCP-AT   0.047      0.047      0.043      0.040
           DRCCP-KL   0.047      0.043      0.043      0.043
α = 0.15   CCC        0.103      0.103      0.103      0.103
           DRCCP-E    0.050      0.050      0.047      0.047
           DRCCP-A    0.047      0.047      0.047      0.047
           DRCCP-AT   0.047      0.047      0.047      0.047
           DRCCP-KL   0.047      0.047      0.043      0.047