4.1. Optimizing over Probability Distributions
The computation of the objective function (4) and the probabilistic constraint (5) requires solving two separate optimization problems over probability distributions: searching for the worst-case demand distribution affecting the objective value and for the worst-case demand distribution affecting the demand satisfaction constraint.
Without loss of generality, we can consider the reference problem
$$\min_{\pi \in \mathcal{P}(\Omega)} \mathcal{F}(\pi) \qquad (9)$$
where $\mathcal{F}$ refers to a generic functional to be minimized over the space of probability measures $\mathcal{P}(\Omega)$. Solving problem (9) means searching for a sequence of probability distributions starting from an initial random guess $\pi_0$ (typically a multivariate Gaussian) and converging to $\pi^* \in \arg\min_{\pi \in \mathcal{P}(\Omega)} \mathcal{F}(\pi)$. A well-known example for problem (9) is given in [7], showing that solving the Fokker–Planck PDE is equivalent to minimizing an entropy functional over the space of probability measures equipped with the so-called Wasserstein distance.
The Wasserstein distance is defined as the minimum cost for transporting probability mass from a source probability density to match a target probability density, where the cost is computed in terms of a distance (also known as the ground metric) between points of the support. In this paper, we consider the case in which the cost is the Euclidean distance, leading to the so-called 2-Wasserstein distance:
$$W_2^2(\pi_0, \pi^*) = \min_{T:\, T_\#\pi_0 = \pi^*} \int_{\Omega} \left\|T(\mathbf{x}) - \mathbf{x}\right\|^2 \, d\pi_0(\mathbf{x}) \qquad (10)$$
where $T_\#\pi_0 = \pi^*$ means that applying $T$ to $\pi_0$ (i.e., our source distribution) yields $\pi^*$ (i.e., our target distribution) as a result. $T_\#$ denotes the so-called push-forward operator; while $T$ is interpreted as a function moving a single data point (i.e., a customer demand scenario, in our case) over the support $\Omega$, $T_\#$ represents its extension to an entire probability measure.
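To make definition (10) concrete, the following sketch computes the empirical 2-Wasserstein distance between two equally sized point clouds, the discrete special case of (10) in which the optimal map reduces to an optimal assignment. The function name and the toy data are illustrative, not part of the method in this paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_w2(source: np.ndarray, target: np.ndarray) -> float:
    """Empirical 2-Wasserstein distance between equally sized point clouds.

    Each cloud is an (N, d) array of support points with uniform weights,
    so the optimal coupling reduces to an assignment problem.
    """
    # Squared Euclidean ground cost between every source/target pair.
    cost = np.sum((source[:, None, :] - target[None, :, :]) ** 2, axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal permutation
    return float(np.sqrt(cost[rows, cols].mean()))

# Example: clouds sampled from two Gaussians whose means differ by (3, 3).
rng = np.random.default_rng(0)
w2 = empirical_w2(rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2)))
print(f"W2 ~ {w2:.2f}")  # close to the mean shift ||(3, 3)|| ~ 4.24
```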
The Brenier Theorem [3,4] guarantees that problem (10) has a unique optimal solution $T^*$, named the optimal transport map. Furthermore, the Brenier theorem also states that the optimal transport map is equal to the gradient of a convex function $u$, such that $T^* = \nabla u$.
By introducing a coefficient $t \in [0, 1]$, we can define a continuous-time representation of the optimal transport map $T^*$, that is,
$$T_t(\mathbf{x}) = (1 - t)\,\mathbf{x} + t\,T^*(\mathbf{x})$$
Easily, the following discrete-time representation can be derived:
$$T_{t_k}(\mathbf{x}) = (1 - t_k)\,\mathbf{x} + t_k\,T^*(\mathbf{x})$$
with $t_k = k/K$, $k = 0, \dots, K$, and where $\{t_k\}$ is a discretization of $[0, 1]$. As a result, the sequence of probability measures $\pi_k = (T_{t_k})_\#\pi_0$ solves (10), with $\pi_K = \pi^*$ as the solution of (10).
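As a minimal illustration of the discrete-time representation, assume for a moment that the optimal map $T^*$ is known (here, a pure translation, which is indeed the gradient of a convex potential). Each point then moves linearly along its transport ray; all names below are illustrative.

```python
import numpy as np

def discrete_transport_path(x0: np.ndarray, t_star, K: int):
    """Point clouds along T_{t_k} = (1 - t_k) Id + t_k T*, with t_k = k / K."""
    return [(1 - k / K) * x0 + (k / K) * t_star(x0) for k in range(K + 1)]

# Toy optimal map: translation by b, i.e., T* = grad u with the convex
# potential u(x) = ||x||^2 / 2 + b @ x.
b = np.array([2.0, -1.0])
x0 = np.random.default_rng(1).normal(size=(50, 2))
clouds = discrete_transport_path(x0, lambda x: x + b, K=10)  # pi_0, ..., pi_K
```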
Since $T^*$ is not known a priori, $T_{t_k}$ cannot be directly computed. A widely adopted method to solve (10) is the JKO (Jordan–Kinderlehrer–Otto) scheme [7], which is a proximal point method with respect to the 2-Wasserstein distance or, from another point of view, a backward Euler discretization. At a generic iteration $k$, JKO transports the current probability density, $\pi_k$, into the next one, $\pi_{k+1}$, according to
$$\pi_{k+1} \in \arg\min_{\pi \in \mathcal{P}(\Omega)} \frac{1}{2} W_2^2(\pi, \pi_k) + \tau\,\mathcal{F}(\pi) \qquad (14)$$
with $\tau > 0$ as the JKO's step size.
Solving problem (14) translates into searching for the next probability density that minimizes the functional $\mathcal{F}$ within a 2-Wasserstein neighbourhood of the current $\pi_k$, that is, a Wasserstein ball (aka ambiguity set) around $\pi_k$. Iteratively solving problem (14) leads to a sequence whose rate of convergence to $\pi^*$ depends on $\tau$: as $\tau$ grows, the first term becomes negligible, meaning that we are minimizing $\mathcal{F}$ but at the expense of obtaining a transport far away from the optimal one. On the contrary, if $\tau$ is small, then the first term becomes more relevant, so the generated sequence stays close to the optimal transport trajectory but converges significantly more slowly to $\pi^*$.
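For intuition, the following sketch performs JKO iterations on a particle (point cloud) approximation of the density, under two simplifying assumptions that are ours, not the paper's: the functional is a potential energy $\mathcal{F}(\pi) = \mathbb{E}_\pi[V(\mathbf{x})]$, and the coupling between consecutive clouds is fixed to the identity pairing, so that $W_2^2$ reduces to the mean squared particle displacement.

```python
import numpy as np
from scipy.optimize import minimize

def jko_step(x_cloud, potential_v, tau):
    """One JKO iteration (14) on a particle approximation of pi_k."""
    n, d = x_cloud.shape

    def objective(flat_y):
        y = flat_y.reshape(n, d)
        w2 = np.mean(np.sum((y - x_cloud) ** 2, axis=1))  # surrogate W2^2
        return 0.5 * w2 + tau * np.mean(potential_v(y))   # problem (14)

    res = minimize(objective, x_cloud.ravel(), method="L-BFGS-B")
    return res.x.reshape(n, d)

# Drive samples toward the minimizer of V(x) = ||x - (1, 1)||^2; a small tau
# yields short, near-optimal transport steps at the price of more iterations.
rng = np.random.default_rng(2)
cloud = rng.normal(size=(50, 2))
for _ in range(20):
    cloud = jko_step(cloud, lambda y: np.sum((y - 1.0) ** 2, axis=1), tau=0.1)
```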
Finally, problem (9) is formulated in terms of $\pi$, that is, having probability densities as decision variables. State-of-the-art approaches recast problem (9) as an optimization problem in terms of a parametrized transport map from $\pi_k$ to $\pi_{k+1}$. For instance, the convex function $u$ is usually approximated through an Input Convex Neural Network (ICNN) [31,32,33], a specific type of Deep Neural Network (DNN), and then the transport map is obtained by following the Brenier Theorem, namely, $T = \nabla u$. DNNs are also used in [34,35] to approximate the transport map directly, whose output is then used, along with the current sample, as input to another DNN estimating the functional $\mathcal{F}$. Contrary to the ICNN-based approaches, this method allows for skipping the computation of $u$. More recently, [36] proposed to parametrize the transport map through a Residual Neural Network (ResNet) and a variational formulation of the functional $\mathcal{F}$, formulated as a maximization over a parametric class of functions. This reduces the computational burden with respect to the previous methods when using small data samples, and it scales well with the dimensionality of the support $\Omega$. The small data regime has also been recently investigated in [37], which proposed combining OT solvers and Gaussian process regression to efficiently learn the transport map. The most relevant issue for all the neural approaches is that at least one neural network must be trained, leading to relevant computational costs at each JKO step. Since getting close to the optimal transport requires a small $\tau$ and, consequently, a large number of iterations $K$, this issue becomes even more critical. Moreover, there are no specific guidelines on how to choose the most suitable neural network. In addition, [38] has recently reported that normalizing flow models based on neural networks can only generate planar flows, which are proven to be expressive only in the case of univariate probability densities (i.e., a 1-dimensional support). Another relevant study on the convergence and self-consistency of normalizing flows in the space of probability densities equipped with the Wasserstein distance is given in [39].
The Constrained JKO Scheme
The parametrization of the transport map considered in this paper was originally proposed in [30], and is defined as follows:
$$T(\mathbf{x}) = \mathbf{x} + \gamma(\mathbf{x})\, R(\mathbf{x})\, \bar{\mathbf{v}}$$
with $\gamma: \Omega \to \mathbb{R}$, and where $R(\mathbf{x})$ is a rotation matrix consisting of a sequence of matrix multiplications, $R(\mathbf{x}) = R_1(\mathbf{x})\, R_2(\mathbf{x}) \cdots R_{d-1}(\mathbf{x})$, where each $R_i(\mathbf{x})$ is an elementary rotation by the angle $g_i(\mathbf{x})$, and $g_i$ denotes the $i$th component of the vector-valued function $g: \Omega \to \mathbb{R}^{d-1}$.
Finally, $\bar{\mathbf{v}}$ is a reference versor; for simplicity, we consider $\bar{\mathbf{v}} = \frac{1}{\sqrt{d}}\,\mathbf{1}$, where $\mathbf{1}$ denotes the all-ones vector and, therefore, $\|\bar{\mathbf{v}}\| = 1$.
From the Brenier theorem, we know that $T^* = \nabla u$, and so we are posing $\nabla u(\mathbf{x}) = \mathbf{x} + \gamma(\mathbf{x})\, R(\mathbf{x})\, \bar{\mathbf{v}}$. The parameters to be learned in the proposed parametrization are the two functions $\gamma$ and $g$. Contrary to the ICNN-based parametrizations, but analogously to [34,35,36], the proposed parametrization does not require any assumptions on the two functions to be learned.
According to the proposed parametrization, the JKO scheme can be recast into
$$\pi_{k+1} \in \arg\min_{\gamma,\, g} \frac{1}{2} W_2^2(\pi, \pi_k) + \tau\,\mathcal{F}(\pi)$$
with $\pi = T_\#\pi_k$.
Like all the other approaches, we learn the functions underlying our parametrization by accessing point clouds (i.e., empirical distributions) sampled from the probability densities [30]. Analogously to other neural network-based approaches, denote with $X^{(k)}$ the point cloud at iteration $k$, obtained as $X^{(k)} = \{\mathbf{x}_i^{(k)}\}_{i=1,\dots,N}$ with $\mathbf{x}_i^{(k)}$ such that $\mathbf{x}_i^{(k)} \sim \pi_k$. According to our parametrized transport map $T$, every point of the current cloud $X^{(k)}$ is transported as follows:
$$\mathbf{x}_i^{(k+1)} = \mathbf{x}_i^{(k)} + \gamma_i\, R_i\, \bar{\mathbf{v}}$$
with $\gamma_i$ and $R_i$ representing shorthand for $\gamma(\mathbf{x}_i^{(k)})$ and $R(\mathbf{x}_i^{(k)})$.
Since we are working with point clouds, and according to our parametrization, we can rewrite the 2-Wasserstein distance as follows:
$$W_2^2(\pi_k, \pi_{k+1}) = \frac{1}{N}\sum_{i=1}^{N} \left\|\mathbf{x}_i^{(k+1)} - \mathbf{x}_i^{(k)}\right\|^2 = \frac{1}{N}\sum_{i=1}^{N} \left\|\gamma_i\, R_i\, \bar{\mathbf{v}}\right\|^2$$
Since a rotation does not modify the norm of the rotated $\bar{\mathbf{v}}$, and according to our choice for $\bar{\mathbf{v}}$, we have $\|R_i\,\bar{\mathbf{v}}\| = \|\bar{\mathbf{v}}\| = 1$. Finally, we can write
$$W_2^2(\pi_k, \pi_{k+1}) = \frac{1}{N}\sum_{i=1}^{N} \gamma_i^2$$
with $\gamma_i = \gamma(\mathbf{x}_i^{(k)})$.
This allows us to set a threshold ε on the term $\frac{1}{N}\sum_{i=1}^{N}\gamma_i^2$ to explicitly quantify the Wasserstein ball around the current $\pi_k$, instead of weighting $\mathcal{F}$ against the Wasserstein term through the value of $\tau$, as in the standard JKO scheme.
Finally, the constrained-JKO (cJKO) scheme can be formalized as
$$\pi_{k+1} \in \arg\min_{\gamma,\, g} \mathcal{F}(\pi) \quad \text{subject to} \quad \frac{1}{N}\sum_{i=1}^{N}\gamma_i^2 \le \varepsilon$$
with $\varepsilon$ as a small positive quantity, $\pi = T_\#\pi_k$, $T(\mathbf{x}) = \mathbf{x} + \gamma(\mathbf{x})\,R(\mathbf{x})\,\bar{\mathbf{v}}$, and $W_2^2(\pi_k, \pi) = \frac{1}{N}\sum_{i=1}^{N}\gamma_i^2$.
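The following sketch solves a single cJKO step as a constrained problem, under the same illustrative assumptions as the JKO sketch above (potential-energy functional, $d = 2$ so one rotation angle per point) and with $\gamma$ and $g$ optimized pointwise on the cloud rather than as parametric functions.

```python
import numpy as np
from scipy.optimize import minimize

def cjko_step(x_cloud, potential_v, eps):
    """One cJKO step: minimize F within a 2-Wasserstein ball of radius eps."""
    n, d = x_cloud.shape                      # sketch assumes d == 2
    v_bar = np.ones(d) / np.sqrt(d)

    def transported(params):
        gammas, thetas = params[:n], params[n:]
        c, s = np.cos(thetas), np.sin(thetas)
        rv = np.stack([c * v_bar[0] - s * v_bar[1],   # rotated versor
                       s * v_bar[0] + c * v_bar[1]], axis=1)
        return x_cloud + gammas[:, None] * rv

    objective = lambda p: np.mean(potential_v(transported(p)))
    w2_ball = {"type": "ineq",                # eps - mean(gamma_i^2) >= 0
               "fun": lambda p: eps - np.mean(p[:n] ** 2)}
    p0 = 0.01 * np.random.default_rng(0).standard_normal(2 * n)  # avoid saddle
    res = minimize(objective, p0, method="SLSQP", constraints=[w2_ball])
    return transported(res.x)
```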
In simpler terms, we are searching for $\pi_{k+1}$, which minimizes $\mathcal{F}$ within a 2-Wasserstein neighbourhood of $\pi_k$.
Indeed, $\varepsilon$ in cJKO operates analogously to $\tau$ in JKO: a small value requires a longer sequence to converge to $\pi^*$, but provides a transport close to the optimal one. However, it is important to clarify that, from a quantitative perspective, the two are completely different. Since $\tau$ is just a scalarization weight in a scalarized bi-objective problem, it is practically impossible to establish a suitable value without prior knowledge about $\mathcal{F}$. Furthermore, using a constant value for $\tau$, which is the common choice, leads to a significantly different relevance of the two objectives over the JKO iterations.
The cJKO scheme overcomes these limitations thanks to $\varepsilon$: it is chosen in advance and kept fixed along the overall iterative optimization process, independently of the values of $\mathcal{F}$ over the iterations.
Figure 1 shows an example of the trajectories from $\pi_0$ to $\pi^*$ depending on three different values of $\varepsilon$. The three presented cases converge to the same value of $\mathcal{F}$ (and the same final point cloud) but within a different number of cJKO steps: the smaller the value of ε, the larger the number of iterations K, and the closer the final transport to the optimal one (i.e., trajectories are closer to the actual optimal transport, because they are straighter and do not overlap).
As far as the application problem considered in this paper is concerned, cJKO is used to compute the second (stochastic) term of the objective function (4) and the chance constraint (5) separately, given a candidate solution for the facility location problem.
For completeness, we report the generic algorithm for the cJKO scheme below. It is important to clarify that it is applied separately to identify the demand distributions, within an ε-radius Wasserstein ball of the historical demand data, providing (a) the maximum penalization in the objective function, that is, the second term in (4), and (b) the minimum probability in the chance constraint (5).
cJKO Algorithm for distributionally robust optimization
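A minimal sketch of this loop, assuming `cjko_step` is the single-step solver sketched earlier and `objective` evaluates the DRO term of interest (the expected penalization in (4) or the satisfaction probability in (5)); all names are illustrative.

```python
def cjko_worst_case(historical_cloud, cjko_step, objective, eps,
                    k_max=50, tol=1e-6):
    """Generic cJKO loop: evolve the historical demand cloud through
    successive eps-radius Wasserstein balls until the tracked objective
    stops improving or the iteration budget is exhausted."""
    cloud = historical_cloud.copy()
    best = objective(cloud)
    for _ in range(k_max):
        cloud = cjko_step(cloud, eps)      # stay within the Wasserstein ball
        current = objective(cloud)
        if abs(current - best) < tol:      # no further worst-case progress
            break
        best = current
    return cloud, best
```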
Since the true probability distribution of the demand is unknown, and changes due to unpredictable events, it is difficult to define suitable values a priori for ε and for the probability threshold of the chance constraint (5). On the other hand, their values are easy to interpret: larger values increase robustness against events that could lead to demand values significantly different from the historical ones, but the final decision might be too conservative. On the contrary, smaller values assume that the future demand should not change too much with respect to the historical data, leading to less conservative decisions but to difficulty in dealing with possible significant changes. The suggestion is to perform different runs with different values of these two parameters, with the aim of observing the differences both in terms of the final solution and of the generated demand distributions. Finally, the user can select the most suitable decisions (and scenario) for the specific goals of the target setting.
4.2. Metaheuristic-Based Approach
In this section, we introduce a metaheuristic optimization framework for distributionally robust supply chain design under demand uncertainty. Our approach integrates
A Genetic Algorithm (GA) for the heuristic initialization of facility location, storage levels, and allocation decisions;
The cJKO scheme to compute the stochastic term of the objective function (4) and the chance constraint (5) separately;
A robust evaluation function that balances operational cost and risk-averse decision-making.
This hybrid method provides an efficient alternative to exact methods, particularly in high-dimensional settings, in which classical optimization techniques struggle with combinatorial complexity.
4.2.1. Genetic Algorithm for Heuristic Initialization
In step ①, to enhance the efficiency of the optimization process, we employ a GA for heuristic initialization. A GA is an evolutionary metaheuristic that iteratively improves candidate solutions by mimicking natural selection principles, including selection, crossover, and mutation [
40]. In our approach, the GA generates an initial population of facility location and allocation decisions, ensuring diversity in solutions while providing a high-quality starting point for subsequent optimization. By leveraging a GA, we improve the convergence speed and enhance the feasibility of the optimization model by reducing the likelihood of poor initializations. This approach is particularly beneficial for large-scale problems where a purely random initialization could lead to suboptimal or infeasible solutions.
Chromosome Representation
Every individual in the genetic population represents a possible supply chain configuration as a chromosome, encoding
Facility location decisions (a binary matrix over facilities and time periods);
Storage level decisions (a continuous matrix over facilities and time periods);
Allocation decisions (a continuous tensor over customers, facilities, and time periods).
Formally, a chromosome is structured as chromosome = (location decisions, storage levels, allocations), where
the location entry denotes whether facility j is open at time t;
the storage entry represents the storage level at facility j at time t;
the allocation entry captures how much demand from customer i is served by facility j at time t.
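A minimal encoding sketch follows; the names and the index sets (I customers, J facilities, T periods) are illustrative placeholders for the model's actual sets.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Chromosome:
    """Candidate supply chain configuration (illustrative encoding)."""
    open_fac: np.ndarray   # shape (J, T), binary: facility j open at time t
    storage: np.ndarray    # shape (J, T), continuous storage levels
    alloc: np.ndarray      # shape (I, J, T), demand of i served by j at t

def random_chromosome(I, J, T, max_storage, rng):
    """Random individual for the initial population (not necessarily feasible)."""
    return Chromosome(
        open_fac=rng.integers(0, 2, size=(J, T)),
        storage=rng.uniform(0, max_storage, size=(J, T)),
        alloc=rng.uniform(0, 1, size=(I, J, T)),
    )
```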
Fitness Function
The fitness function of the genetic algorithm evaluates the total cost of a candidate supply chain configuration according to Formula (4). It minimizes the sum of three key components: (i) fixed facility opening costs, (ii) inventory holding costs, and (iii) expected transportation costs and shortage penalty costs under stochastic demand, while satisfying the demand constraints.
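A sketch of this evaluation, using the Chromosome encoding above, with hypothetical cost coefficients and with the worst-case demand cloud from cJKO passed in as a scenario set:

```python
import numpy as np

def fitness(ch, fixed_cost, holding_cost, unit_ship_cost, penalty,
            demand_scenarios):
    """Total cost of a candidate configuration in the spirit of Formula (4).

    `demand_scenarios` has shape (S, I, T); all coefficients are placeholders.
    """
    opening = np.sum(fixed_cost * ch.open_fac)        # (i) fixed opening costs
    holding = np.sum(holding_cost * ch.storage)       # (ii) inventory holding
    shipping = np.sum(unit_ship_cost * ch.alloc)      # transportation
    served = ch.alloc.sum(axis=1)                     # (I, T): per-customer supply
    shortage = np.maximum(demand_scenarios - served[None], 0.0)
    expected_penalty = penalty * shortage.sum(axis=(1, 2)).mean()  # (iii)
    return opening + holding + shipping + expected_penalty
```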
Genetic Operators
The GA employs three main evolutionary operators:
Selection (Tournament Selection): randomly selects k tournament competitors and chooses the best candidate.
Crossover (Uniform Crossover): swaps facility decisions and allocations between parent solutions to create diverse offspring. In this strategy, each gene (i.e., an element of the chromosome representing facility decisions or allocations) is independently chosen from one of the two parent solutions with equal probability. This encourages greater diversity and exploration of the solution space by recombining building blocks from both parents.
Mutation (Storage and Allocation Adjustment): perturbs storage levels and allocations to introduce new feasible solutions. The swap mutation randomly selects two positions in the chromosome and exchanges their values. This perturbation helps the algorithm escape local optima by introducing structural variation into the solution while preserving feasibility.
The GA runs for a predefined number of generations, yielding a near-optimal initial solution.
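Minimal sketches of the three operators, applied here to a single decision block of the chromosome (e.g., the allocation array); names are illustrative.

```python
import numpy as np

def tournament_select(population, fitnesses, k, rng):
    """Pick k random competitors and return the lowest-cost one."""
    idx = rng.choice(len(population), size=k, replace=False)
    return population[min(idx, key=lambda i: fitnesses[i])]

def uniform_crossover(parent_a, parent_b, rng):
    """Each gene is taken from either parent with equal probability."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)

def swap_mutation(genes, rng):
    """Exchange the values at two random positions of the block."""
    child = genes.copy().ravel()
    i, j = rng.choice(child.size, size=2, replace=False)
    child[i], child[j] = child[j], child[i]
    return child.reshape(genes.shape)
```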
4.2.2. Wasserstein Distributional Robustness via cJKO
Once an initial solution is generated in step ②, we use cJKO to compute the stochastic term of the objective function (4) and the chance constraint (5) separately.
Distributional Robustness via Wasserstein Distance. Starting from the point cloud associated with the historical demand, namely, $X^{(0)}$, we iteratively search for the sequence of point clouds $X^{(k)}$ to finally compute the right-hand term of the chance constraint (5).
cJKO Iterative Update Rule. At each iteration k, we solve the constrained Wasserstein optimization problem in Formulas (4) and (5). The process is repeated until a maximum number of iterations is reached. This ensures that our demand forecasts adapt dynamically based on historical and simulated uncertainty data.
4.2.3. Chance Constraint Verification ③
In supply chain and facility location problems, ensuring demand satisfaction under uncertainty is critical for maintaining service levels and system robustness. Traditional chance constraints provide a probabilistic guarantee that demand will be met with high probability despite stochastic fluctuations. In our approach, the chance constraint is reformulated within the cJKO variational framework to account for distributional uncertainty. Specifically, we enforce constraint (5). This formulation ensures that demand satisfaction holds, with probability at least the prescribed threshold, under the worst-case distribution within the Wasserstein uncertainty set. Using a cJKO-based gradient flow, we iteratively evolve the demand distribution to maximize constraint violation while remaining within the ε-radius Wasserstein ball. If the worst-case distribution still satisfies the demand constraint, robustness is achieved. This approach allows us to dynamically verify and enforce probabilistic demand satisfaction in a data-driven and distributionally robust manner.
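A sketch of this verification step on an empirical worst-case cloud; the satisfaction rule (full coverage of every customer in every period) and all names are our illustrative simplification of constraint (5).

```python
import numpy as np

def chance_constraint_holds(ch, worst_case_scenarios, prob_level):
    """Empirical check of constraint (5) on the cJKO worst-case demand cloud.

    `worst_case_scenarios` has shape (S, I, T); a scenario is satisfied when
    allocations cover every customer's demand in every period.
    """
    served = ch.alloc.sum(axis=1)                         # (I, T)
    ok = np.all(worst_case_scenarios <= served[None] + 1e-9, axis=(1, 2))
    return ok.mean() >= prob_level
```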
4.2.4. Integrated Iterative Optimization Framework
The GA-initialized solution and cJKO-updated demand are combined in a metaheuristic-based iterative optimization loop.
Iterative Algorithm
Step 1: Initialize supply chain decisions using GA.
Step 2: Solve the cJKO Wasserstein update to refine demand scenarios.
Step 3: Evaluate feasibility via chance constraints (step ④).
Step 4: Adjust storage and allocations based on updated demands.
Step 5: Repeat until convergence or the maximum number of iterations is reached (step ⑤).
If the constraint is violated, we adjust the ambiguity radius ε and re-optimize. The optimization process follows the iterative procedure illustrated in Figure 2. This framework begins with the initialization of supply chain decisions using the GA, followed by the cJKO Wasserstein update to refine the demand distributions. Chance constraint verification ensures robustness, and the necessary adjustments to storage and allocations are made until convergence or the maximum iteration limit is reached.
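A compact sketch of this integrated loop; every helper stands for one of the components sketched in the previous subsections, and all names (`ga_initialize`, `update_radius`, etc.) are illustrative.

```python
def robust_optimize(data, eps, prob_level, max_outer=20):
    """Iterative framework of Figure 2 (illustrative sketch)."""
    solution = ga_initialize(data)                        # Step 1: GA init
    for _ in range(max_outer):
        worst = cjko_worst_case_demand(data, eps)         # Step 2: cJKO update
        if chance_constraint_holds(solution, worst, prob_level):  # Step 3
            return solution                               # robust: stop
        solution = adjust_storage_and_alloc(solution, worst)      # Step 4
        eps = update_radius(eps)                          # violated: new radius
    return solution                                       # Step 5: max iterations
```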