Symmetric Adjustable Tail-Risk Measure for Distributionally Robust Optimization in Portfolio Allocation

Wang, Haonan; Zhao, Yunxiao; Guo, Yixin; Liu, Changhe; Zhang, Xinlin

doi:10.3390/sym17060959


by Haonan Wang, Yunxiao Zhao, Yixin Guo, Changhe Liu ^* and Xinlin Zhang

School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang 471003, China

^* Author to whom correspondence should be addressed.

Symmetry 2025, 17(6), 959; https://doi.org/10.3390/sym17060959

Submission received: 14 April 2025 / Revised: 29 May 2025 / Accepted: 10 June 2025 / Published: 17 June 2025

(This article belongs to the Special Issue Symmetry in Optimal Control and Applications)


Abstract

In this study, we begin by extending the mathematical formulation of the expectile risk measure through a key modification: replacing the expectation in its defining equation with expected shortfall. This substitution leads to a revised risk measure that more precisely captures downside risk. To handle the uncertainty of the underlying distribution, we then adopt a distributionally robust optimization framework. Notably, this robust optimization problem can be reformulated as a linear programming problem, and by employing suitable approximation techniques, we derive an analytical solution. In numerical experiments, our portfolio problem exhibits superior performance when compared to several traditional and distributionally robust optimized portfolio problems.

Keywords:

expectile; expected shortfall; optimization under uncertainty; duality theory

MSC:

91G70

1. Introduction

In the context of portfolio optimization, while Markowitz’s [1] mean–variance model provides a theoretical foundation for asset allocation, its limitations have become increasingly evident in practical applications. Variance, as a risk measure, fails to distinguish between different types of risks and overlooks the distribution of risk, which may expose investors to unacceptable levels of loss. Additionally, parameter estimation errors, particularly in the covariance matrix and mean vector, can lead to significant variations in the optimization outcomes, thereby impacting the robustness of the portfolio [2].

To overcome these limitations, new risk measures have been proposed, with the expectile emerging as a prominent tool in both statistics and finance in recent years [3]. Compared to traditional risk measures such as the Value at Risk (VaR) and expected shortfall (ES), the expectile provides a more flexible approach to capturing different degrees of risk aversion by minimizing the weighted mean squared error. By adjusting parameter

α

, the expectile allows for greater emphasis on either the lower or upper tail of the distribution, making it suitable for a variety of investment objectives. This method not only effectively complements traditional risk measures but also offers a more refined and personalized risk control strategy for portfolio optimization.

Formally, the

α

-expectile [3] is defined by

q_{α} (X) = \arg \min_{d \in R} \{α E [{(X - d)}_{+}^{2}] + (1 - α) E [{(X - d)}_{-}^{2}]\},

where

α

is the confidence level,

{(x)}_{+}

denotes the positive part, and

{(x)}_{-}

denotes the negative part. According to [4],

q_{α} (X)

is the unique solution to this optimization problem if and only if

α E [{(X - q_{α} (X))}_{+}] = (1 - α) E [{(X - q_{α} (X))}_{-}] .

When

α = 0.5

, the expectile coincides with the mean of X. As

α

moves above

0.5

,

q_{α} (X)

progressively exceeds the mean, reflecting an increasing focus on higher losses (or lower returns). Conversely, when

α < 0.5

, the expectile shifts below the mean, emphasizing outcomes below the central tendency. These features allow practitioners to adjust

α

according to the severity of risks they wish to emphasize, providing a flexible measure that can better capture tail risks and adapt to changing market conditions. This makes the expectile a valuable tool for tailoring risk management strategies in investment portfolios.
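The first-order condition above suggests a simple way to compute a sample expectile. The following is a minimal sketch, assuming an equally weighted sample and a plain bisection search; the function name `expectile` and the simulated data are illustrative and not part of the paper.

```python
import numpy as np


def expectile(x, alpha, tol=1e-10, max_iter=200):
    """Sample alpha-expectile via bisection on the first-order condition
    alpha * E[(X - d)_+] = (1 - alpha) * E[(X - d)_-]."""
    x = np.asarray(x, dtype=float)

    def gap(d):
        # Left side minus right side of the condition; non-increasing in d.
        pos = np.maximum(x - d, 0.0).mean()
        neg = np.maximum(d - x, 0.0).mean()
        return alpha * pos - (1.0 - alpha) * neg

    lo, hi = float(x.min()), float(x.max())  # gap(lo) >= 0 >= gap(hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
sample = rng.normal(size=1000)  # illustrative data, not from the paper
q_low, q_mid, q_high = (expectile(sample, a) for a in (0.1, 0.5, 0.9))
```

In this sketch `q_mid` agrees with the sample mean up to the bisection tolerance, while `q_low` and `q_high` fall below and above it, matching the behaviour described in the text.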

In the application domain, ref. [3] demonstrates that expectiles serve as rigorous and efficient risk assessment tools, characterized by properties such as coherence, elicitability, non-negative risk margins, and adaptability to structured contracts. Building on this foundation, ref. [5] provides closed-form expectile (ERM) formulas via generalized Fourier inversion, validates them across five market models, and proposes a 1.25 ERM-to-VaR scaling for Basel compliance. As coherent measures (with

α \geq 0.5

), expectiles enhance regulatory compliance and facilitate comprehensive risk verification, while optimal reinsurance strategies effectively streamline the trade-off between risk and cost.

Additionally, Taylor’s Conditional Autoregressive Expectiles (CARE) model [6] employs asymmetric least squares to capture autoregressive structures, enabling the dynamic estimation of VaR and ES without imposing restrictive distributional assumptions. Empirical evidence indicates that the CARE model attains accuracy comparable to that of historical simulation and GARCH-EVT approaches—especially in estimating extreme quantiles—thereby overcoming the limitation of Conditional Autoregressive Value-at-Risk methods, which focus solely on VaR, and ultimately offering a more comprehensive framework for financial risk management.

To further enhance the applicability of these tools, this paper introduces a generalization of the expectile concept to encompass expected shortfall at a confidence level

β

, which assesses the anticipated value of losses across varying confidence levels. Moreover, this paper also attempts to use expectile for portfolio decision making. This generalization not only preserves the advantageous properties of expected shortfall but also integrates the flexibility inherent in the expectile, thereby facilitating a more comprehensive risk assessment that accommodates diverse risk preferences and confidence levels.

In this paper, we generalize the expectile, and its generalized form can be expressed as

α {ES}_{β_{1}} ({(X - d)}_{+}) = (1 - α) {ES}_{β_{2}} ({(X - d)}_{-}), d \in R, β_{1}, β_{2} \in (0, 1) .

In this equation,

{ES}_{β_{1}} ({(X - d)}_{+})

represents the expected shortfall at confidence level

β_{1}

for the positive part of the loss

X - d

, while

{ES}_{β_{2}} ({(X - d)}_{-})

corresponds to the expected shortfall for the negative part. The solution d to this equation is referred to as the ES-expectile, denoted as

q_{α, β_{1}, β_{2}} (X)

. When

β_{1} = β_{2} = 0

,

q_{α, β_{1}, β_{2}} (X) = q_{α} (X)

.

The expected shortfall (ES) [7] is defined as

{ES}_{β} (X) = \frac{1}{1 - β} \int_{β}^{1} {VaR}_{θ} (X) d θ,

where the Value at Risk (VaR) at confidence level

θ

is mathematically defined as

{VaR}_{θ} (X) = inf {x \in R : P (X \leq x) \geq θ} .

Additionally, for ES, there is the Rockafellar–Uryasev formula [7], which simplifies the computation of the conditional risk measure ES. The value of

{ES}_{τ}

is obtained by solving the following optimization problem:

{ES}_{τ} (Z) = inf_{ζ \in R} (ζ + \frac{1}{1 - τ} E [{(Z - ζ)}_{+}]) .

(1)

This formulation emphasizes the use of different weights for losses above and below a specified threshold, under varying confidence levels, thereby capturing the severity of extreme risk scenarios more effectively than traditional measures such as VaR.
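The definitions of VaR and ES and the Rockafellar–Uryasev formula (1) can be compared numerically on a sample. The sketch below assumes an equally weighted empirical distribution; the helper names and the simulated heavy-tailed losses are illustrative assumptions. Because the objective in (1) is piecewise linear and convex in ζ, the sketch searches for the minimum over the sample points only.

```python
import numpy as np


def var_empirical(x, theta):
    """VaR_theta = inf{v : P(X <= v) >= theta} under the empirical distribution."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    k = int(np.ceil(theta * n - 1e-12))  # smallest k with k / n >= theta
    return float(xs[max(k, 1) - 1])


def es_empirical(x, beta):
    """ES_beta = (1 / (1 - beta)) * integral of VaR_theta over [beta, 1],
    computed exactly for the step-shaped empirical quantile function."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    weights = np.clip(upper - np.maximum(lower, beta), 0.0, None)
    return float(weights @ xs) / (1.0 - beta)


def es_rockafellar_uryasev(x, tau):
    """Minimise zeta + E[(Z - zeta)_+] / (1 - tau) over zeta in the sample."""
    x = np.asarray(x, dtype=float)
    values = [z + np.maximum(x - z, 0.0).mean() / (1.0 - tau) for z in x]
    return float(min(values))


rng = np.random.default_rng(0)
losses = rng.standard_t(df=4, size=500)  # illustrative heavy-tailed losses
var_95 = var_empirical(losses, 0.95)
es_int = es_empirical(losses, 0.95)
es_ru = es_rockafellar_uryasev(losses, 0.95)
```

On this sample `es_int` and `es_ru` coincide up to floating-point error, and both are at least `var_95`.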

To address the uncertainty in the underlying distribution, distributionally robust optimization (DRO) has been proposed as an effective modeling technique. The main objective of DRO is to find decisions that minimize the worst-case expected cost over an ambiguity set, which is constructed using the distributional information of uncertain data. Compared to traditional robust optimization, DRO utilizes available distributional information more effectively, making it a less conservative approach to handling uncertainty [8].

In recent years, distributionally robust optimization methods based on Wasserstein ambiguity sets have gained significant attention. To the best of our knowledge, Mohajerin Esfahani and Kuhn [9] first explored the convex reformulation of the Wasserstein distributionally robust optimization problem, demonstrating several desirable properties of this approach. Gao subsequently extended these properties and explored the connection between distributionally robust optimization and variation regularization [10,11]. Today, DRO is widely applied in various fields, including machine learning [12] and portfolio optimization [13,14,15,16,17,18].

We summarize our main contributions in this paper as follows:

Expected Shortfall Expectile (ES-expectile). We introduce a new class of risk measures by substituting the expectation in the classical expectile definition with expected shortfall (ES). This modification preserves the coherent risk measure spirit of ES while maintaining the sensitivity of expectiles to the full distribution, yielding a refined metric that captures the entire left-tail risk profile more faithfully than either the ES or expectile alone.
Distributionally Robust ES-expectile via Wasserstein Ambiguity. To counter the instability arising from finite-sample empirical distributions, we formulate the ES-expectile portfolio problem within a distributionally robust optimization framework whose ambiguity set is a Wasserstein ball centered at the empirical distribution. We prove that the resulting min–max problem can be conservatively reformulated as a computationally tractable program that can be solved reliably with standard optimization software.
Empirical Superiority and Extreme-Weight Adjustment. Out-of-sample experiments show that our distributionally robust ES-expectile model (DRES-expectile) delivers higher Sharpe ratios and Sortino ratios than the classical ES-expectile portfolio model, traditional portfolio approaches, and Wasserstein DRO portfolios. Although the DRES-expectile model occasionally yields extreme allocations, we mitigate this behaviour with a lightweight extreme-weight adjustment heuristic that retains most of the performance gains while reducing variance.

The remainder of this paper is organized as follows. In Section 2, the definition of the ES-expectile is presented, and its favorable properties are demonstrated through a series of lemmas and theorems, culminating in the introduction of the ES-expectile portfolio problem. Section 3 begins with an introduction to the theory of distributionally robust optimization along with related theorems. Based on this foundation, the DRES-expectile model is derived, and its degenerate form is also presented. Section 4 further simplifies the DRES-expectile model, and Section 5 presents numerical experiments that highlight the unique features of our model and provide guidance for portfolio adjustment decisions. The results indicate that our model outperforms both traditional portfolio models and those based on distributionally robust optimization. Section 6 discusses the managerial and investor implications of the proposed DRES-expectile framework and translates the empirical evidence into actionable guidance for future portfolio optimization practice.

For clarity and ease of reference, Table 1 lists the key symbols and abbreviations employed in this paper.

Notations: We denote by

δ_{{\hat{ξ}}_{i}}

the Dirac measure concentrated at

{\hat{ξ}}_{i} \in R^{n}

, which assigns a unit mass to point

{\hat{ξ}}_{i}

. The empirical distribution

P_{N}

is then defined as the probability measure given by the average

P_{N} = \frac{1}{N} \sum_{i = 1}^{N} δ_{{\hat{ξ}}_{i}},

where

{\hat{ξ}}_{1}, {\hat{ξ}}_{2}, \dots, {\hat{ξ}}_{N}

are the observed samples. We denote by

e \in R^{n}

the vector where each component is 1. We denote by

I_{A} (x)

the indicator function, which assigns a value of 1 if

x \in A

, and 0 otherwise. We denote by

M (Ξ)

the set of all Borel probability measures supported on

Ξ

, where

Ξ \subseteq R^{n}

. Let

(Ω, B, P)

represent an atomless probability space, where

Ω

is the set of possible states,

B

is a

σ

-algebra on

Ω

, and

P

is the probability measure. The space

L^{1} = L^{1} (Ω, B, P)

refers to the set of measurable functions f for which

\int_{Ω} | f | d P < \infty

.

2. ES-Expectile

In this section, we define the ES-expectile, state the associated portfolio problem, and derive the necessary theorems and formulas.

Definition 1.

For

X \in L^{1}

,

α \in (0, 1)

, and

β_{1}, β_{2} \in (0, 1)

, the ES-expectile of X, denoted by

q_{α, β_{1}, β_{2}} (X)

, is defined as the solution to the equation

α {ES}_{β_{1}} ({(X - d)}_{+}) = (1 - α) {ES}_{β_{2}} ({(X - d)}_{-}), d \in R .

In the special case where

β_{1} = β_{2} = 0

,

q_{α, β_{1}, β_{2}} (X)

reduces to the standard expectile

q_{α} (X)

.

Lemma 1.

For

X \in L^{1}

,

d \in R

, and

β \in (0, 1)

, the expected shortfall

{ES}_{β} {(X - d)}_{+}

and

{ES}_{β} {(X - d)}_{-}

can be expanded as follows

{ES}_{β} ({(X - d)}_{+}) = \{\begin{matrix} {ES}_{β} (X) - d, & if d \leq {VaR}_{β} (X), \\ \frac{1}{1 - β} E [{(X - d)}_{+}], & if d > {VaR}_{β} (X), \end{matrix}

{ES}_{β} ({(X - d)}_{-}) = \{\begin{matrix} d + \frac{β}{1 - β} {ES}_{1 - β} (X) - \frac{1}{1 - β} E (X), & if d \geq {VaR}_{1 - β} (X), \\ \frac{1}{1 - β} E [{(X - d)}_{-}], & if d < {VaR}_{1 - β} (X) . \end{matrix}

Proof.

Note that

{VaR}_{θ} ({(X - d)}_{+}) = {({VaR}_{θ} (X) - d)}_{+}

(We begin by considering the right-hand side of the equation. When

d \geq {VaR}_{θ} (X)

,

{({VaR}_{θ} (X) - d)}_{+}

equals 0. In this case, the left-hand side of the equation is also 0 because

P (max (X - d, 0) \leq 0) \geq θ

and

x \geq 0

. When

d < {VaR}_{θ} (X)

, the condition

P (max (X - d, 0) \leq x) \geq θ

is equivalent to

P (X \leq x + d) \geq θ

, leading to the conclusion that

{VaR}_{θ} ({(X - d)}_{+}) = {VaR}_{θ} (X) - d

.).

If

d \geq {VaR}_{β} (X)

, then

{VaR}_{θ} ({(X - d)}_{+}) = 0

for

0 \leq θ \leq β

. Therefore,

\begin{matrix} {ES}_{β} ({(X - d)}_{+}) & = \frac{1}{1 - β} \int_{0}^{1} {VaR}_{θ} ({(X - d)}_{+}) d θ \\ & = \frac{1}{1 - β} E [{(X - d)}_{+}] . \end{matrix}

Conversely, for

d \leq {VaR}_{β} (X)

, we have

\begin{matrix} {ES}_{β} ({(X - d)}_{+}) & = \frac{1}{1 - β} \int_{β}^{1} ({VaR}_{θ} (X) - d) d θ \\ & = {ES}_{β} (X) - d . \end{matrix}

Note that

{VaR}_{θ} (- {(X - d)}_{-}) = {VaR}_{θ} (min (X - d, 0)) = min ({VaR}_{θ} (X) - d, 0) = - {({VaR}_{θ} (X) - d)}_{-}

and

{VaR}_{θ} (X) = - {VaR}_{1 - θ} (- X)

. Then,

\begin{matrix} {ES}_{β} ({(X - d)}_{-}) & = \frac{1}{1 - β} \int_{β}^{1} - {VaR}_{1 - θ} (- {(X - d)}_{-}) d θ \\ & = \frac{1}{1 - β} \int_{0}^{1 - β} {({VaR}_{θ} (X) - d)}_{-} d θ . \end{matrix}

If

d \leq {VaR}_{1 - β} (X)

, then

{({VaR}_{θ} (X) - d)}_{-} = 0

for

1 - β \leq θ \leq 1

, which implies

{ES}_{β} ({(X - d)}_{-}) = \frac{1}{1 - β} E [{(X - d)}_{-}] .

On the other hand, for

d \geq {VaR}_{1 - β} (X)

, we have

\begin{matrix} {ES}_{β} ({(X - d)}_{-}) & = \frac{1}{1 - β} \int_{0}^{1 - β} (d - {VaR}_{θ} (X)) d θ \\ & = d + \frac{β}{1 - β} {ES}_{1 - β} (X) - \frac{1}{1 - β} E (X) . \end{matrix}

□
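The piecewise formulas of Lemma 1 can be checked numerically. The following sketch assumes an equally weighted empirical distribution, for which ES can be evaluated exactly from the sorted sample; the helper names, the sample, and the grid of thresholds d are illustrative assumptions.

```python
import numpy as np


def var_empirical(x, theta):
    """Empirical VaR_theta = inf{v : P(X <= v) >= theta}."""
    xs = np.sort(np.asarray(x, dtype=float))
    k = int(np.ceil(theta * len(xs) - 1e-12))
    return float(xs[max(k, 1) - 1])


def es_empirical(x, beta):
    """Exact integral of the empirical quantile function over [beta, 1]."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    weights = np.clip(upper - np.maximum(lower, beta), 0.0, None)
    return float(weights @ xs) / (1.0 - beta)


def lemma1_plus(x, d, beta):
    """Piecewise formula for ES_beta((X - d)_+)."""
    if d <= var_empirical(x, beta):
        return es_empirical(x, beta) - d
    return np.maximum(x - d, 0.0).mean() / (1.0 - beta)


def lemma1_minus(x, d, beta):
    """Piecewise formula for ES_beta((X - d)_-)."""
    if d >= var_empirical(x, 1.0 - beta):
        tail = beta / (1.0 - beta) * es_empirical(x, 1.0 - beta)
        return d + tail - x.mean() / (1.0 - beta)
    return np.maximum(d - x, 0.0).mean() / (1.0 - beta)


rng = np.random.default_rng(1)
x = rng.normal(size=1001)  # illustrative sample
beta = 0.9
max_err = 0.0
for d in np.linspace(-2.5, 2.5, 11):
    direct_plus = es_empirical(np.maximum(x - d, 0.0), beta)
    direct_minus = es_empirical(np.maximum(d - x, 0.0), beta)
    max_err = max(max_err,
                  abs(direct_plus - lemma1_plus(x, d, beta)),
                  abs(direct_minus - lemma1_minus(x, d, beta)))
```

Here `max_err` measures the largest gap between the direct evaluation and the piecewise formulas across both branches of each case; on this sample it is at the level of floating-point error.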

We now use Lemma 1 to show that Definition 1 is well defined, that is, that the equation in Definition 1 admits at least one solution.

Theorem 1.

For

X \in L^{1}

,

α \in (0, 1)

, and

β_{1}, β_{2} \in (0, 1)

, the equation with respect to d,

α {ES}_{β_{1}} ({(X - d)}_{+}) = (1 - α) {ES}_{β_{2}} ({(X - d)}_{-}), d \in R,

is guaranteed to have at least one solution.

Proof.

We first define the function

f (d) = α {ES}_{β_{1}} ({(X - d)}_{+}) - (1 - α) {ES}_{β_{2}} ({(X - d)}_{-}) .

It is evident that

f (d)

is strictly decreasing with respect to d. Also, as d approaches negative infinity,

f (d)

tends to positive infinity, while as d approaches positive infinity,

f (d)

tends to negative infinity. Therefore, it suffices to prove that

f (d)

is continuous in order to apply the Intermediate Value Theorem, which guarantees the existence of a zero point (uniqueness then follows from strict monotonicity) for

f (d)

.

To prove the continuity of

f (d)

, it suffices to show that

{ES}_{β_{1}} ({(X - d)}_{+})

is continuous. From Lemma 1, we know that

{ES}_{β_{1}} ({(X - d)}_{+})

is continuous at

{VaR}_{β_{1}} (X)

. Therefore, we only need to further demonstrate the continuity of

E [{(X - d)}_{+}]

.

We define

g (d) = E [max (X - d, 0)] = \int_{ω \in Ω} max (X (ω) - d, 0) d P (ω) .

For all

ε > 0, \exists δ = ε > 0 such that if | d - d_{0} | < δ

, then

|g (d) - g (d_{0})| \leq \int_{Ω} |max (X (ω) - d, 0) - max (X (ω) - d_{0}, 0)| d P (ω) \leq \int_{Ω} |d - d_{0}| d P (ω) < ε .

Thus,

g (d)

is continuous, which further implies that

f (d)

has at least one zero point. □
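Since f(d) is continuous and changes sign, a zero can be located by bisection. The sketch below is one possible way to compute a sample ES-expectile under the assumption of an equally weighted empirical distribution; the names `es_expectile_gap` and `es_expectile` are hypothetical, and passing β_1 = β_2 = 0 is used here only as the limiting case in which ES reduces to the expectation.

```python
import numpy as np


def es_empirical(x, beta):
    """Exact ES_beta of an equally weighted sample."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    weights = np.clip(upper - np.maximum(lower, beta), 0.0, None)
    return float(weights @ xs) / (1.0 - beta)


def es_expectile_gap(x, d, alpha, beta1, beta2):
    """f(d) = alpha * ES_beta1((X - d)_+) - (1 - alpha) * ES_beta2((X - d)_-)."""
    pos = es_empirical(np.maximum(x - d, 0.0), beta1)
    neg = es_empirical(np.maximum(d - x, 0.0), beta2)
    return alpha * pos - (1.0 - alpha) * neg


def es_expectile(x, alpha, beta1, beta2, tol=1e-10, max_iter=200):
    """Bisection for a zero of f on [min(x), max(x)], where f changes sign."""
    x = np.asarray(x, dtype=float)
    lo, hi = float(x.min()), float(x.max())
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if es_expectile_gap(x, mid, alpha, beta1, beta2) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


rng = np.random.default_rng(2)
returns = rng.normal(size=500)  # illustrative data, not from the paper
d_star = es_expectile(returns, alpha=0.7, beta1=0.9, beta2=0.9)
d_mean = es_expectile(returns, alpha=0.5, beta1=0.0, beta2=0.0)
```

With α = 0.5 and both confidence levels set to 0, the sketch returns the sample mean, consistent with the reduction to the standard expectile noted in Definition 1.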

Therefore, our classical portfolio problem is

\begin{matrix} \inf_{d \in R, y \in R^{n}} & d \\ \text{s.t.} & α {ES}_{β_{1}} ({(y^{T} ξ - d)}_{+}) = (1 - α) {ES}_{β_{2}} ({(y^{T} ξ - d)}_{-}), \\ & y_{i} \geq 0, i = 1, \dots, n, \\ & \sum_{i = 1}^{n} y_{i} = 1 . \end{matrix}

(2)

Here, d represents the risk measure, y denotes the portfolio weights, and

ξ

is the vector of investment returns, which follows an unknown distribution.

The formulas for

{ES}_{β} ({(X - d)}_{+})

and

{ES}_{β} ({(X - d)}_{-})

in Lemma 1 were derived to obtain simpler forms. However, this derivation introduces piecewise cases depending on the relationship between d and

{VaR}_{β} (X)

or

{VaR}_{1 - β} (X)

. To address the complexity caused by these piecewise cases, we make the following assumption to simplify the analysis and streamline the portfolio problem.

Assumption 1.

Because the portfolio weights y are allowed to vary within the feasible set, we assume that the solution d lies in the range from

{VaR}_{β} (y^{T} ξ)

to positive infinity. This assumption ensures that the risk measure is greater than or equal to

{VaR}_{β} (y^{T} ξ)

for every admissible y.

Specifically, when

β_{1} = β_{2} = β \geq 0.5

and the expected return is constrained to equal

ρ

, portfolio problem (2) is equivalent to the following problem:

\begin{matrix} \inf_{d \in R, y \in R^{n}} & d \\ \text{s.t.} & d \geq {VaR}_{β} (y^{T} ξ), \\ & \frac{α}{1 - α} E [{(y^{T} ξ - d)}_{+}] = (1 - β) d + β {ES}_{1 - β} (y^{T} ξ) - E (y^{T} ξ), \\ & y_{i} \geq 0, i = 1, \dots, n, \\ & \sum_{i = 1}^{n} y_{i} = 1, \\ & E [y^{T} ξ] = ρ . \end{matrix}

(3)
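As a reading aid, the equality constraint in (3) can be traced back to Lemma 1 in three lines. The sketch below writes X = y^T ξ and assumes d \geq {VaR}_{β} (X) with β \geq 0.5, so that d \geq {VaR}_{1 - β} (X) also holds and the relevant branches of Lemma 1 apply; it is a worked restatement, not an additional result.

```latex
% Sketch: from Definition 1 (with beta_1 = beta_2 = beta) to the constraint in (3).
\begin{aligned}
\alpha\,\mathrm{ES}_{\beta}\big((X-d)_+\big)
  &= (1-\alpha)\,\mathrm{ES}_{\beta}\big((X-d)_-\big) \\
\frac{\alpha}{1-\beta}\,E\big[(X-d)_+\big]
  &= (1-\alpha)\Big(d + \frac{\beta}{1-\beta}\,\mathrm{ES}_{1-\beta}(X)
     - \frac{1}{1-\beta}\,E(X)\Big) \\
\frac{\alpha}{1-\alpha}\,E\big[(X-d)_+\big]
  &= (1-\beta)\,d + \beta\,\mathrm{ES}_{1-\beta}(X) - E(X)
\end{aligned}
```

The last line follows from the second by multiplying both sides by (1 - β) / (1 - α).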

We know that the optimal

ζ

in the Rockafellar–Uryasev Formula (1) is actually the

τ

-quantile of the random variable Z. Therefore, if we use the Rockafellar–Uryasev formula to replace

{ES}_{1 - β} (y^{T} ξ)

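in problem (3), then, because the optimal ζ equals {VaR}_{1 - β} (y^{T} ξ) \leq {VaR}_{β} (y^{T} ξ) \leq d for β \geq 0.5, we can obtain the following: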

d \geq ζ .

(4)

This problem framework is the focus of our study. Since we have discussed the two challenging terms

E [{(y^{T} ξ - d)}_{+}]

and

{ES}_{1 - β} (y^{T} ξ)

, the other cases can be easily generalized. Moreover, the derivation process does not rely on the requirement that

β_{1} = β_{2}

. This is also the reason why we approximate

{ES}_{β} ({(X - d)}_{+})

and

{ES}_{β} ({(X - d)}_{-})

using only these two terms.

However, the problem framework has certain challenges, such as the unknown probability measure P of

ξ

, which corresponds to the distribution

Law (ξ)

, and the computational difficulties associated with modeling VaR and ES. These issues arise from the uncertainty in

Law (ξ)

and the complexity of handling these risk measures in practice.

3. DRO Theory and DRES-Expectile

In this section, we briefly introduce the theory of distributionally robust optimization and apply it, together with some related theorems, to handle problem (3).

In decision making under uncertainty, it is common to encounter situations where the probability distribution governing the uncertainty is not known precisely. Classical stochastic programming assumes that this distribution is fully known, but this assumption often does not hold in real-world scenarios. This leads to the concept of distributionally robust optimization (DRO), which seeks to hedge against the worst-case scenario by considering a family of probability distributions that are close to a nominal distribution.

One powerful tool to define this “closeness” between probability distributions is the Wasserstein distance. The Wasserstein distance, rooted in the theory of optimal transport, provides a metric with which to quantify the distance between two probability distributions on a given space. Specifically, if we consider an uncertainty set

Ξ

equipped with a norm

∥ \cdot ∥

, the set of all Borel probability distributions on

Ξ

with finite p-th moments is denoted as

P_{p} (Ξ)

. For any two distributions

μ, ν \in P_{p} (Ξ)

, the Wasserstein distance of order p is defined as

W_{p} (μ, ν) : = {(min_{γ \in Γ (μ, ν)} \int_{Ξ \times Ξ} {∥ ξ - η ∥}^{p} γ (d ξ, d η))}^{1 / p},

where

Γ (μ, ν)

represents the set of all couplings between

μ

and

ν

, that is, the set of all joint distributions on

Ξ \times Ξ

that have

μ

and

ν

as marginals.
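For intuition, the Wasserstein distance has a closed form in one dimension. The sketch below assumes two empirical distributions on the real line with the same number of equally weighted atoms, in which case an optimal coupling pairs the sorted samples; the function name `wasserstein_p_1d` and the data are illustrative assumptions, and the general multivariate case requires solving a transport problem instead.

```python
import numpy as np


def wasserstein_p_1d(u, v, p=1):
    """W_p between two equally weighted empirical distributions on the real
    line with equal sample sizes; sorted samples are matched pairwise."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch requires equal sample sizes")
    return float(np.mean(np.abs(u - v) ** p) ** (1.0 / p))


rng = np.random.default_rng(0)
a = rng.normal(size=300)       # illustrative samples
b = rng.permutation(a) + 0.3   # same atoms, shifted by 0.3 and reshuffled
w_shift = wasserstein_p_1d(a, b)
w_self = wasserstein_p_1d(a, a)
```

Shifting every atom by 0.3 gives a distance of 0.3 regardless of the order in which the samples are listed, and the distance of a sample to itself is zero.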

In the context of distributionally robust optimization, we use the Wasserstein distance to construct an ambiguity set

M

that contains all distributions within a certain Wasserstein distance from a nominal distribution

ν

. The goal is then to find a decision that performs optimally in the worst-case scenario over all distributions in this ambiguity set. Mathematically, this can be formulated as

min_{y \in Y} sup_{μ \in M} E_{μ} [Ψ (y, ξ)],

where

M = {μ \in P_{p} (Ξ) : W_{p} (μ, ν) \leq δ}

, with

δ > 0

denoting the radius of the Wasserstein distance. Here,

P_{p} (Ξ)

represents the set of probability distributions with finite p-th moments, and

Ψ

is an upper semi-continuous function, typically representing a loss or a negative payoff function. The random variable

ξ

represents uncertainty, and

W_{p} (μ, ν)

denotes the Wasserstein distance between

μ

and

ν

. Since our main focus in the following discussion is on the problem of taking the supremum over distributions, specifically

{sup}_{μ \in M} E_{μ} [Ψ (y, ξ)]

, we treat y as a parameter. For convenience, we often write

Ψ (y, ξ)

as

Ψ_{y} (ξ)

, which indicates that it is a function of the random variable

ξ

.

We refer to the ambiguity set

M

as the Wasserstein ball and use it with

p = 1

in this study due to the favorable mathematical properties of the Wasserstein metric with

p = 1

, which will be utilized in the subsequent derivations.

This formulation provides a robust approach to decision making under uncertainty, ensuring that the chosen decision is resilient against the worst possible distributional deviations within the specified Wasserstein distance [10].

In recent studies on distributionally robust optimization, Gao et al. ([11] Corollary 2) investigated the relationship between distributionally robust optimization and variation regularization, obtaining the following conclusion.

Theorem 2.

Assume that

Ψ_{y}

is Lipschitz continuous. Suppose

diam (Ξ) = \infty

and there exists

ξ_{0} \in Ξ

such that

\begin{matrix} \underset{∥ \tilde{ξ} - ξ_{0} ∥ \to \infty}{lim sup} \frac{Ψ_{y} (\tilde{ξ}) - Ψ_{y} (ξ_{0})}{∥ \tilde{ξ} - ξ_{0} ∥} = {∥ Ψ_{y} ∥}_{Lip} . \end{matrix}

(5)

Then, for any

δ \geq 0

and

Ψ_{y}

, we have

sup_{μ \in M} E_{ξ \sim μ} [Ψ_{y} (ξ)] = E_{ξ \sim P_{N}} [Ψ_{y} (ξ)] + δ {∥ Ψ_{y} ∥}_{Lip},

where

P_{N}

is the empirical distribution, δ is the radius of the Wasserstein ball, and

∥ Ψ_{y} ∥_{Lip}

is called Lipschitz regularization.

Specifically, when condition (5) is satisfied for a particular point

ξ_{0} \in Ξ

, it is also satisfied for all

ξ \in Ξ

. Condition (5) implies that the Lipschitz norm is approximately achieved between any

ξ \in supp P_{N}

and some distant point

\tilde{ξ}

. More precisely, for any

ε > 0

and

r > 0

, there exists a point

\tilde{ξ} = T_{Ψ} (ξ)

(

T_{Ψ} (ξ)

is a mapping that transforms point ξ to a distant point, ensuring that the Lipschitz continuity condition is satisfied. Specifically, it ensures that for

Ψ_{y}

, as point ξ moves farther away, the increment in

Ψ_{y}

approximately reaches the upper bound of the Lipschitz constant

∥ Ψ_{y} ∥_{Lip}

.) such that

∥ T_{Ψ} (ξ) - ξ ∥ > r

and

Ψ_{y} (T_{Ψ} (ξ)) - Ψ_{y} (ξ) \geq ({∥ Ψ_{y} ∥}_{Lip} - ε) ∥ T_{Ψ} (ξ) - ξ ∥

. The approximately worst-case distribution then perturbs some point

ξ_{i_{0}} \in supp P_{N}

to

T_{Ψ} (ξ_{i_{0}})

with a small probability of

δ / N

, resulting in the limiting distribution

\frac{1}{N} \sum_{i \neq i_{0}} δ_{ξ_{i}} + \frac{1 - δ}{N} δ_{ξ_{i_{0}}} + \frac{δ}{N} δ_{T_{Ψ} (ξ_{i_{0}})} .
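The Lipschitz-regularization identity of Theorem 2 can be illustrated in one dimension. The sketch below is a simplified construction under stated assumptions, not the exact perturbation written above: the loss is the hinge function z ↦ (z − ζ)_+ with Lipschitz modulus 1, the ambiguity set is a Wasserstein ball of order 1 around the empirical distribution, and a small mass `radius / shift` is moved from one atom to a point `shift` units away so that the transport cost equals the radius. All function names are hypothetical.

```python
import numpy as np


def hinge(z, zeta):
    """Loss (z - zeta)_+, which is 1-Lipschitz in z."""
    return np.maximum(z - zeta, 0.0)


def lipschitz_bound(samples, zeta, radius):
    """Empirical expectation plus radius times the Lipschitz modulus (here 1)."""
    return float(hinge(samples, zeta).mean() + radius * 1.0)


def perturbed_expectation(samples, zeta, radius, shift):
    """Expected loss after moving mass radius / shift from the smallest atom
    to a point `shift` units to its right (transport cost equals radius)."""
    n = len(samples)
    moved_mass = radius / shift
    if moved_mass > 1.0 / n:
        raise ValueError("shift too small: one atom cannot supply this mass")
    i0 = int(np.argmin(samples))
    gain = moved_mass * (hinge(samples[i0] + shift, zeta) - hinge(samples[i0], zeta))
    return float(hinge(samples, zeta).mean() + gain)


rng = np.random.default_rng(3)
z = rng.normal(size=200)  # illustrative sample
zeta, radius = 0.5, 0.1
bound = lipschitz_bound(z, zeta, radius)
values = [perturbed_expectation(z, zeta, radius, r) for r in (1e2, 1e4, 1e6)]
```

As the perturbed point moves farther away, the expected loss under the perturbed distribution increases towards `bound` without exceeding it, which is the asymptotic behaviour that condition (5) describes.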

It can be observed that we can simplify a problem involving the supremum over distributions, but determining the optimal solution, i.e., identifying the worst-case distribution, remains challenging. Moreover, as shown in ([9] Example 2), the worst-case distribution does not always exist. Although ([10] Theorem 1) and ([9] Corollary 4.6) mention that a worst-case distribution exists under certain conditions, sufficient and necessary conditions are still lacking.

We now reproduce Corollary 4.6 from Mohajerin Esfahani and Kuhn ([9] Corollary 4.6).

Theorem 3.

Suppose that the uncertainty set Ξ \subseteq R^{n} is convex and closed and that the functions - l_{k}, where k \leq K, are proper, convex, and lower semi-continuous. (A function f on a non-empty set X is said to be proper if there exists an x \in X such that f (x) < + \infty and if f (x) > - \infty for all x \in X.) Moreover, for any k \leq K, the function l_{k} does not take the value negative infinity on Ξ. If Ξ is compact or K = 1, then the worst-case distribution of max_{k \leq K} l_{k} exists.

Next, we apply distributionally robust optimization to

q_{α, β_{1}, β_{2}} (X)

and the expected return, aiming to derive

q_{α, β_{1}, β_{2}} (X)

under the worst-case distribution while considering the expected return. Furthermore, we handle

{ES}_{1 - β} (y^{T} ξ)

using the Rockafellar–Uryasev Formula (1). This leads to the following problem:

\begin{matrix} sup_{μ \in W} inf_{ζ \in R} & \{(1 - β) d + β ζ + E_{μ} [{(y^{T} ξ - ζ)}_{+}] - E_{μ} (y^{T} ξ) - \frac{α}{1 - α} E_{μ} [{(y^{T} ξ - d)}_{+}]\} \\ s . t . & E_{μ} [y^{T} ξ] = ρ, \end{matrix}

(6)

where

W = {μ \in P_{p} (Ξ) : W_{p} (μ, P_{N}) \leq δ}

.

Define

f (μ, ζ) = β ζ + E_{μ} [{(y^{T} ξ - ζ)}_{+}] - E_{μ} (y^{T} ξ) - \frac{α}{1 - α} E_{μ} [{(y^{T} ξ - d)}_{+}] = E_{μ} [h (ζ, z)]

, where

z = y^{T} ξ

, and

h (ζ, z)

can be regarded as a loss function. Here, we take the upper bound of the expectation of

h (ζ, z)

over the distribution rather than

E_{μ} (- h (ζ, z))

because, as discussed in the relevant distributionally robust optimization literature [9], when the objective function is written in the form of Theorem 3, i.e.,

max_{k \leq K} l_{k}

, where

l_{k}

is a concave function, the problem exhibits some desirable mathematical properties. As for the specific equivalent representation of

h (ζ, z)

, we will provide it in the following sections.

Let us first consider

{inf}_{ζ \in R} {sup}_{μ \in W} f (μ, ζ)

. Under Assumption 1, we can easily deduce that for any

ζ

, the function

h (ζ, ξ)

has the same Lipschitz regularization (If there exist

ζ_{1}

and

ζ_{2}

such that

d < ζ_{1}

and

d > ζ_{2}

, the Lipschitz regularizations of the two loss functions

h (ζ_{1}, ξ)

and

h (ζ_{2}, ξ)

may differ. Our assumption excludes this case.). Thus, according to Theorem 2, the worst-case distribution for the problem

{sup}_{μ \in W} E_{μ} [h (ζ, ξ)]

asymptotically converges to

\frac{1}{N} \sum_{i \neq i_{0}} δ_{ξ_{i}} + \frac{1 - δ}{N} δ_{ξ_{i_{0}}} + \frac{δ}{N} δ_{T_{h} (ζ, ξ_{i_{0}})} .

Moreover, we know that for different

ζ

, the variation in the loss function, as described by

h (ζ, T_{h} (ζ, ξ_{i_{0}})) - h (ζ, ξ_{i_{0}})

in Theorem 2, is the same. Therefore, it can be concluded that for different loss functions

h (ζ, ξ)

, the worst-case distributions to which they asymptotically converge can be the same.

Assumption 2.

For a sequence of loss functions with respect to ζ, defined as

h (ζ, z) = {(z - ζ)}_{+} - z - \frac{α}{1 - α} {(z - d)}_{+}

, it can be written as a function that conforms to Theorem 3 in the form of

max (q_{1}, q_{2})

, where

q_{1} (z) = min (- z, \frac{1 - α + α^{2}}{α - 1} z + \frac{α^{2}}{1 - α} ζ)

,

q_{2} (z) = min (- ζ, \frac{α}{α - 1} z + \frac{α}{1 - α} d - ζ)

, and both

q_{1}

and

q_{2}

are concave functions. If z belongs to a compact uncertainty set Ξ, then on the Wasserstein ball

W

, centered at the empirical distribution with radius

δ > 0

, the worst-case distributions exist and are the same.

The sources of

q_{1}

and

q_{2}

in the above assumption are as follows. To ensure that h (ζ, z) = max (q_{1}, q_{2}), the second affine piece of q_{1} (z) is not the only possible choice. The key is to find a linear function passing through

(ζ, - ζ)

with a slope less than −1 and less than

\frac{α}{α - 1}

. We choose the slope as

\frac{α}{α - 1} + α - 1

, and it is easy to verify that it satisfies the conditions. By adding some further conditions, we can use Theorem 3 to ensure that the worst-case distribution exists.
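The representation h = max(q_1, q_2) can be checked on a grid. The sketch below evaluates both sides for one illustrative choice of α, ζ, and d with ζ ≤ d, as required by inequality (4), and also for a choice with ζ > d to show that the identity can fail without that ordering; the parameter values and the helper `decomposition_error` are assumptions made for illustration.

```python
import numpy as np


def h(z, zeta, d, alpha):
    """Loss h(zeta, z) = (z - zeta)_+ - z - alpha / (1 - alpha) * (z - d)_+."""
    return (np.maximum(z - zeta, 0.0) - z
            - alpha / (1.0 - alpha) * np.maximum(z - d, 0.0))


def q1(z, zeta, alpha):
    """Concave piece: minimum of -z and a steeper line through (zeta, -zeta)."""
    slope = (1.0 - alpha + alpha ** 2) / (alpha - 1.0)
    return np.minimum(-z, slope * z + alpha ** 2 / (1.0 - alpha) * zeta)


def q2(z, zeta, d, alpha):
    """Concave piece: minimum of -zeta and the tail line beyond d."""
    line = alpha / (alpha - 1.0) * z + alpha / (1.0 - alpha) * d - zeta
    return np.minimum(-zeta, line)


def decomposition_error(zeta, d, alpha, grid):
    """Largest gap between h and max(q1, q2) over the grid."""
    lhs = h(grid, zeta, d, alpha)
    rhs = np.maximum(q1(grid, zeta, alpha), q2(grid, zeta, d, alpha))
    return float(np.max(np.abs(lhs - rhs)))


z_grid = np.linspace(-3.0, 3.0, 601)
err_ok = decomposition_error(zeta=-0.5, d=0.4, alpha=0.7, grid=z_grid)   # zeta <= d
err_bad = decomposition_error(zeta=0.4, d=-0.5, alpha=0.7, grid=z_grid)  # zeta > d
```

With ζ ≤ d the two sides agree up to rounding, whereas swapping the ordering produces a visible mismatch, which is why the derivation relies on inequality (4).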

Theorem 4.

For the given Wasserstein ball

W

with

δ > 0

,

β \in (0, 1)

, and

α \in (0, 1)

, under the condition that inequality (4) and Assumption 2 hold, define

f (μ, ζ) = β ζ + E_{μ} [{(y^{T} ξ - ζ)}_{+}] - E_{μ} (y^{T} ξ) - \frac{α}{1 - α} E_{μ} [{(y^{T} ξ - d)}_{+}]

, where

y^{T} ξ

comes from a compact uncertainty set Ξ. Then, the following equality holds:

sup_{μ \in W} inf_{ζ \in R} f (μ, ζ) = inf_{ζ \in R} sup_{μ \in W} f (μ, ζ) .

Proof.

From inequality (4) and Assumption 2, we know that there exists a probability measure

μ^{*}

, independent of ζ, such that

inf_{ζ \in R} sup_{μ \in W} f (μ, ζ) = inf_{ζ \in R} f (μ^{*}, ζ) = inf_{ζ \in R} f (P_{N}, ζ) + δ {∥ h ∥}_{Lip} .

We know that the max–min inequality holds independently of the properties of f or whether

W

is a convex set (For any fixed

μ_{0}

, we have

sup_{μ \in W} f (μ, ζ) \geq f (μ_{0}, ζ)

. Taking the infimum over ζ on both sides and then the supremum over

μ_{0}

yields the result.), so we always have

sup_{μ \in W} inf_{ζ \in R} f (μ, ζ) \leq inf_{ζ \in R} sup_{μ \in W} f (μ, ζ) .

For any fixed

μ_{0}

, we also have

sup_{μ \in W} inf_{ζ \in R} f (μ, ζ) \geq inf_{ζ \in R} f (μ_{0}, ζ) .

Let

μ_{0} = μ^{*}

. Then,

sup_{μ \in W} inf_{ζ \in R} f (μ, ζ) \geq inf_{ζ \in R} f (μ^{*}, ζ) = inf_{ζ \in R} sup_{μ \in W} f (μ, ζ) .

Thus, we have completed the proof. □
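The logic of this proof can be mimicked on a finite grid, where rows play the role of distributions μ and columns the role of ζ. The sketch below is an analogy under that discretization assumption: a random payoff table satisfies the max–min inequality, and equality is obtained once a single row attains the maximum in every column, which corresponds to a worst-case measure that does not depend on ζ.

```python
import numpy as np

rng = np.random.default_rng(4)
F = rng.normal(size=(5, 7))  # F[i, j] stands in for f(mu_i, zeta_j)

# Weak duality: sup over rows of the row-wise minimum never exceeds
# the inf over columns of the column-wise maximum.
max_min = float(F.min(axis=1).max())
min_max = float(F.max(axis=0).min())

# Make row 0 dominate every column, so that the maximising row is the
# same for all columns (the analogue of a zeta-independent mu*).
G = F.copy()
G[0] = F.max(axis=0) + 1.0
max_min_dom = float(G.min(axis=1).max())
min_max_dom = float(G.max(axis=0).min())
```

In the second table both values equal the smallest entry of the dominating row, so the order of the two optimizations can be exchanged.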

Assume

Ξ

is a compact set. By applying Theorem 4, we can transform problem (6), and for now, we do not need to consider minimizing over

ζ

.

\left\{\begin{matrix} sup_{μ \in W} & \int_{Ξ} h (ζ, y^{T} ξ) μ (d ξ) \\ \text{s.t.} & \int_{Ξ} y^{T} ξ μ (d ξ) = ρ \end{matrix}\right. = \left\{\begin{matrix} sup_{γ \in M (Ξ \times Ξ)} & \int_{Ξ} h (ζ, y^{T} ξ) γ (d ξ, Ξ) \\ \text{s.t.} & \int_{Ξ} y^{T} ξ γ (d ξ, Ξ) = ρ, \\ & \int_{Ξ \times Ξ} ∥ ξ - η ∥ γ (d ξ, d η) \leq δ, \\ & γ (Ξ, d η) = P_{N} (d η) . \end{matrix}\right.

(7)

For

γ (d ξ, Ξ)

, we have

γ (d ξ, Ξ) = \int_{η \in Ξ} γ (d ξ, d η) = \sum_{i = 1}^{N} γ (d ξ | η = {\hat{ξ}}_{i}) P_{N} ({\hat{ξ}}_{i}) = \frac{1}{N} \sum_{i = 1}^{N} P^{i} (d ξ),

where

P^{i} (d ξ)

represents the conditional distribution of

ξ

conditioned on

η = {\hat{ξ}}_{i}

. Likewise, we have

γ (d ξ, d η) = γ (d ξ | η) P_{N} (η) = \frac{1}{N} \sum_{i = 1}^{N} δ_{{\hat{ξ}}_{i}} (η) P^{i} (d ξ) .

Thus, we can transform problem (7) into

\{\begin{matrix} sup_{P^{i} \in M (Ξ)} \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} h (ζ, y^{T} ξ) P^{i} (d ξ) \\ s . t . \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} y^{T} ξ P^{i} (d ξ) = ρ, \\ \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} ∥ ξ - {\hat{ξ}}_{i} ∥ P^{i} (d ξ) \leq δ, \\ \int_{Ξ} P^{i} (d ξ) = 1, i = 1, \dots, N . \end{matrix}

(8)

If we derive the Lagrangian dual problem from the original problem, then according to an extended version of the well-known strong duality result for moment problems ([19] Proposition 3.4), we know that the maximum and minimum of the Lagrangian dual function remain unchanged even after exchanging the order of optimization.

Theorem 5.

We stipulate that the norm

∥ \cdot ∥

in the Wasserstein distance is the infinity norm

{∥ \cdot ∥}_{\infty}

. Problem (8) is equivalent to

\{\begin{matrix} min_{λ_{1}, λ_{2}, θ_{1}, θ_{2}} λ_{1} ρ + λ_{2} δ + \frac{1}{N} \sum_{i = 1}^{N} max (\frac{ζ α^{2} (1 - θ_{1})}{1 - α} - [θ_{1} + \frac{1 - α + α^{2}}{1 - α} (1 - θ_{1})] y^{⊤} {\hat{ξ}}_{i}, \\ \frac{d α (1 - θ_{2})}{1 - α} - ζ - \frac{α}{1 - α} (1 - θ_{2}) y^{⊤} {\hat{ξ}}_{i}) \\ s . t . | (1 - α) λ_{1} + θ_{1} α^{2} + α^{2} + α + 1 | \leq λ_{2} (1 - α), \\ | λ_{1} + α (1 - λ_{1} - θ_{2}) | \leq λ_{2} (1 - α) \\ λ_{2} \geq 0, \\ θ_{1}, θ_{2} \in (0, 1) . \end{matrix}

Proof.

We derive the Lagrangian dual problem from the original problem (8)

L (P^{i}; λ_{1}, λ_{2}, s) = \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} h (ζ, y^{T} ξ) P^{i} (d ξ) + λ_{1} (ρ - \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} y^{T} ξ P^{i} (d ξ))

+ λ_{2} (δ - \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} {∥ ξ - {\hat{ξ}}_{i} ∥}_{\infty} P^{i} (d ξ)) + \sum_{i = 1}^{N} s_{i} (1 - \int_{Ξ} P^{i} (d ξ))

= \frac{1}{N} \sum_{i = 1}^{N} \int_{Ξ} (h (ζ, y^{T} ξ) - λ_{1} y^{T} ξ - λ_{2} {∥ ξ - {\hat{ξ}}_{i} ∥}_{\infty} - N s_{i}) P^{i} (d ξ)

+ λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i},

where

λ_{1} \in R, λ_{2} \in R_{+},

and

s \in R^{N}

. Because

M (Ξ)

contains all Dirac distributions, the supremum over each

P^{i}

can be taken over point masses, so the inner maximization reduces to a pointwise supremum over ξ \in Ξ. Thus, the Lagrangian dual of problem (8) is

min_{λ_{1}, λ_{2} \geq 0, s} sup_{ξ \in Ξ} L (ξ; λ_{1}, λ_{2}, s),

which is given by

\{\begin{matrix} min_{λ_{1}, λ_{2}, s, t} λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i} + \frac{1}{N} \sum_{i = 1}^{N} t_{i}, \\ s . t . h (ζ, y^{T} ξ) - λ_{1} y^{⊤} ξ - λ_{2} {∥ ξ - {\hat{ξ}}_{i} ∥}_{\infty} - N s_{i} \leq t_{i}, \forall ξ \in Ξ, i = 1, \dots, N, \\ λ_{2} \geq 0 . \end{matrix}

(9)

We use the form

h (ζ, y^{⊤} ξ)

as given in Assumption 2, specifically

h (ζ, y^{⊤} ξ) = max (q_{1} (y^{⊤} ξ), q_{2} (y^{⊤} ξ)),

where

q_{1} (y^{⊤} ξ) = min (- y^{⊤} ξ, \frac{1 - α + α^{2}}{α - 1} y^{⊤} ξ + \frac{α^{2}}{1 - α} ζ)

and

q_{2} (y^{⊤} ξ) = min (- ζ, \frac{α}{α - 1} y^{⊤} ξ + \frac{α}{1 - α} d - ζ) .

So, problem (9) is equivalent to

\{\begin{matrix} min_{λ_{1}, λ_{2}, s, t} λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i} + \frac{1}{N} \sum_{i = 1}^{N} t_{i} \\ s . t . N s_{i} \geq max_{ξ \in Ξ} (- λ_{2} {∥ ξ - {\hat{ξ}}_{i} ∥}_{\infty} - λ_{1} y^{⊤} ξ + q_{1} (y^{⊤} ξ)) - t_{i}, i = 1, \dots, N, \\ N s_{i} \geq max_{ξ \in Ξ} (- λ_{2} {∥ ξ - {\hat{ξ}}_{i} ∥}_{\infty} - λ_{1} y^{⊤} ξ + q_{2} (y^{⊤} ξ)) - t_{i}, i = 1, \dots, N, \\ λ_{2} \geq 0 . \end{matrix}

= \{\begin{matrix} min_{λ_{1}, λ_{2}, s} & λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i} + \frac{1}{N} \sum_{i = 1}^{N} t_{i} \\ s . t . & N s_{i} \geq max_{ξ \in Ξ} min_{∥ z_{i 1} ∥_{1} \leq λ_{2}} (- z_{i 1}^{⊤} (ξ - {\hat{ξ}}_{i}) - λ_{1} y^{⊤} ξ + q_{1} (y^{⊤} ξ)) - t_{i}, i = 1, \dots, N, \\ N s_{i} \geq max_{ξ \in Ξ} min_{∥ z_{i 2} ∥_{1} \leq λ_{2}} (- z_{i 2}^{⊤} (ξ - {\hat{ξ}}_{i}) - λ_{1} y^{⊤} ξ + q_{2} (y^{⊤} ξ)) - t_{i}, i = 1, \dots, N, \\ λ_{2} \geq 0 . \end{matrix}

= \{\begin{matrix} min_{λ_{1}, λ_{2}, s} & λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i} + \frac{1}{N} \sum_{i = 1}^{N} t_{i} \\ s . t . & N s_{i} \geq max_{ξ \in Ξ} (- z_{i 1}^{⊤} (ξ - {\hat{ξ}}_{i}) - λ_{1} y^{⊤} ξ + q_{1} (y^{⊤} ξ)) - t_{i}, i = 1, \dots, N, \\ N s_{i} \geq max_{ξ \in Ξ} (- z_{i 2}^{⊤} (ξ - {\hat{ξ}}_{i}) - λ_{1} y^{⊤} ξ + q_{2} (y^{⊤} ξ)) - t_{i}, i = 1, \dots, N, \\ ∥ z_{i 1} ∥_{1} \leq λ_{2}, {∥ z_{i 2} ∥}_{1} \leq λ_{2} . \end{matrix}

= \{\begin{matrix} min_{λ_{1}, λ_{2}, s} & λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i} + \frac{1}{N} \sum_{i = 1}^{N} t_{i} \\ s . t . & N s_{i} \geq max_{ξ \in Ξ} min (- {(z_{i 1} + (1 + λ_{1}) y)}^{⊤} ξ + z_{i 1}^{⊤} {\hat{ξ}}_{i}, \\ - {(z_{i 1} + (\frac{1 - α + α^{2}}{1 - α} + λ_{1}) y)}^{⊤} ξ + \frac{α^{2}}{1 - α} ζ + z_{i 1}^{⊤} {\hat{ξ}}_{i}) - t_{i}, i = 1, \dots, N, \\ N s_{i} \geq max_{ξ \in Ξ} min (- {(z_{i 2} + λ_{1} y)}^{⊤} ξ - ζ + z_{i 2}^{⊤} {\hat{ξ}}_{i}, \\ - {(z_{i 2} + (\frac{α}{1 - α} + λ_{1}) y)}^{⊤} ξ + \frac{α}{1 - α} d - ζ + z_{i 2}^{⊤} {\hat{ξ}}_{i}) - t_{i}, i = 1, \dots, N, \\ ∥ z_{i 1} ∥_{1} \leq λ_{2}, {∥ z_{i 2} ∥}_{1} \leq λ_{2} \\ λ_{2} \geq 0 . \end{matrix}

(10)

Let

q_{1}^{'} (ξ) =

min (- {(z_{i 1} + (1 + λ_{1}) y)}^{⊤} ξ + z_{i 1}^{⊤} {\hat{ξ}}_{i}, - {(z_{i 1} + (\frac{1 - α + α^{2}}{1 - α} + λ_{1}) y)}^{⊤} ξ + \frac{α^{2}}{1 - α} ζ + z_{i 1}^{⊤} {\hat{ξ}}_{i})

and

q_{2}^{'} (ξ) =

min (- {(z_{i 2} + λ_{1} y)}^{⊤} ξ - ζ + z_{i 2}^{⊤} {\hat{ξ}}_{i}, - {(z_{i 2} + (\frac{α}{1 - α} + λ_{1}) y)}^{⊤} ξ + \frac{α}{1 - α} d - ζ + z_{i 2}^{⊤} {\hat{ξ}}_{i}) .

In order for

q_{1}^{'} (ξ)

and

q_{2}^{'} (ξ)

to have a maximum value, there must exist

θ_{1}, θ_{2} \in (0, 1)

such that (We consider a function

f (x) = min (a^{⊤} x + c, b^{⊤} x + d)

, where

a \neq b

. The two components of the function are affine. For

f (x)

to have a maximum value, it is necessary and sufficient that a and b are nonzero and point in opposite directions, i.e., θ a + (1 - θ) b = 0 for some θ \in (0, 1). In this case, we can express

z_{i 1}

and

z_{i 2}

as convex combinations).

z_{i 1} = - [θ_{1} (1 + λ_{1}) + (1 - θ_{1}) (\frac{1 - α + α^{2}}{1 - α} + λ_{1})] y

and

z_{i 2} = - [θ_{2} λ_{1} + (1 - θ_{2}) (\frac{α}{1 - α} + λ_{1})] y .

In addition, when

- {(z_{i 1} + (1 + λ_{1}) y)}^{⊤} ξ + z_{i 1}^{⊤} {\hat{ξ}}_{i} = - {(z_{i 1} + (\frac{1 - α + α^{2}}{1 - α} + λ_{1}) y)}^{⊤} ξ + \frac{α^{2}}{1 - α} ζ + z_{i 1}^{⊤} {\hat{ξ}}_{i},

i.e., when

y^{⊤} ξ = ζ

(for example,

ξ = ζ \frac{y}{{∥ y ∥}_{2}^{2}}

), the maximum value of

q_{1}^{'} (ξ)

is

\frac{ζ α^{2} (1 - θ_{1})}{1 - α} - [θ_{1} (1 + λ_{1}) + (1 - θ_{1}) (\frac{1 - α + α^{2}}{1 - α} + λ_{1})] y^{⊤} {\hat{ξ}}_{i} .

Similarly, at

y^{⊤} ξ = d

, the maximum value of

q_{2}^{'} (ξ)

is

\frac{d α (1 - θ_{2})}{1 - α} - ζ - [θ_{2} λ_{1} + (1 - θ_{2}) (\frac{α}{1 - α} + λ_{1})] y^{⊤} {\hat{ξ}}_{i} .

Moreover, since

{∥ y ∥}_{1} = 1

, problem (10) is equivalent to

\{\begin{matrix} min_{λ_{1}, λ_{2}, θ_{1}, θ_{2}, s} λ_{1} ρ + λ_{2} δ + \sum_{i = 1}^{N} s_{i} + \frac{1}{N} \sum_{i = 1}^{N} t_{i} \\ s . t . N s_{i} \geq \frac{ζ α^{2} (1 - θ_{1})}{1 - α} \\ - [θ_{1} (1 + λ_{1}) + (1 - θ_{1}) (\frac{1 - α + α^{2}}{1 - α} + λ_{1})] y^{⊤} {\hat{ξ}}_{i} - t_{i}, i = 1, \dots, N, \\ N s_{i} \geq \frac{d α (1 - θ_{2})}{1 - α} \\ - ζ - [θ_{2} λ_{1} + (1 - θ_{2}) (\frac{α}{1 - α} + λ_{1})] y^{⊤} {\hat{ξ}}_{i} - t_{i}, i = 1, \dots, N, \\ | (1 - α) λ_{1} + θ_{1} α^{2} + α^{2} + α + 1 | \leq λ_{2} (1 - α), \\ | λ_{1} + α (1 - λ_{1} - θ_{2}) | \leq λ_{2} (1 - α) \\ λ_{2} \geq 0, \\ θ_{1}, θ_{2} \in (0, 1) . \end{matrix}

= \{\begin{matrix} min_{λ_{1}, λ_{2}, θ_{1}, θ_{2}} & λ_{1} ρ + λ_{2} δ + \frac{1}{N} \sum_{i = 1}^{N} max (\frac{ζ α^{2} (1 - θ_{1})}{1 - α} - [θ_{1} + \frac{1 - α + α^{2}}{1 - α} (1 - θ_{1})] y^{⊤} {\hat{ξ}}_{i}, \\ \frac{d α (1 - θ_{2})}{1 - α} - ζ - \frac{α}{1 - α} (1 - θ_{2}) y^{⊤} {\hat{ξ}}_{i}) \\ s . t . | (1 - α) λ_{1} + θ_{1} α^{2} + α^{2} + α + 1 | \leq λ_{2} (1 - α), \\ | λ_{1} + α (1 - λ_{1} - θ_{2}) | \leq λ_{2} (1 - α) \\ λ_{2} \geq 0, \\ θ_{1}, θ_{2} \in (0, 1) . \end{matrix}

□
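The proof above replaces the infinity-norm term with a minimum over an ℓ1 ball, using the dual-norm identity -λ_{2} ∥x∥_{\infty} = min_{∥z∥_{1} \leq λ_{2}} (-z^{⊤} x). The sketch below checks this identity numerically on a random vector. The dimension, the radius `lam`, and the sampling routine are illustrative assumptions only.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_l1_ball(n, radius, rng):
    """Random point with l1 norm at most `radius` (not uniformly distributed)."""
    z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    scale = radius * rng.random() / sum(abs(v) for v in z)
    return [scale * v for v in z]

rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
lam = 0.7

# Claimed value of the minimum of -z^T x over the l1 ball of radius lam
closed_form = -lam * max(abs(v) for v in x)

# The minimum is attained at the vertex lam * sign(x_k) * e_k, k = argmax |x_k|
k = max(range(len(x)), key=lambda i: abs(x[i]))
z_star = [0.0] * len(x)
z_star[k] = lam if x[k] >= 0 else -lam
vertex_value = -dot(z_star, x)

# No sampled feasible z should do better than the closed form
sampled_min = min(-dot(sample_l1_ball(len(x), lam, rng), x) for _ in range(10_000))
```

The vertex value matches the closed form, and random feasible points never fall below it.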

However, this distributionally robust optimization problem plays the role of a constraint in portfolio optimization, so we would prefer to obtain an analytical solution. Next, we consider approximating the Lagrangian dual formulation of problem (7) using the max–min inequality, i.e.,

sup_{μ \in W} inf_{λ \in R} f (μ, ζ) + λ [ρ - E_{μ} (y^{T} ξ)] \leq inf_{λ \in R} sup_{μ \in W} f (μ, ζ) + λ [ρ - E_{μ} (y^{T} ξ)] .

Using inequality (4) and Theorem 2, we find that the problem

inf_{λ \in R} sup_{μ \in W} f (μ, ζ) + λ [ρ - E_{μ} (y^{T} ξ)]

(11)

has an analytical solution.

Theorem 6.

We stipulate that the norm

∥ \cdot ∥

in the Wasserstein distance is the Euclidean norm

{∥ \cdot ∥}_{2}

. Assuming

E_{P_{N}} (y^{T} ξ) > 0

, we set

ρ = k E_{P_{N}} (y^{T} ξ)

with

k \in (0, 1)

. If

ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} < 0

, the solution to problem (11) is negative infinity. If

ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} \geq 0

, the solution to problem (11) is given by

\{\begin{matrix} f (P_{N}, ζ) + \frac{1}{2} [{δ ∥ y ∥}_{2} - ρ + E_{P_{N}} (y^{T} ξ)] & if α \leq \frac{1}{2}, \\ f (P_{N}, ζ) + \frac{α}{2 - 2 α} [{δ ∥ y ∥}_{2} - ρ + E_{P_{N}} (y^{T} ξ)] & if α > \frac{1}{2} . \end{matrix}

Proof.

Under the premise of Conclusion 4 and applying Theorem 2, we have

\begin{matrix} inf_{λ \in R} sup_{μ \in W} f (μ, ζ) + λ [ρ - E_{μ} (y^{T} ξ)] \\ = inf_{λ \in R} sup_{μ \in W} E_{μ} [{(y^{T} ξ - ζ)}_{+} - (λ + 1) y^{T} ξ - \frac{α}{1 - α} {(y^{T} ξ - d)}_{+}] + λ ρ \\ = inf_{λ \in R} E_{P_{N}} [{(y^{T} ξ - ζ)}_{+} - (λ + 1) y^{T} ξ - \frac{α}{1 - α} {(y^{T} ξ - d)}_{+}] + L δ {∥ y ∥}_{2} + λ ρ \\ = inf_{λ \in R} f (P_{N}, ζ) + L δ {∥ y ∥}_{2} + λ [ρ - E_{P_{N}} (y^{T} ξ)], \end{matrix}

where L is the Lipschitz constant of the function

t (x) = {(x - ζ)}_{+} - (λ + 1) x - \frac{α}{1 - α} {(x - d)}_{+}

, that is,

max (|λ + 1|, |λ|, |λ + \frac{α}{1 - α}|)

.
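The Lipschitz constant stated above can be checked numerically. The sketch below compares the formula with the largest finite-difference slope of t(x) on a grid. It assumes ζ \leq d, so that the three slopes of the piecewise-linear function are -(λ + 1), -λ, and -λ - α/(1 - α); the parameter values and the grid are illustrative choices.

```python
def t(x, lam, alpha, zeta, d):
    """t(x) = (x - zeta)_+ - (lam + 1) x - alpha/(1 - alpha) (x - d)_+."""
    pos = lambda v: max(v, 0.0)
    return pos(x - zeta) - (lam + 1.0) * x - alpha / (1.0 - alpha) * pos(x - d)

def lipschitz_formula(lam, alpha):
    return max(abs(lam + 1.0), abs(lam), abs(lam + alpha / (1.0 - alpha)))

def lipschitz_numeric(lam, alpha, zeta, d, lo=-5.0, hi=5.0, steps=2000):
    """Largest absolute chord slope of t between neighbouring grid points."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    ts = [t(x, lam, alpha, zeta, d) for x in xs]
    return max(
        abs(t1 - t0) / (x1 - x0)
        for x0, x1, t0, t1 in zip(xs, xs[1:], ts, ts[1:])
    )

alpha, zeta, d = 0.3, -0.5, 1.0   # illustrative values with zeta <= d
for lam in (-2.0, -0.5, 0.3):
    print(lam, lipschitz_formula(lam, alpha), lipschitz_numeric(lam, alpha, zeta, d))
```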

Let

A = {δ ∥ y ∥}_{2} > 0

and

B = ρ - E_{P_{N}} (y^{T} ξ) < 0

, and the next step is to solve the convex problem

inf_{λ \in R} A max (| λ + 1 |, | λ |, | λ + \frac{α}{1 - α} |) + B λ .

When

α \leq \frac{1}{2}

,

0 < \frac{α}{1 - α} \leq 1

, through analysis or graphing, we know that

max (|λ + 1|, |λ|, |λ + \frac{α}{1 - α}|) = \frac{1}{2} + |λ + \frac{1}{2}|

. When

λ \geq - \frac{1}{2}

, the problem becomes

inf_{λ \geq - \frac{1}{2}} (A + B) λ + A .

Obviously, under the condition

A + B \geq 0

, the minimum value is

\frac{A - B}{2}

; otherwise, it tends to negative infinity.

When

λ \leq - \frac{1}{2}

, the problem becomes

inf_{λ \leq - \frac{1}{2}} (B - A) λ .

Since

B - A < 0

holds, the minimum value of this problem is also

\frac{A - B}{2}

.

Under the condition

α \leq \frac{1}{2}

, we obtain the following result:

inf_{λ \in R} sup_{μ \in W} f (μ, ζ) + λ [ρ - E_{μ} (y^{T} ξ)] = \{\begin{matrix} f (P_{N}, ζ) + \frac{1}{2} [{δ ∥ y ∥}_{2} - ρ + E_{P_{N}} (y^{T} ξ)], \\ s . t . ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} \geq 0 . \end{matrix}

When

α > \frac{1}{2}

,

\frac{α}{1 - α} > 1

,

max (| λ + 1 |, | λ |, | λ + \frac{α}{1 - α} |) = \frac{α}{2 - 2 α} + | λ + \frac{α}{2 - 2 α} |

. Similarly, we have

inf_{λ \in R} sup_{μ \in W} f (μ, ζ) + λ [ρ - E_{μ} (y^{T} ξ)] = \{\begin{matrix} f (P_{N}, ζ) + \frac{α}{2 - 2 α} [{δ ∥ y ∥}_{2} - ρ + E_{P_{N}} (y^{T} ξ)], \\ s . t . ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} \geq 0 . \end{matrix}

□
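The one-dimensional problem solved in the proof, inf_{λ} A max(|λ + 1|, |λ|, |λ + α/(1 - α)|) + B λ with A = δ ∥y∥_{2} and B = ρ - E_{P_{N}}(y^{T} ξ), is easy to verify numerically. The following sketch compares the closed-form value with a brute-force grid search. The function names, the grid, and the test values of A, B, and α are illustrative assumptions; adding f(P_{N}, ζ) to the returned value gives the expression in Theorem 6.

```python
def closed_form_value(A, B, alpha):
    """Closed-form infimum; -inf when A + B < 0 (objective unbounded below)."""
    if A + B < 0:
        return float("-inf")
    c = 0.5 if alpha <= 0.5 else alpha / (2.0 - 2.0 * alpha)
    return c * (A - B)

def grid_value(A, B, alpha, lo=-10.0, hi=10.0, steps=20000):
    """Brute-force minimum of the same objective over a uniform grid in lambda."""
    a = alpha / (1.0 - alpha)
    best = float("inf")
    for k in range(steps + 1):
        lam = lo + (hi - lo) * k / steps
        val = A * max(abs(lam + 1.0), abs(lam), abs(lam + a)) + B * lam
        best = min(best, val)
    return best

for alpha in (0.3, 0.75):
    print(alpha, closed_form_value(0.4, -0.1, alpha), grid_value(0.4, -0.1, alpha))
```

For α = 0.3 the minimizer is λ = -1/2, and for α = 0.75 it is λ = -α/(2 - 2α) = -3/2, in line with the two cases of the proof.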

Thus, taking

α \leq \frac{1}{2}

as an example, if the optimal solution to problem (7), i.e., the worst-case distribution, is

P^{*}

, then the distributionally robust version of the ES-expectile model (DRES-expectile) is

\begin{matrix} \inf_{d \in R, y \in R^{n}} & d \\ s . t . & d \geq {VaR}_{β}^{P^{*}} (y^{T} ξ), \\ \frac{α}{1 - α} E_{P_{N}} [{(y^{T} ξ - d)}_{+}] = inf_{ζ \in R} (1 - β) d + β ζ + E_{P_{N}} [{(y^{T} ξ - ζ)}_{+}] \\ - \frac{1}{2} [E_{P_{N}} (y^{T} ξ) + ρ] + \frac{1}{2} δ {∥ y ∥}_{2}, \\ y_{i} \geq 0, i = 1, \dots, n, \\ \sum_{i = 1}^{n} y_{i} = 1, \\ ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} \geq 0 . \end{matrix}

(12)

4. Question Simplification

In this section, to make the problem easier to solve, we consider using some approximation techniques to simplify the problem.

For the infimum in the equality constraint of problem (12), although we know that the optimal

ζ

is the

1 - β

quantile of the random variable

y^{T} ξ

, due to the complexity of sorting, non-smoothness, and the involvement of the decision variable y, we apply a relaxation instead of directly choosing the optimal value. Additionally, we introduce new variables to smooth the function

{(x)}_{+}

.

Specifically, we first relax the second constraint from an equality to an inequality and introduce the decision variable

ζ

, transforming the problem into

\begin{matrix} \inf_{d, ζ \in R, y \in R^{n}} & d \\ s . t . & d \geq {VaR}_{β}^{P^{*}} (y^{T} ξ), \\ \frac{α}{1 - α} E_{P_{N}} [{(y^{T} ξ - d)}_{+}] \leq (1 - β) d + β ζ + E_{P_{N}} [{(y^{T} ξ - ζ)}_{+}] \\ - \frac{1}{2} [E_{P_{N}} (y^{T} ξ) + ρ] + \frac{1}{2} δ {∥ y ∥}_{2}, \\ y_{i} \geq 0, i = 1, \dots, n, \\ \sum_{i = 1}^{n} y_{i} = 1, \\ ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} \geq 0 . \end{matrix}

Next, we introduce new variables

a_{i}

, where

a_{i} = {(y^{T} {\hat{ξ}}_{i} - d)}_{+}

, and generate two constraints:

a_{i} \geq y^{T} {\hat{ξ}}_{i} - d

and

a_{i} \geq 0

. The same treatment is applied to

{(y^{T} ξ - ζ)}_{+}

, and the problem becomes

\begin{matrix} \inf_{d, ζ \in R, y \in R^{n}, a, b \in R^{N}} & d \\ s . t . & d \geq {VaR}_{β}^{P^{*}} (y^{T} ξ), \\ \frac{α}{N (1 - α)} e^{T} a \leq (1 - β) d + β ζ + \frac{1}{N} e^{T} b - \frac{1}{2} [E_{P_{N}} (y^{T} ξ) + ρ] + \frac{1}{2} δ {∥ y ∥}_{2}, \\ ρ - E_{P_{N}} (y^{T} ξ) + δ {∥ y ∥}_{2} \geq 0, \\ y_{i} \geq 0, i = 1, \dots, n, \\ \sum_{i = 1}^{n} y_{i} = 1, \\ a_{i}, b_{i} \geq 0, i = 1, \dots, N, \\ a_{i} \geq y^{T} {\hat{ξ}}_{i} - d, i = 1, \dots, N, \\ b_{i} \geq y^{T} {\hat{ξ}}_{i} - ζ, i = 1, \dots, N . \end{matrix}

(13)
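The auxiliary variables in problem (13) can be illustrated on a toy sample. The sketch below computes the scenario returns y^{T} \hat{ξ}_{i} and the smallest values of a_{i} that satisfy a_{i} \geq 0 and a_{i} \geq y^{T} \hat{ξ}_{i} - d, which coincide with the positive parts. The three two-asset scenarios, the weights, and the threshold are hypothetical numbers chosen only for illustration.

```python
def portfolio_returns(y, samples):
    """Scenario returns y^T xi_i for each sample vector xi_i."""
    return [sum(w * x for w, x in zip(y, xi)) for xi in samples]

def smallest_epigraph_values(y, samples, threshold):
    """Smallest a_i with a_i >= 0 and a_i >= y^T xi_i - threshold."""
    return [max(r - threshold, 0.0) for r in portfolio_returns(y, samples)]

samples = [[0.02, -0.01], [-0.03, 0.04], [0.01, 0.01]]   # hypothetical scenarios
y = [0.5, 0.5]
d = 0.006

a = smallest_epigraph_values(y, samples, d)
mean_excess = sum(a) / len(a)   # equals the empirical mean of (y^T xi - d)_+
```

Note that these values are only the lower end of the feasible range; any larger a_{i} also satisfies the two inequalities.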

For the first constraint

d \geq {VaR}_{β}^{P^{*}} (y^{T} ξ)

,

{VaR}_{β}^{P^{*}} (y^{T} ξ)

represents the

β

-quantile of the random variable

y^{T} ξ

under

P^{*}

. We approximate

P^{*}

using the empirical distribution and estimate the

β

-quantile using the normal distribution

{VaR}_{β}^{P^{*}} (y^{T} ξ) \approx y^{T} \hat{μ} + z_{β} \sqrt{y^{T} Σ_{\hat{ξ}} y},

where

\hat{μ}

is the sample mean vector,

Σ_{\hat{ξ}}

is the sample covariance matrix, and

z_{β}

is the

β

-quantile of the standard normal distribution.
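A minimal sketch of this normal approximation is given below. It assumes the returns are stored as a NumPy array with one row per day and one column per asset; the synthetic data, the weights, and the function name `normal_var` are illustrative and not part of our experiments.

```python
import numpy as np
from statistics import NormalDist

def normal_var(y, returns, beta):
    """Normal approximation y^T mu_hat + z_beta * sqrt(y^T Sigma_hat y)."""
    mu_hat = returns.mean(axis=0)                 # sample mean vector
    sigma_hat = np.cov(returns, rowvar=False)     # sample covariance matrix
    z_beta = NormalDist().inv_cdf(beta)           # standard normal beta-quantile
    return float(y @ mu_hat + z_beta * np.sqrt(y @ sigma_hat @ y))

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.02, size=(250, 3))   # synthetic, for illustration
y = np.array([0.5, 0.3, 0.2])
var_95 = normal_var(y, returns, beta=0.95)
```

Because y^{T} Σ_{\hat{ξ}} y is the sample variance of the portfolio return series, the same number is obtained from the mean and standard deviation of `returns @ y`.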

Apart from the simplex and bound constraints on the portfolio weights y, we treat all other constraints as soft constraints, handled through a penalty function approach. On the one hand, this simplifies the problem, making it easier for the algorithm to find an optimal solution; on the other hand, it prevents overemphasis on risk d at the expense of investment returns. The penalized problem essentially becomes a weighted problem concerning the expectation, standard deviation, Value at Risk, and expected shortfall. To ensure smoothness and a comparable form across the penalty terms, the penalty function is set as

\log (1 + σ e^{r (x)})

, where

r (x)

denotes a constraint written in the form r (x) \leq 0, and

σ

is the penalty factor.

inf_{d, ζ \in R, y \in C, a, b \in D} d + \sum_{j = 1}^{3} \log (1 + σ_{j} e^{r_{j} (d, ζ, y)}) + \sum_{i = 1}^{N} [\log (1 + σ_{a} e^{r_{a i} (d, ζ, y)}) + \log (1 + σ_{b} e^{r_{b i} (d, ζ, y)})],

(14)

where

\begin{matrix} r_{1} (d, ζ, y) & = y^{T} \hat{μ} + z_{β} \sqrt{y^{T} Σ_{\hat{ξ}} y} - d, \\ r_{2} (d, ζ, y) & = \frac{α}{N (1 - α)} e^{T} a + \frac{1}{2} (E_{P_{N}} (y^{T} ξ) + ρ) - (1 - β) d - β ζ - \frac{1}{N} e^{T} b - \frac{1}{2} δ {∥ y ∥}_{2}, \\ r_{3} (d, ζ, y) & = E_{P_{N}} (y^{T} ξ) - ρ - δ {∥ y ∥}_{2}, \\ r_{a i} (d, ζ, y) & = y^{T} {\hat{ξ}}_{i} - d - a_{i}, i = 1, \dots, N, \\ r_{b i} (d, ζ, y) & = y^{T} {\hat{ξ}}_{i} - ζ - b_{i}, i = 1, \dots, N, \end{matrix}

and the constraint sets are

D = {x \in R^{N} ∣ x_{i} \geq 0}, C = {y \in R^{n} ∣ y_{i} \geq 0, \sum_{i = 1}^{n} y_{i} = 1} .
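The penalty term \log (1 + σ e^{r (x)}) can overflow if it is evaluated naively for a strongly violated constraint. The sketch below shows one numerically stable way to evaluate it; the function name `soft_penalty` is our own illustrative choice, and σ > 0 is assumed.

```python
import math

def soft_penalty(r, sigma):
    """Evaluate log(1 + sigma * exp(r)) without overflow (requires sigma > 0)."""
    s = math.log(sigma) + r                      # s = log(sigma * exp(r))
    # log(1 + exp(s)) = max(s, 0) + log(1 + exp(-|s|))
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))

print(soft_penalty(-50.0, 10.0))    # constraint satisfied with margin: nearly 0
print(soft_penalty(0.0, 1.0))       # on the boundary with sigma = 1: log(2)
print(soft_penalty(1000.0, 10.0))   # strongly violated: about r + log(sigma)
```

The penalty is close to zero when the constraint holds with a margin and grows approximately linearly in the violation, which keeps the objective smooth for gradient-based solvers.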

5. Numerical Experiments

In this section, we first introduce the portfolio problems for comparison and the metrics that characterize portfolio performance. Next, we gather data from online sources, calculate returns, and use the solutions of the corresponding models to make investment decisions. Finally, we generate graphs illustrating asset changes and compute various financial statistics, such as the expected return, standard deviation, Sharpe ratio, expected shortfall, and Sortino ratio, for comparative analysis.

During the testing phase, we also test the traditional ES-expectile model under the empirical distribution. Since the model sets the value of the expected return

ρ

as

k E_{P_{N}} (y^{T} ξ)

, the constraint becomes

(1 - k) E_{P_{N}} (y^{T} ξ)

, which turns into an ineffective constraint. Therefore, we modify this constraint to

E_{P_{N}} (y^{T} ξ) \geq 0

. Following the same approximation steps that lead from problem (12) through (13) to (14), we first approximate problem (3) as follows:

\begin{matrix} \inf_{d, ζ \in R, y \in R^{n}, a, b \in R^{N}} & d \\ s . t . & d \geq {VaR}_{β}^{P^{*}} (y^{T} ξ), \\ \frac{α}{N (1 - α)} e^{T} a \leq (1 - β) d + β ζ + \frac{1}{N} e^{T} b - E_{P_{N}} (y^{T} ξ), \\ y_{i} \geq 0, i = 1, \dots, n, \\ \sum_{i = 1}^{n} y_{i} = 1, \\ a_{i}, b_{i} \geq 0, i = 1, \dots, N, \\ a_{i} \geq y^{T} {\hat{ξ}}_{i} - d, i = 1, \dots, N, \\ b_{i} \geq y^{T} {\hat{ξ}}_{i} - ζ, i = 1, \dots, N . \end{matrix}

Then, we apply a penalty function approach to transform the problem into

inf_{d, ζ \in R, y \in C, a, b \in D} d + \sum_{j = 1}^{3} \log (1 + σ_{j} e^{r_{j} (d, ζ, y)}) + \sum_{i = 1}^{N} [\log (1 + σ_{a} e^{r_{a i} (d, ζ, y)}) + \log (1 + σ_{b} e^{r_{b i} (d, ζ, y)})],

(15)

where

\begin{matrix} r_{1} (d, ζ, y) & = y^{T} \hat{μ} + z_{β} \sqrt{y^{T} Σ_{\hat{ξ}} y} - d, \\ r_{2} (d, ζ, y) & = \frac{α}{N (1 - α)} e^{T} a + E_{P_{N}} (y^{T} ξ) - (1 - β) d - β ζ - \frac{1}{N} e^{T} b, \\ r_{3} (d, ζ, y) & = - E_{P_{N}} (y^{T} ξ), \\ r_{a i} (d, ζ, y) & = y^{T} {\hat{ξ}}_{i} - d - a_{i}, i = 1, \dots, N, \\ r_{b i} (d, ζ, y) & = y^{T} {\hat{ξ}}_{i} - ζ - b_{i}, i = 1, \dots, N, \end{matrix}

and the constraint sets are

D = {x \in R^{N} ∣ x_{i} \geq 0}, C = {y \in R^{n} ∣ y_{i} \geq 0, \sum_{i = 1}^{n} y_{i} = 1} .

Compared to problem (14), the main changes are in

r_{2} (d, ζ, y)

and

r_{3} (d, ζ, y)

.

Many well-performing portfolio models have been proposed in the literature. In this experiment, we select two traditional portfolio models and their distributionally robust versions. The first is the mean–variance model proposed by Markowitz in 1952 [1]. However, since we use the empirical distribution in this experiment, the treatment of expected returns is consistent with that in the ES-expectile model. Therefore, we replace it with a weighted sum of the mean and standard deviation:

\begin{matrix} \inf_{y \in R^{n}} - E_{P_{N}} (y^{T} ξ) + ω \sqrt{y^{T} Σ_{ξ} y} \\ s . t . y_{i} \geq 0, i = 1, 2, \dots, n, \\ e^{T} y = 1 . \end{matrix}

(16)

Here,

ω > 0

represents the degree of risk aversion.

Jose Blanchet studied its distributionally robust optimization version in 2018 [13]. When the Wasserstein distance is based on the Euclidean norm and the radius of the Wasserstein ball is

δ

, the problem reads

\begin{matrix} \inf_{y \in R^{n}} - \sqrt{δ} ∥ y ∥ + \sqrt{y^{T} Σ_{ξ} y} \\ s . t . y_{i} \geq 0, i = 1, 2, \dots, n, \\ e^{T} y = 1, \\ E_{P_{N}} (y^{T} ξ) \geq α + \sqrt{δ} ∥ y ∥ . \end{matrix}

(17)

Here, we set

α = k E_{P_{N}} (y^{T} ξ)

.

Secondly, we select the mean–expected shortfall model [9]:

\begin{matrix} \inf_{y \in R^{n}} - E_{P_{N}} (y^{T} ξ) + ω {ES}_{β} (y^{T} ξ) \\ s . t . y_{i} \geq 0, i = 1, 2, \dots, n, \\ e^{T} y = 1 . \end{matrix}

(18)

As for its distributionally robust optimized version, it is mentioned in ([9] Proposition 7.2) that under certain conditions, the optimal solution is the equally weighted portfolio.

We select a sample of 130 well-performing Chinese equities from cn.investing.com covering the period 2022–2023. Using their daily closing prices

P_{t}

(with t indexing trading days), we compute simple daily returns:

R_{t} = \frac{P_{t}}{P_{t - 1}} - 1,

where

P_{t}

denotes the closing price on day t, and

R_{t}

is the corresponding daily return. After discarding any trading day on which one or more equities have missing data, we retain a cleaned dataset with 460 valid trading days. These are days for which complete return data are available across all selected stocks.
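The data preparation described above can be sketched as follows. We assume prices are held as plain Python lists and that a missing quote is marked by `None`; both the representation and the helper names are illustrative assumptions, not a description of our actual pipeline.

```python
def drop_incomplete_days(rows):
    """Keep only trading days on which every equity has a price (None = missing)."""
    return [row for row in rows if all(p is not None for p in row)]

def simple_returns(prices):
    """Simple daily returns R_t = P_t / P_{t-1} - 1 for one price series."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

# Each row is one trading day, each column one equity (hypothetical prices).
raw = [[10.0, 20.0], [None, 20.4], [10.5, 20.2], [10.29, 20.8]]
clean = drop_incomplete_days(raw)

first_equity = [row[0] for row in clean]
returns_first = simple_returns(first_equity)   # approximately [0.05, -0.02]
```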

Across these 460 days and 130 equities (roughly 60,000 return observations), the cross-sectional mean of individual daily returns is

0.061 %

, while the mean cross-sectional standard deviation is

2.56 %

. The pooled distribution of daily returns is mildly right-skewed (skewness,

0.39

) and strongly leptokurtic (excess kurtosis,

12.42

), signaling pronounced fat tails. Single-day extremes reach

+ 36.76 %

and

- 43.08 %

, and about 62% of the equities exhibit excess kurtosis above 3.

To evaluate the portfolio strategy’s viability, we conducted a rolling-window back-testing experiment. A 100-day estimation window was paired with four holding periods

h \in {3, 7, 14, 21}

. After each h-day decision interval, the estimation window advances by h days; the portfolio model is re-estimated on the updated 100-day sample and a new portfolio is formed. This cycle is repeated until the entire 360-day out-of-sample testing horizon is covered.
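The rolling-window schedule can be written as a short generator. The sketch below assumes zero-based, end-exclusive day indices and truncates the last holding period at the end of the sample; these conventions are our own illustrative assumptions.

```python
def rolling_schedule(n_days, window, h):
    """Yield (train_start, train_end, hold_end) index triples, end-exclusive."""
    start = 0
    while start + window < n_days:
        train_end = start + window
        hold_end = min(train_end + h, n_days)   # truncate the final holding period
        yield start, train_end, hold_end
        start += h

# Number of re-estimations for each holding period on 460 days with a 100-day window
stages = {h: len(list(rolling_schedule(460, 100, h))) for h in (3, 7, 14, 21)}
print(stages)
```

Under these conventions, h = 7 gives 52 re-estimations over the 360-day test horizon.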

The initial cumulative return was set to

C R_{0} = 1

. If the portfolio realizes an interval return

R_{t, t + h}

over the holding period

[t, t + h]

, the cumulative return updates multiplicatively as

C R_{t + h} = C R_{t} (1 + R_{t, t + h}) .

Unfolding the recursion yields the cumulative return at any test horizon T (

1 \leq T \leq 360

):

C R_{T} = \prod_{t = 1}^{T} (1 + R_{t}),

where

R_{t}

is the one-day portfolio return on day t.
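The recursion and its unfolded product form can be checked with a few lines of code. The daily returns below are made-up numbers used only to illustrate that compounding interval returns and compounding daily returns give the same cumulative value.

```python
def cumulative_returns(daily_returns, cr0=1.0):
    """Return the path CR_1, ..., CR_T with CR_t = CR_{t-1} * (1 + R_t)."""
    path, cr = [], cr0
    for r in daily_returns:
        cr *= 1.0 + r
        path.append(cr)
    return path

def interval_return(daily_returns):
    """Return over a holding period, compounded from its daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0

daily = [0.10, -0.10, 0.05, 0.02]                  # hypothetical daily returns
path = cumulative_returns(daily)

# Compounding two 2-day holding periods reproduces the daily product.
two_step = (1.0 + interval_return(daily[:2])) * (1.0 + interval_return(daily[2:]))
```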

For the 360-day returns, we calculate the expectation, standard deviation, Sharpe ratio, expected shortfall, and Sortino ratio [20]. The expectation, standard deviation, and expected shortfall will not be discussed again. Here, the expected shortfall is calculated as

\frac{1}{0.05} \int_{0}^{0.05} {VaR}_{θ} (X) d θ .

The Sharpe ratio is defined as

\frac{R_{p} - R_{f}}{σ_{p}},

where

R_{p}

is the expected return of the portfolio,

R_{f}

is the risk-free rate, and

σ_{p}

is the standard deviation of the portfolio returns. The Sharpe ratio measures the excess return per unit of risk. For the risk-free rate, we assume the 4% annual interest rate on U.S. Treasury bonds. Since our returns are daily, we use

R_{f} = {(1 + 4 %)}^{\frac{1}{252}} - 1

, where 252 is the number of trading days in a year.

The Sortino ratio is defined as

\frac{R_{p} - R_{f}}{σ_{d}},

where

R_{p}

is the expected return of the portfolio,

R_{f}

is the risk-free rate (which can be considered the target return), and

σ_{d}

is the standard deviation of the downside returns, which are the returns below the risk-free rate. Unlike the Sharpe ratio, which considers both upside and downside volatility, the Sortino ratio focuses exclusively on downside risk. This means it only accounts for volatility below the target return and ignores the volatility above it.

The advantage of the Sortino ratio is that it provides a more precise risk-adjusted performance measure, particularly suited for investors who are more concerned with avoiding losses than simply achieving higher returns. A higher Sortino ratio indicates that the portfolio is generating better returns relative to the downside risk, which is a key measure of its effectiveness in mitigating losses.
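The performance metrics above can be sketched in a few functions. Two conventions in this sketch are assumptions on our part, since several variants exist in practice: the downside deviation is taken as the root mean square of shortfalls below the target over all observations, and the empirical expected shortfall is taken as the average of the worst 5% of returns.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252
rf_daily = (1.0 + 0.04) ** (1.0 / TRADING_DAYS) - 1.0   # daily rate from 4% per year

def sharpe(returns, rf=rf_daily):
    return (mean(returns) - rf) / stdev(returns)

def sortino(returns, target=rf_daily):
    # Assumed convention: RMS of shortfalls below the target, over all days.
    shortfalls = [min(r - target, 0.0) for r in returns]
    sigma_d = math.sqrt(sum(v * v for v in shortfalls) / len(returns))
    return (mean(returns) - target) / sigma_d   # undefined if no day falls short

def expected_shortfall(returns, level=0.05):
    # Assumed convention: mean of the worst `level` fraction (at least one day).
    k = max(1, math.ceil(level * len(returns)))
    return mean(sorted(returns)[:k])

sample = [0.01, -0.02, 0.03, -0.01, 0.02]   # hypothetical daily returns
print(sharpe(sample), sortino(sample), expected_shortfall(sample, level=0.4))
```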

We use the ‘minimize’ function from Python’s ‘scipy.optimize’ module to solve this problem. It is a general-purpose routine for minimizing scalar objective functions and supports both unconstrained and constrained optimization. Users select the algorithm through the method option, for example SLSQP, BFGS, or Nelder-Mead. In this numerical experiment, all models were solved using Sequential Least Squares Programming (SLSQP) [21]. All experiments were run on a laptop with an Intel Core i5-12500H CPU (12 threads at 2.5 GHz) and 16 GB of RAM, under Python 3.11 with NumPy 1.26 and SciPy 1.11.4, without any GPU acceleration. Solving a single DRES-expectile portfolio problem took 7.31 ± 0.32 s of wall-clock time and required 37 ± 2 iterations for the optimizer to reach its convergence criteria.
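To illustrate the solver setup, the sketch below solves the simpler mean–standard deviation problem (16) with SLSQP on synthetic data. It is not the code used for the experiments: the data, the value of ω, the starting point, and the small constant inside the square root are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.02, size=(100, 4))   # synthetic 100-day window
mu_hat = returns.mean(axis=0)
sigma_hat = np.cov(returns, rowvar=False)
omega = 1.0                                          # assumed risk-aversion weight

def objective(y):
    # The small constant keeps the square root differentiable near zero variance.
    return -y @ mu_hat + omega * np.sqrt(y @ sigma_hat @ y + 1e-16)

n = returns.shape[1]
x0 = np.full(n, 1.0 / n)                             # start from equal weights
result = minimize(
    objective,
    x0=x0,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,                         # y_i >= 0
    constraints=[{"type": "eq", "fun": lambda y: np.sum(y) - 1.0}],  # e^T y = 1
)
weights = result.x
```

The same pattern, with the penalized objective in place of `objective`, applies to problems (14) and (15).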

Here, we explain the abbreviations of the tested models. DRES-e refers to DRES-expectile (14), ES-e refers to ES-expectile (15), 1/N refers to the equally weighted portfolio, Mean-Std refers to the mean–standard deviation model (16), DRMean-Std refers to the distributionally robust mean–standard deviation model (17), and Mean-ES refers to the mean–expected shortfall model (18).

Table 2 shows that, regardless of the sliding window employed, the average return of DRES-expectile is significantly higher than that of the other models. It also performs well in terms of the Sharpe ratio and the Sortino ratio. However, it performs poorly in terms of standard deviation and expected shortfall. The cumulative return curves in Figures 1–4 show that DRES-expectile exhibits an explosive upward trend, which is inevitably accompanied by a poor standard deviation and expected shortfall. The ES-expectile model under the empirical distribution performs nearly the same as the mean–variance model. We also find that the models perform differently under different sliding windows. The reason is that each model mainly captures a short-term future trend. Hence, if a portfolio is held for an overly short period, its superiority in the average sense cannot manifest itself. Similarly, if a portfolio is held for an excessively long period, the long-term stability of returns becomes difficult to guarantee.

Furthermore, to explore the reason behind the high average returns of our model, we fix the sliding window to 7, keeping other parameters unchanged. We record the solution vectors, i.e., the portfolio weights, for each model at each stage (52 in total, with each solution vector having 130 dimensions, corresponding to 130 stocks). We then calculate the following statistics: the average number of components in the solution vector greater than 0.05, 0.1, and 0.2, and the average values of the maximum, minimum, median, 25th percentile, and 5th percentile components of the solution vectors.

As shown in Table 3,

G_{0.05}, G_{0.1},

and

G_{0.2}

represent the average number of components exceeding 0.05, 0.1, and 0.2, respectively.

G_{\max}

and

G_{mid}

represent the average of the maximum value and the median value of the components. As for the 25th percentile, 5th percentile, and minimum value of the components, all are smaller than

10^{- 19}

and are approximately equal to zero within the arithmetic precision; thus, they are not presented or analyzed here.
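The concentration indicators can be computed directly from the recorded weight vectors. In the sketch below, the function name, the dictionary keys, and the one-hot test portfolios are illustrative assumptions; the test input mimics the case where every stage places the full budget on a single stock.

```python
from statistics import mean, median

def concentration_stats(weight_vectors, thresholds=(0.05, 0.1, 0.2)):
    """Average count of weights above each threshold, plus average max and median."""
    stats = {
        f"G_{t}": mean(sum(w > t for w in y) for y in weight_vectors)
        for t in thresholds
    }
    stats["G_max"] = mean(max(y) for y in weight_vectors)
    stats["G_mid"] = mean(median(y) for y in weight_vectors)
    return stats

# 52 stages, 130 stocks, all weight on one (arbitrary) stock per stage
one_hot_stages = []
for stage in range(52):
    y = [0.0] * 130
    y[stage % 130] = 1.0
    one_hot_stages.append(y)

stats = concentration_stats(one_hot_stages)
```

For such fully concentrated portfolios, the three threshold counts and G_{\max} all equal 1 and G_{mid} equals 0.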

Based on the five concentration indicators,

G_{0.05}, G_{0.1}, G_{0.2}, G_{\max}

and

G_{mid}

, a clear gradient of capital allocation emerges: DRES-expectile attains

G_{0.05} = G_{0.1} = G_{0.2} = 1

,

G_{\max} = 1

, and

G_{mid} \approx 0

, meaning that in almost every rolling window the entire budget is staked on a single stock and the remaining 129 weights are zero to machine precision. This “all-in one bet” design amplifies stock selection skill into explosive portfolio returns whenever the portfolio model correctly pinpoints the short-term winner.

ES-expectile remains highly focused but is less extreme: On average, roughly six stocks exceed the

5 %

threshold, none exceed

20 %

, and the largest position is capped at

11 %

; together with

G_{mid} \approx 0

this delivers a milder risk–return trade-off than DRES-expectile.

Mean-ES lies in between:

G_{0.05} = 1.45

and

G_{0.2} = 1.26

show that one to two stocks carry more than

20 %

each, the largest weight is limited to

86 %

, and

G_{mid} \approx 1.5 \times 10^{- 4}

confirms that the long tail is still heavily pruned; the portfolio model therefore preserves high concentration while curbing the tail risk observed with DRES-expectile.

Mean-Std and ES-expectile (under the empirical distribution) display a “barbell” pattern with moderate concentration (

G_{0.05} \approx 5.6

,

G_{0.1} \approx 0.8

) and no exposure above

20 %

, whereas DRMean-Std (G thresholds all zero,

G_{\max} = 0.02

,

G_{mid} = 0.007

) spreads the weight almost uniformly across the universe.

Since it is neither realistic nor prudent to invest all the capital in a single stock in typical investment scenarios, we slightly adjust the solution of DRES-expectile by lowering the maximum investment weights to 0.5, 0.3, 0.1, and 0.05, while evenly distributing the remaining weights among the other stocks for data experiments. Additionally, we anticipate that some may question whether we are truly maximizing the expected return, so we also conducted tests to address this. In the following figures and table, we provide a brief explanation of the abbreviations used: DRES-e(50), DRES-e(30), DRES-e(10), and DRES-e(5) represent the results where the maximum investment weights are set to 0.5, 0.3, 0.1, and 0.05, respectively, with the remaining weight evenly distributed. “Mean” represents the solution to the following optimization problem

\begin{matrix} sup_{y \in R^{n}} & \frac{1}{N} \sum_{i = 1}^{N} y^{⊤} {\hat{ξ}}_{i} \\ s . t . & y_{i} \geq 0, i = 1, \dots, n, \\ e^{T} y = 1 . \end{matrix}
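The weight-capping adjustment and the “Mean” benchmark described above can be sketched as follows. We assume that only the single largest weight is capped and that the freed weight is added in equal shares to all other stocks; the function names are hypothetical. For the “Mean” problem, a linear objective over the simplex is maximized by placing the full budget on the asset with the highest sample mean.

```python
def cap_largest_weight(weights, cap):
    """Cap the largest weight at `cap` and spread the excess evenly over the rest."""
    k = max(range(len(weights)), key=weights.__getitem__)
    if weights[k] <= cap:
        return list(weights)
    share = (weights[k] - cap) / (len(weights) - 1)
    return [cap if i == k else w + share for i, w in enumerate(weights)]

def mean_only_portfolio(mean_returns):
    """Solution of the 'Mean' benchmark: all weight on the highest-mean asset."""
    k = max(range(len(mean_returns)), key=mean_returns.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(mean_returns))]

one_hot = [0.0] * 130
one_hot[17] = 1.0                       # hypothetical fully concentrated solution
capped = cap_largest_weight(one_hot, cap=0.3)
```

With a cap of 0.3, the selected stock keeps 30% and each of the other 129 stocks receives 0.7/129 of the budget.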

Table 4 shows that imposing progressively tighter caps on the maximum single-asset weight (from 50% down to 5%) sharply reduces portfolio variance and expected shortfall, while the corresponding fall in mean return is comparatively small. This characteristic can also be observed directly in Figures 5–8. Because volatility declines more rapidly than the mean, the Sharpe and Sortino ratios actually rise, particularly in the 14-day window, which indicates that tail risk in the unconstrained solution is driven chiefly by extreme concentration.

The mean-only benchmark produces weaker risk-adjusted results across every horizon and even shows negative average returns in the 14-day window; therefore, our optimization problem cannot be regarded as a simple return maximization exercise. Over the longest window (21 days) the advantage of DRES-expectile narrows, suggesting that its edge is strongest when the portfolio model is recalibrated frequently. We therefore recommend employing the capped DRES-expectile portfolio as a tactical sleeve on short rolling windows, while pairing it with more conservative strategies to stabilize long-term risk.

6. Discussion

In this section, we first examine the practical benefits of measuring portfolio risk using an expectile-based framework augmented with DRES-expectile, from both managerial and investor viewpoints. We then translate our empirical findings into concrete managerial implications for the design of future portfolio optimization strategies, showing how the DRES-expectile portfolio model can be deployed tactically, integrated into multi-strategy architectures, and governed under real-world risk budgets.

6.1. Risk Measurement Implications for Managers and Investors

The numerical experiments presented earlier demonstrate the practical characteristics of the DRES-expectile portfolio model: superior average return and Sharpe ratio across all sliding windows, but accompanied by higher volatility and concentration risk. These results lead to two distinct sets of implications, one for portfolio managers and risk controllers, and one for investors.

6.1.1. Managerial Perspective

Tail-aware and continuously adjustable risk control. In contrast to the “threshold-triggered” jump reactions typical of Value-at-Risk portfolio models, an expectile-based risk metric offers earlier and smoother tail risk warnings. Thanks to this continuity, managers can employ gradient-based optimization to fine-tune risk in real time, greatly improving the controllability and sensitivity of portfolio adjustments.

Practical governance levers. The primary issues, elevated expected shortfall and high volatility, arise from oversized single-stock exposure. A governance committee can remedy this by imposing position-weight caps, a standard control tool. Once these caps are in place, the portfolio’s Sharpe ratio still surpasses that of all classical portfolio models over the 3- to 14-day windows (Tables 2 and 4), demonstrating that the framework is compatible with existing risk management policies.
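
As a rough illustration of such a cap, the sketch below clips a long-only weight vector at a maximum position size and spreads the excess equally over the assets that have not hit the cap. The redistribution rule, the function name `cap_weights`, and the example numbers are assumptions; the construction of the capped DRES-e portfolios in the experiments may differ.

```python
import numpy as np

def cap_weights(weights, cap):
    """Clip long-only weights at `cap`; share the excess equally among uncapped assets.

    Hypothetical helper for illustration. Requires n * cap >= 1 so that a
    fully invested portfolio respecting the cap exists.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    if n * cap < 1.0 - 1e-12:
        raise ValueError("cap too small: n * cap must be at least 1")
    w = w / w.sum()                      # normalize to a fully invested portfolio
    capped = np.zeros(n, dtype=bool)     # assets already fixed at the cap
    for _ in range(n):
        over = w > cap + 1e-12
        if not over.any():
            break
        capped |= over
        excess = (w[over] - cap).sum()
        w[over] = cap
        w[~capped] += excess / (~capped).sum()
    return w

# A single-stock solution (all weight on asset 0) with a 50% position cap.
concentrated = np.array([1.0, 0.0, 0.0, 0.0])
capped_w = cap_weights(concentrated, cap=0.5)
print(capped_w)
```

Equal redistribution is used here because a single-stock solution leaves no positive weights to scale pro rata; other rules are equally conceivable.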

6.1.2. Investor Perspective

Transparent risk–return trade-off. The ES-expectile metric produces a continuous, tail-sensitive spectrum of risk values. This enables investors to see not just how often losses might occur but also how severely, and how this evolves as portfolio positions are adjusted. Unlike threshold-based measures such as VaR, this “risk spectrum” facilitates better understanding and more informed investment decisions.

Downside protection with controlled sacrifice. When upper bounds are applied (DRES-e(30), DRES-e(10), and DRES-e(5) in Table 4), the portfolio model substantially reduces the magnitude of ES while preserving high Sharpe ratios. Investors obtain a more balanced growth path by accepting marginally lower peaks in exchange for markedly lower drawdowns, which is especially desirable for institutions with capital preservation mandates.

6.2. Managerial Implications for Future Portfolio Optimization Strategies

6.2.1. Strategic Application

Across every window length, the DRES-expectile portfolio model shows a strong, steadily rising performance trajectory, whereas traditional portfolio models produce much flatter curves (Figures 1–4). Investors thus observe a continuous risk-evolution path and can detect the build-up of potential tail risk sooner, avoiding the VaR-style “risk-silent interval.” This clearer view of the risk–return trade-off empowers investors to make more forward-looking allocation decisions.

For more risk-averse investors, such as pension funds or insurance portfolios, the DRES-expectile portfolio model should be positioned as a satellite strategy, allocated only within clearly defined risk budgets. This approach enables controlled exposure to potential upside while maintaining the portfolio’s overall risk discipline.

6.2.2. Integration with Other Strategies

To further improve long-term portfolio stability, we recommend integrating the DRES-expectile portfolio model into a broader multi-strategy architecture:

Core-satellite structure: Use low-volatility portfolio models such as mean–variance or minimum variance for the core portfolio, and deploy DRES-expectile as a growth-seeking satellite component.
Dynamic window adjustment: Vary the sliding window length adaptively based on market volatility or regime shifts, ensuring the portfolio model does not overfit to a fixed horizon.
Risk budgeting and cap enforcement: Assign explicit volatility or VaR limits to the DRES-expectile sleeve, and coordinate with other sub-strategies to manage overall portfolio risk. This allows total portfolio exposure to remain within pre-defined limits while leveraging the growth potential of the robust tail-aware portfolio model.
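
One possible reading of the core-satellite and risk-budgeting items above can be sketched as follows: given historical returns of a core portfolio and of the DRES-expectile sleeve, choose the largest sleeve share whose blended in-sample volatility stays within a budget. The function `satellite_share`, the grid search, the budget value, and the simulated return series are all hypothetical and serve only as an illustration.

```python
import numpy as np

def satellite_share(core_returns, sat_returns, vol_budget, grid=101):
    """Largest satellite share in [0, 1] that keeps blended volatility within budget.

    Hypothetical helper: scans an evenly spaced grid of shares and returns the
    largest feasible one, or 0.0 if no share on the grid meets the budget.
    """
    core = np.asarray(core_returns, dtype=float)
    sat = np.asarray(sat_returns, dtype=float)
    best = 0.0
    for s in np.linspace(0.0, 1.0, grid):
        blended = (1.0 - s) * core + s * sat
        if blended.std(ddof=1) <= vol_budget:
            best = float(s)
    return best

# Simulated daily returns (assumed numbers): a calm core and a volatile sleeve.
rng = np.random.default_rng(0)
core = rng.normal(0.0007, 0.004, size=250)
sat = rng.normal(0.0060, 0.045, size=250)

share = satellite_share(core, sat, vol_budget=0.010)
blend = (1.0 - share) * core + share * sat
print("satellite share:", share)
print("blended volatility:", blend.std(ddof=1))
```

A VaR or expected shortfall limit could replace the volatility check in the same loop; in-sample volatility is used only because it keeps the sketch short.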

7. Conclusions

We first introduced expected shortfall into the defining equation of the expectile, extending this traditional risk measure to the ES-expectile, and showed that the new measure is well defined by proving that its defining equation has a solution. To better control the instability caused by relying on empirical distributions, we employed distributionally robust optimization, introduced the necessary theorems, and made reasonable assumptions. We then formulated the distributionally robust ES-expectile (DRES-expectile) and simplified the resulting problem using optimization techniques. In numerical experiments, our DRES-expectile method closely tracked the trend in stock growth and selected only one stock in each decision, which led to excellent returns, though with a higher standard deviation. We acknowledge that such results may seem unrealistic and unpersuasive for practical applications. Therefore, we conducted further experiments to show that this portfolio problem is not simply an expected-return maximization problem. Moreover, by capping the weights of these extreme solutions, we proposed a more practical portfolio strategy. We believe that this portfolio problem has potential when combined with other, more conservative investment strategies, and that it offers a useful reference for further research.

Author Contributions

Conceptualization, H.W., Y.Z. and C.L.; methodology, H.W., Y.Z., C.L. and X.Z.; software, H.W. and Y.Z.; validation, H.W., Y.Z., Y.G. and C.L.; formal analysis, Y.G.; investigation, H.W. and Y.G.; resources, Y.Z.; data curation, Y.G.; writing—original draft preparation, H.W., Y.Z., Y.G. and X.Z.; writing—review and editing, H.W., Y.Z., C.L. and X.Z.; visualization, Y.Z.; supervision, H.W. and C.L.; project administration, H.W. and C.L.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Science and Technology Research Project of Henan Province of China, grant number 252102240125.

Data Availability Statement

The stock closing price data used in this study are publicly available at Investing.com (https://cn.investing.com), a widely used financial market platform that provides comprehensive financial data, real-time quotes, charts, and financial news on global stocks, indices, commodities, and currencies.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Markowitz, H. The utility of wealth. J. Political Econ. 1952, 60, 151–158.
2. Chopra, V.K.; Ziemba, W.T. The effect of errors in means, variances, and covariances on optimal portfolio choice. J. Portfolio Manag. 1993, 19, 6–11.
3. Cai, J.; Weng, C. Optimal reinsurance with expectile. Scand. Actuar. J. 2016, 2016, 624–645.
4. Bellini, F.; Klar, B.; Müller, A.; Rosazza Gianin, E. Generalized quantiles as risk measures. Insur. Math. Econ. 2014, 54, 41–48.
5. Zhang, F.; Xu, Y.; Fan, C. Non-parametric inference of expectile-based value-at-risk for financial time series with application to risk assessment. Int. Rev. Financ. Anal. 2023, 90, 102852.
6. Taylor, J.W. Estimating value at risk and expected shortfall using expectiles. J. Financ. Econom. 2008, 6, 231–252.
7. Rockafellar, R.T.; Uryasev, S. Conditional value-at-risk for general loss distributions. J. Bank. Financ. 2002, 26, 1443–1471.
8. Rahimian, H.; Mehrotra, S. Frameworks and results in distributionally robust optimization. Open J. Math. Optim. 2022, 3, 4.
9. Mohajerin Esfahani, P.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
10. Gao, R.; Kleywegt, A.J. Distributionally robust stochastic optimization with Wasserstein distance. Math. Oper. Res. 2022, 48, 603–655.
11. Gao, R.; Chen, X.; Kleywegt, A.J. Wasserstein distributionally robust optimization and variation regularization. Oper. Res. 2022, 72, 1177–1191.
12. Shafieezadeh-Abadeh, S.; Kuhn, D.; Mohajerin Esfahani, P. Regularization via mass transportation. J. Mach. Learn. Res. 2019, 20, 1–68.
13. Blanchet, J.; Chen, L.; Zhou, X.Y. Distributionally robust mean-variance portfolio selection with Wasserstein distances. Manag. Sci. 2022, 68, 6382–6410.
14. Chen, D.; Wu, Y.; Li, J.; Zhou, X. Distributionally robust mean–absolute deviation portfolio optimization using the Wasserstein metric. J. Glob. Optim. 2023, 87, 783–805.
15. Liu, W.; Liu, Y. Worst-case higher-moment coherent risk based on optimal transport with application to distributionally robust portfolio optimization. Symmetry 2022, 14, 138.
16. Wu, Z.; Sun, K. Distributionally robust optimization with Wasserstein metric for multi-period portfolio selection under uncertainty. Appl. Math. Model. 2023, 117, 513–528.
17. Zhang, Z.; Jing, H.; Kao, C. High-dimensional distributionally robust mean-variance efficient portfolio selection. Mathematics 2023, 11, 1272.
18. Fan, Z.; Ji, R.; Lejeune, M.A. Distributionally robust portfolio optimization under marginal and copula ambiguity. J. Optim. Theory Appl. 2024, 203, 2870–2907.
19. Shapiro, A. On duality theory of conic linear problems. Nonconvex Optim. Appl. 2001, 57, 135–155.
20. Chaudhry, A.; Johnson, H.L. The efficacy of the Sortino ratio and other benchmarked performance measures under skewed return distributions. Aust. J. Manag. 2008, 32, 485–502.
21. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272.

Figure 1. Cumulative returns of all models with a sliding window of 3.

Figure 2. Cumulative returns of all models with a sliding window of 7.

Figure 3. Cumulative returns of all models with a sliding window of 14.

Figure 4. Cumulative returns of all models with a sliding window of 21.

Figure 5. Cumulative returns with a sliding window of 3 (improved DRES-expectile vs. mean).

Figure 6. Cumulative returns with a sliding window of 7 (improved DRES-expectile vs. mean).

Figure 7. Cumulative returns with a sliding window of 14 (improved DRES-expectile vs. mean).

Figure 8. Cumulative returns with a sliding window of 21 (improved DRES-expectile vs. mean).

Table 1. Comprehensive list of symbols and acronyms.

Symbol/Abbreviation	English Full Name
${VaR}_{θ} (X)$	Value at Risk at confidence level $θ$
${ES}_{β} (X)$	Expected shortfall at confidence level $β$
$q_{α} (X)$	Expectile at level $α$
ES-expectile	Expectile defined using asymmetric expected shortfall on both tails
$q_{α, β_{1}, β_{2}} (X)$	ES-expectile with parameters $α$ , $β_{1}$ , and $β_{2}$
DRO	Distributionally robust optimization
DRES-expectile	Distributionally robust ES-expectile
RU Formula	Rockafellar–Uryasev Formula (1)

Table 2. Performance metrics for different models across sliding windows.

Window	Model	Mean	Standard Deviation	Sharpe Ratio	Expected Shortfall	Sortino Ratio
3 days	DRES-e	6.25 × 10⁻³	4.24 × 10⁻²	1.44 × 10⁻¹	−7.58 × 10⁻²	2.56 × 10⁻¹
	ES-e	6.92 × 10⁻⁴	4.06 × 10⁻³	1.32 × 10⁻¹	−7.99 × 10⁻³	2.37 × 10⁻¹
	1/N	1.04 × 10⁻³	7.21 × 10⁻³	1.23 × 10⁻¹	−1.33 × 10⁻²	1.27 × 10⁻¹
	Mean-Std	5.21 × 10⁻⁴	4.32 × 10⁻³	8.46 × 10⁻²	−9.57 × 10⁻³	1.33 × 10⁻¹
	DRMean-Std	1.01 × 10⁻³	7.24 × 10⁻³	1.18 × 10⁻¹	−1.37 × 10⁻²	1.97 × 10⁻¹
	Mean-ES	2.98 × 10⁻³	3.06 × 10⁻²	9.21 × 10⁻²	−5.82 × 10⁻²	1.58 × 10⁻¹
7 days	DRES-e	7.95 × 10⁻³	4.49 × 10⁻²	1.73 × 10⁻¹	−7.98 × 10⁻²	2.87 × 10⁻¹
	ES-e	6.90 × 10⁻⁴	4.18 × 10⁻³	1.28 × 10⁻¹	−8.37 × 10⁻³	2.09 × 10⁻¹
	1/N	1.03 × 10⁻³	7.24 × 10⁻³	1.21 × 10⁻¹	−1.33 × 10⁻²	2.13 × 10⁻¹
	Mean-Std	5.87 × 10⁻⁴	4.45 × 10⁻³	9.70 × 10⁻²	−9.77 × 10⁻³	1.46 × 10⁻¹
	DRMean-Std	1.01 × 10⁻³	7.25 × 10⁻³	1.18 × 10⁻¹	−1.35 × 10⁻²	2.03 × 10⁻¹
	Mean-ES	2.42 × 10⁻³	3.10 × 10⁻²	7.28 × 10⁻²	−6.21 × 10⁻²	1.19 × 10⁻¹
14 days	DRES-e	9.12 × 10⁻³	4.79 × 10⁻²	1.87 × 10⁻¹	−7.99 × 10⁻²	3.77 × 10⁻¹
	ES-e	7.56 × 10⁻⁴	4.19 × 10⁻³	1.43 × 10⁻¹	−8.39 × 10⁻³	2.25 × 10⁻¹
	1/N	1.04 × 10⁻³	7.29 × 10⁻³	1.21 × 10⁻¹	−1.33 × 10⁻²	2.14 × 10⁻¹
	Mean-Std	6.95 × 10⁻⁴	4.52 × 10⁻³	1.19 × 10⁻¹	−9.63 × 10⁻³	1.75 × 10⁻¹
	DRMean-Std	1.01 × 10⁻³	7.26 × 10⁻³	1.17 × 10⁻¹	−1.34 × 10⁻²	2.00 × 10⁻¹
	Mean-ES	−2.99 × 10⁻⁴	3.40 × 10⁻²	−1.34 × 10⁻²	−7.68 × 10⁻²	2.57 × 10⁻²
21 days	DRES-e	5.43 × 10⁻³	4.96 × 10⁻²	1.06 × 10⁻¹	−1.02 × 10⁻¹	1.55 × 10⁻¹
	ES-e	7.93 × 10⁻⁴	4.20 × 10⁻³	1.52 × 10⁻¹	−8.57 × 10⁻³	2.31 × 10⁻¹
	1/N	1.02 × 10⁻³	7.33 × 10⁻³	1.18 × 10⁻¹	−1.34 × 10⁻²	2.13 × 10⁻¹
	Mean-Std	7.15 × 10⁻⁴	4.53 × 10⁻³	1.23 × 10⁻¹	−9.79 × 10⁻³	1.80 × 10⁻¹
	DRMean-Std	9.90 × 10⁻⁴	7.27 × 10⁻³	1.15 × 10⁻¹	−1.36 × 10⁻²	2.01 × 10⁻¹
	Mean-ES	1.66 × 10⁻³	3.03 × 10⁻²	4.97 × 10⁻²	−6.60 × 10⁻²	7.26 × 10⁻²
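
For orientation, the columns of Table 2 correspond to standard summary statistics of a series of out-of-sample portfolio returns. A minimal sketch of how such statistics can be computed is given below; the function name `performance_metrics`, the risk-free rate, the ES tail level, the downside-deviation convention in the Sortino ratio, and the sample returns are assumptions here and need not match the settings used to produce the table.

```python
import numpy as np

def performance_metrics(returns, risk_free=0.0, es_level=0.05):
    """Summary statistics of a return series (hypothetical helper, assumed conventions)."""
    r = np.asarray(returns, dtype=float)
    excess = r - risk_free
    std = r.std(ddof=1)
    # Expected shortfall as the average of the worst es_level fraction of
    # returns, so it is negative when the tail consists of losses.
    k = max(1, int(np.ceil(es_level * r.size)))
    es = np.sort(r)[:k].mean()
    # Downside deviation: root mean square of negative excess returns.
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return {
        "mean": r.mean(),
        "std": std,
        "sharpe": excess.mean() / std,
        "expected_shortfall": es,
        "sortino": excess.mean() / downside if downside > 0 else np.inf,
    }

# Made-up daily returns, for illustration only.
sample = np.array([0.01, -0.02, 0.03, -0.01, 0.02])
metrics = performance_metrics(sample, risk_free=0.0, es_level=0.2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With this convention the Sortino ratio exceeds the Sharpe ratio whenever the downside deviation is smaller than the overall standard deviation, a pattern also visible in most rows of the table.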

Table 3. Statistical summary of the different models’ solutions.

Model	$G_{0.05}$	$G_{0.1}$	$G_{0.2}$	$G_{\max}$	$G_{mid}$
DRES-e	1.00	1.00	1.00	1.00	$6.69 \times 10^{- 18}$
ES-e	5.72	0.81	0.00	0.11	$5.44 \times 10^{- 18}$
Mean-Std	5.60	0.70	0.00	0.11	$4.27 \times 10^{- 18}$
DRMean-Std	0.00	0.00	0.00	0.02	$7.11 \times 10^{- 3}$
Mean-ES	1.45	1.45	1.26	0.86	$1.45 \times 10^{- 4}$

Table 4. Performance metrics for the improved (capped) DRES-expectile solutions and the mean-only model.

Window	Model	Mean	Standard Deviation	Sharpe Ratio	Expected Shortfall	Sortino Ratio
3 days	DRES-e(50)	3.24 × 10⁻³	2.23 × 10⁻²	1.38 × 10⁻¹	−3.88 × 10⁻²	2.68 × 10⁻¹
	DRES-e(30)	2.34 × 10⁻³	1.52 × 10⁻²	1.44 × 10⁻¹	−2.68 × 10⁻²	2.79 × 10⁻¹
	DRES-e(10)	1.44 × 10⁻³	9.07 × 10⁻³	1.41 × 10⁻¹	−1.59 × 10⁻²	2.68 × 10⁻¹
	DRES-e(5)	1.49 × 10⁻³	9.40 × 10⁻³	1.42 × 10⁻¹	−1.65 × 10⁻²	2.68 × 10⁻¹
	mu	2.81 × 10⁻³	3.13 × 10⁻²	8.50 × 10⁻²	−5.89 × 10⁻²	1.46 × 10⁻¹
7 days	DRES-e(50)	4.29 × 10⁻³	2.46 × 10⁻²	1.68 × 10⁻¹	−4.45 × 10⁻²	3.09 × 10⁻¹
	DRES-e(30)	2.96 × 10⁻³	1.66 × 10⁻²	1.70 × 10⁻¹	−2.98 × 10⁻²	3.19 × 10⁻¹
	DRES-e(10)	1.64 × 10⁻³	9.46 × 10⁻³	1.56 × 10⁻¹	−1.65 × 10⁻²	3.00 × 10⁻¹
	DRES-e(5)	1.68 × 10⁻³	9.79 × 10⁻³	1.56 × 10⁻¹	−1.70 × 10⁻²	2.99 × 10⁻¹
	mu	1.92 × 10⁻³	3.20 × 10⁻²	5.51 × 10⁻²	−6.47 × 10⁻²	8.75 × 10⁻²
14 days	DRES-e(50)	4.91 × 10⁻³	2.46 × 10⁻²	1.94 × 10⁻¹	−4.07 × 10⁻²	3.80 × 10⁻¹
	DRES-e(30)	3.34 × 10⁻³	1.66 × 10⁻²	1.92 × 10⁻¹	−2.79 × 10⁻²	3.77 × 10⁻¹
	DRES-e(10)	1.76 × 10⁻³	9.53 × 10⁻³	1.68 × 10⁻¹	−1.64 × 10⁻²	3.25 × 10⁻¹
	DRES-e(5)	1.81 × 10⁻³	9.86 × 10⁻³	1.67 × 10⁻¹	−1.70 × 10⁻²	3.24 × 10⁻¹
	mu	−5.00 × 10⁻⁴	3.45 × 10⁻²	−1.90 × 10⁻²	−7.82 × 10⁻²	−2.53 × 10⁻²
21 days	DRES-e(50)	3.21 × 10⁻³	2.66 × 10⁻²	1.15 × 10⁻¹	−5.37 × 10⁻²	1.74 × 10⁻¹
	DRES-e(30)	2.32 × 10⁻³	1.78 × 10⁻²	1.22 × 10⁻¹	−3.49 × 10⁻²	1.95 × 10⁻¹
	DRES-e(10)	1.43 × 10⁻³	9.79 × 10⁻³	1.30 × 10⁻¹	−1.77 × 10⁻²	2.38 × 10⁻¹
	DRES-e(5)	1.48 × 10⁻³	1.01 × 10⁻²	1.31 × 10⁻¹	−1.82 × 10⁻²	2.39 × 10⁻¹
	mu	1.57 × 10⁻³	3.02 × 10⁻²	4.68 × 10⁻²	−6.60 × 10⁻²	6.87 × 10⁻²

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
