Article

Worst-Case Higher Moment Coherent Risk Based on Optimal Transport with Application to Distributionally Robust Portfolio Optimization

1
Research Center for Applied Mathematics and Interdisciplinary Sciences, Beijing Normal University at Zhuhai, Zhuhai 519087, China
2
School of Economics, Jiangxi University of Finance and Economics, Nanchang 330013, China
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(1), 138; https://doi.org/10.3390/sym14010138
Submission received: 21 December 2021 / Revised: 4 January 2022 / Accepted: 7 January 2022 / Published: 12 January 2022
(This article belongs to the Section Mathematics)

Abstract

Tail risk management is of great significance in the investment process. As an extension of the asymmetric tail risk measure Conditional Value at Risk (CVaR), higher moment coherent risk (HMCR) is compatible with the higher moment information (skewness and kurtosis) of the probability distribution of asset returns and captures distributional asymmetry. To overcome the difficulties arising from the asymmetry and ambiguity of the underlying distribution, we propose a Wasserstein distributionally robust mean-HMCR portfolio optimization model based on the kernel smoothing method and optimal transport, where the ambiguity set is defined as a Wasserstein “ball” around the empirical distribution in the weighted kernel density estimation (KDE) distribution function family. Leveraging Fenchel’s duality theory, we obtain computationally tractable DCP (difference-of-convex programming) reformulations and show that the ambiguity version preserves the asymmetry of the HMCR measure. Preliminary empirical test results for portfolio selection demonstrate the efficiency of the proposed model.

1. Introduction

Portfolio selection is a typical optimization problem under parameter uncertainty, since asset returns are uncertain. Unlike a deterministic optimization problem, the objective function in a decision model facing uncertainty is subjective and contingent on the preference of the investor. Indeed, investment preference concerning uncertainty can be associated with risk and ambiguity (see [1]). Following the seminal work by [2], the difference between the concepts of risk and ambiguity (Knightian uncertainty) in decision theory is that the former refers to exposure to uncertain outcomes whose probability distribution is known, whereas the latter refers to exposure to uncertainty about the probability distribution of the outcomes. Since the seminal work by [3], portfolio selection has been treated as a bi-criteria optimization problem that trades off return against risk, that is, maximizing the expected portfolio return while, at the same time, minimizing some risk measure of the portfolio loss. The assumption of distributional symmetry, however, is limiting in portfolio optimization, where distributions are often known to be asymmetric. Thus, a downside risk measure that reflects asymmetric risk preferences is more suitable than the variance used in [3], which treats upside and downside risk symmetrically. In this paper we focus on an asymmetric tail risk measure, the higher-moment coherent risk (HMCR) measure proposed by [4] (see [5] for more details on risk measures).
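Definition 1
(HMCR). For any given confidence level $\gamma \in [0, 1)$ and order $p \ge 1$, the HMCR measure of a random loss $X$ is defined as
$$\mathrm{HMCR}_{\gamma}^{p}(X) := \inf_{\alpha \in \mathbb{R}} \left\{ \alpha + (1-\gamma)^{-1} \left\| [X - \alpha]_{+} \right\|_{p} \right\}, \tag{1}$$
where $[x]_{+} := \max\{x, 0\}$ and $\|X\|_{p} := \left( \mathbb{E}|X|^{p} \right)^{1/p}$.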
In particular, the popular Conditional Value-at-Risk (CVaR) corresponds to $\mathrm{HMCR}_{\gamma}^{1}$ (see [6]), and $\mathrm{HMCR}_{\gamma}^{2}$ is called the second-moment coherent risk (SMCR) measure. The following properties of the family of HMCR measures motivate us to choose it as the risk measure for portfolio selection: (i) In the sense of the coherent risk measure proposed by [7], HMCR measures are coherent and capture the distributional asymmetry. Furthermore, HMCR measures are compatible with second-order stochastic dominance (see [4]); (ii) In contrast to other known risk measures involving higher moment information, for example, the lower partial moment (see [8]) and the coherent measure of semi-$L_p$ type (see [9]), the location of the “tail cutoff” in the HMCR measures is determined by the optimal solution of the stochastic programming problem (1) and is adjustable through the confidence level $\gamma$; (iii) The risk-averse stochastic programming problem with HMCR measures can readily be handled by the well-developed methods of conic programming, as the $p$-order cone within the positive orthant can be approximated by linear inequalities (see [4]).
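To make Definition 1 concrete, the following is a minimal sketch that evaluates the HMCR of an empirical loss sample by minimising the convex objective in (1) over $\alpha$ with a ternary search. The function names, the bracket-widening heuristic and the restriction to $0 < \gamma < 1$ are illustrative assumptions and are not taken from the paper.

```python
def hmcr_objective(alpha, losses, gamma, p):
    """alpha + (1 - gamma)^(-1) * || [X - alpha]_+ ||_p under the empirical distribution."""
    tail = sum(max(x - alpha, 0.0) ** p for x in losses) / len(losses)
    return alpha + tail ** (1.0 / p) / (1.0 - gamma)


def empirical_hmcr(losses, gamma=0.95, p=2, iters=200):
    """Approximate HMCR_gamma^p of a loss sample (assumes 0 < gamma < 1 and p >= 1)."""
    if not (0.0 < gamma < 1.0) or p < 1:
        raise ValueError("this sketch assumes 0 < gamma < 1 and p >= 1")

    def f(a):
        return hmcr_objective(a, losses, gamma, p)

    hi = max(losses)              # for alpha >= max loss the objective equals alpha
    lo = min(losses)
    step = max(1.0, hi - lo)
    for _ in range(60):           # widen the bracket to the left while f still decreases
        if f(lo - step) >= f(lo):
            break
        lo -= step
        step *= 2.0
    lo -= step                    # by convexity the minimiser now lies in [lo, hi]
    for _ in range(iters):        # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))
```

For $p = 1$ the routine returns the empirical CVaR; since $\|\cdot\|_2 \ge \|\cdot\|_1$ under a probability measure, the value for $p = 2$ is never smaller than the value for $p = 1$ at the same $\gamma$.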
Let the random asset returns $\xi \in \mathbb{R}^{n}$ be defined on a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a compact sample space for $\xi$, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $P$ is a probability measure on $\mathcal{F}$. Let the portfolio decision be $w \in W \subseteq \mathbb{R}^{n}$, where $W$ represents a convex set of feasible decisions. For the given confidence level $\gamma \in [0, 1)$ and the investor’s risk-aversion coefficient $\rho > 0$, we consider the mean-HMCR portfolio optimization model of the following form:
$$\min_{w \in W,\, \alpha \in \mathbb{R}} F(w, \alpha) := \rho\alpha + \mathbb{E}_{P}\left[-w^{T}\xi\right] + \frac{\rho}{1-\gamma}\left( \mathbb{E}_{P}\left[-w^{T}\xi - \alpha\right]_{+}^{p} \right)^{1/p}, \tag{2}$$
where $W = \{ w : w \ge 0,\ \mathbf{1}^{T} w = 1 \}$, $\mathbf{1}$ denotes the vector of ones and $P$ is the probability distribution of $\xi$. However, in the real world the true probability distribution of $\xi$ is often unavailable and is difficult to forecast precisely (see [10]), which results in poor out-of-sample performance when resorting to the sample average approximation (SAA) approach (see for example [11]). Thus, it makes sense to seek a robust portfolio that is immune to the uncertainty in the distribution of the asset returns. Many powerful modeling paradigms have been proposed for optimization under uncertainty in the past decades (see for example [12,13,14]). The paradigm of distributionally robust optimization (DRO), which may be traced to [15], considers an ambiguity set (a family of probability distributions with limited and known distributional information) and evaluates a decision by its worst-case expected performance over all distributions in the ambiguity set.
The key issue of DRO is how to construct an ambiguity set such that the bi-layer uncertain DRO model can be reformulated as a computationally tractable single-layer deterministic convex optimization problem. The popular methods of constructing an ambiguity set basically focus on the moment-based ambiguity set, which is constructed from certain moment information (see, for example, [16,17,18,19,20,21,22]), and the metric-based one, which is defined as a “ball” in the sense of a certain probability metric such as the Prohorov metric (see [23]), the goodness-of-fit (see [24]), the likelihood function (see [25]), $\phi$-divergence (see, for example, [26,27,28,29,30]), Kullback–Leibler (KL) divergence (see, for example, [31,32]) and so forth. In particular, owing to its outstanding properties, the Wasserstein distance, defined as follows, has recently attracted growing interest in DRO (see, for example, [33,34,35,36,37]).
Definition 2
(Wasserstein distance). For any $r \ge 1$, the $r$-Wasserstein distance between two probability distributions $Q$ and $Q'$ on $\mathbb{R}^{n}$ is defined as:
$$W_{r}(Q, Q') = \left( \inf_{\pi \in \Pi(Q, Q')} \int_{\mathbb{R}^{n} \times \mathbb{R}^{n}} \|\xi - \xi'\|^{r}\, \pi(d\xi, d\xi') \right)^{1/r}, \tag{3}$$
where $\|\cdot\|$ is a norm on $\mathbb{R}^{n}$, while $\Pi(Q, Q')$ denotes the set of all joint probability distributions of $\xi \in \mathbb{R}^{n}$ and $\xi' \in \mathbb{R}^{n}$ with marginals $Q$ and $Q'$, respectively.
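As a small illustration of Definition 2, the sketch below computes the $r$-Wasserstein distance in the special case of two univariate empirical distributions with the same number of equally weighted samples, where the optimal coupling simply matches sorted samples. This special case and the function name are assumptions made for illustration; the general multivariate problem requires solving a transport problem.

```python
def wasserstein_1d(xs, ys, r=2):
    """r-Wasserstein distance between two equally weighted samples on the real line.

    For equal sample sizes the optimal transport plan pairs the k-th smallest
    point of xs with the k-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    if r < 1:
        raise ValueError("r must be at least 1")
    a, b = sorted(xs), sorted(ys)
    cost = sum(abs(u - v) ** r for u, v in zip(a, b)) / len(a)
    return cost ** (1.0 / r)
```

Shifting every sample point by a constant $c$ gives a distance of $|c|$ for every $r$, which is a convenient sanity check.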
For example, ref. [33] studies distributionally robust optimization problems in which the Wasserstein ambiguity set is centered at the empirical distribution and the objective of the inner maximization is the expectation of a loss function; under mild assumptions, it derives a finite convex programming reformulation. In this paper, we focus on the distributionally robust version of (2), that is, the following risk-averse distributionally robust portfolio optimization model:
$$\min_{w \in W,\, \alpha \in \mathbb{R}}\ \max_{P \in \mathcal{P}} F(w, \alpha; P) := \rho\alpha + \mathbb{E}_{P}\left[-w^{T}\xi\right] + \frac{\rho}{1-\gamma}\left( \mathbb{E}_{P}\left[-w^{T}\xi - \alpha\right]_{+}^{p} \right)^{1/p}, \tag{4}$$
where the ambiguity set $\mathcal{P}$ is defined as a ball in the space of probability distributions under the Wasserstein distance, centered at the discrete empirical probability distribution. It is worth noting that, in contrast to the expected (risk-neutral) performance of a loss function in the objective of the general DRO model (see, for example, [33,34,35,36,37]), the objective of (4) involves the risk-averse performance measure mean-HMCR, which is a nonlinear functional of the probability distribution $P$. The nonlinearity of the objective of the inner infinite-dimensional maximization problem gives rise to a challenge: the semi-infinite optimization problem obtained by duality theory is difficult to solve. Hence, to the best of our knowledge, a tractable reformulation of (4) is difficult to derive directly. From a statistical point of view, the mixture model used in the construction of an ambiguity set can be treated as a parametric estimation model, as the component distributions are usually predetermined (see [38,39]). However, in practice, the component distributions are unknown and have to be obtained by historical simulation or expert prediction, which motivates us to construct the Wasserstein ambiguity set by a nonparametric method in a data-driven setting in order to derive a computationally tractable reformulation of (4).
The key contributions of our paper may be summarized as follows:
  • To overcome the difficulties arising from the asymmetry and ambiguity of the underlying distribution, we propose a Wasserstein distributionally robust portfolio model based on the kernel smoothing method and mean-HMCR, where the ambiguity set is a Wasserstein “ball” in the finite-dimensional probability distribution space spanned by the weighted KDE;
  • Leveraging Optimal Transport theory, we define the KDE–Wasserstein distance by incorporating KDE into the Wasserstein distance and prove that it is a metric;
  • Leveraging Fenchel’s duality theory to obtain a finite-dimensional dual problem of the inner maximization problem, we overcome the difficulty arising from the nonlinear dependence of HMCR on the probability distribution, and derive a computationally tractable reformulation of the Wasserstein distributionally robust portfolio model based on KDE and mean-HMCR;
  • We extend the $\phi$-divergence ambiguity set in [40] to a Wasserstein ambiguity set by integrating the weighted KDE with Optimal Transport theory (see [41]), and we find that the tractable reformulations of our model involve difference-of-convex programming (DCP) constraints that differ from those of [40], which reflects the insights arising from incorporating KDE into the Wasserstein distance by optimal transport.
The remainder of this paper is organized as follows. Section 2 formally introduces the Wasserstein distributionally robust portfolio optimization model based on KDE and mean-HMCR, and obtains the computationally tractable reformulation of the corresponding DRO model by Fenchel’s duality theory. Section 3 presents numerical experiments, comprising empirical studies and a sensitivity analysis on the model parameters, and Section 4 concludes.

2. Wasserstein Distributionally Robust Portfolio Optimization Based on KDE and Mean-HMCR

In this section, we propose a Wasserstein distributionally robust portfolio optimization model based on KDE and mean-HMCR, starting from giving an overview of traditional KDE and its variant—the weighted KDE, introducing the definition of KDE-Wasserstein distance by Optimal Transport theory (see [41]), and proposing new KDE-Wasserstein ambiguity set. Then the tractable reformulation of the corresponding DRO model is derived by Fenchel’s duality theory.
The kernel smoothing method has been a popular tool for the nonparametric estimation of the probability density function (PDF) $p(\cdot)$ when samples $X_{1}, \ldots, X_{T}$, typically assumed to be drawn independently and identically distributed (i.i.d.) from $p(\cdot)$, are available. A kernel density estimator for a univariate PDF $p(x)$ is:
$$\hat{p}(x) := \frac{1}{Th} \sum_{i=1}^{T} K\!\left( \frac{x - X_{i}}{h} \right) = \frac{1}{T} \sum_{i=1}^{T} K_{h}(x - X_{i}), \tag{5}$$
where $K_{h}(u) := h^{-1} K(u/h)$ and the kernel function $K(\cdot)$ determines the shape of the “bumps” centred at each data point. The smoothing bandwidth $h > 0$ controls the smoothness of the “bumps” and depends on the sample size $T$. Three popular assumptions on the kernel function $K(\cdot)$ and the bandwidth $h = h(T)$ are listed as follows (see [42] for more details of KDE).
Assumption 1.
$K(\cdot) \ge 0$ is bounded and $\int K(x)\,dx = 1$.
Assumption 2.
$K(x) = K(-x)$ and $\int x^{p} K(x)\,dx < \infty$, $p \ge 1$.
Assumption 3.
$h(T) \to 0$ and $T h(T) \to \infty$ as $T \to \infty$.
Although the Epanechnikov kernel is the optimal kernel function in the sense of the asymptotic mean integrated square error (MISE) with respect to $K(\cdot)$, it is known that the MISE is quite insensitive to the shape of the kernel function (for example, Rectangular, Biweight, Triangular and Gaussian kernel functions). Thus, in practice, one generally chooses the Gaussian kernel for density estimation due to its smoothness. The selection of the bandwidth $h$, however, is of crucial importance for a high-performance density estimator. When the Gaussian kernel $K(\cdot)$ is used and the density $p(\cdot)$ being estimated is Gaussian with variance $\sigma^{2}$, $h_{\mathrm{opt}} = 1.06\,\sigma\,T^{-1/5}$ is the optimal bandwidth in the MISE sense. See [42,43] for comprehensive reviews of the KDE.
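The estimator (5) with a Gaussian kernel and the bandwidth $1.06\,\sigma\,T^{-1/5}$ can be sketched in a few lines. The function names are illustrative, and replacing the unknown $\sigma$ by the sample standard deviation is an assumption of this sketch.

```python
import math
import statistics


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(samples):
    """h = 1.06 * sigma_hat * T^(-1/5), with sigma_hat the sample standard deviation."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(x, samples, h=None):
    """Evaluate the kernel density estimate (1 / (T h)) * sum_i K((x - X_i) / h) at x."""
    if h is None:
        h = rule_of_thumb_bandwidth(samples)
    if h <= 0:
        raise ValueError("the bandwidth must be positive")
    total = sum(gaussian_kernel((x - xi) / h) for xi in samples)
    return total / (len(samples) * h)
```

Because each kernel integrates to one, the estimate integrates to one for any positive bandwidth; a larger $h$ only spreads the same mass more widely.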
The traditional KDE (5) is subject to bias that can mask structure by flattening peaks and filling in troughs in the density (see [44]), which has motivated modifications of (5) that reduce the bias while retaining the simple structure of (5). See, for example, the employment of high order kernels [45], variable bandwidth methodologies [46] and data sharpening [47]. An alternative reduced-bias method is obtained by replacing the local weights $T^{-1}$ in (5) with the global weights $\lambda = (\lambda_{1}, \ldots, \lambda_{T})^{T} \in \Lambda$, where $\Lambda := \{ \lambda \in \mathbb{R}^{T} : \mathbf{1}^{T}\lambda = 1,\ \lambda \ge 0 \}$, which gives a variant of KDE, the weighted KDE:
$$\hat{p}_{\lambda}(x) := \sum_{i=1}^{T} \lambda_{i} K_{h}(x - X_{i}). \tag{6}$$
The rationality of the weighted KDE (6) is that the relative importance of different values of X i can be reflected by the “global” weights λ . Moreover, in some statistical applications, the weighted KDE (6) is readily integrated with additional information about p ( · ) by the estimation equation (see [48,49,50] for more details of the weighted KDE). From a DRO point of view, it is important to incorporate the extra distributional information into the ambiguity set constructed by (6) such that the corresponding DRO model is less conservative. Hence the more flexible (6) motivates us to propose a new KDE-Wasserstein ambiguity set (9) by the Optimal Transport (see [41]). We define the following KDE-Wasserstein distance by integrating Wasserstein distance in (3) with the weighted KDE in (6).
Definition 3
(KDE-Wasserstein distance). For the given sample set $\{X_{1}, \ldots, X_{T}\}$, the KDE-Wasserstein distance between the univariate distributions $\hat{p}_{\lambda}$ and $\hat{p}_{\lambda'}$ in (6) is defined as follows.
$$d\left(\hat{p}_{\lambda}, \hat{p}_{\lambda'}\right) := d_{K}(\lambda, \lambda') = \left( \min_{\Pi \ge 0} \left\{ \sum_{i=1}^{T} \sum_{j=1}^{T} c_{ij}^{K} \pi_{ij} \ \middle|\ \sum_{j=1}^{T} \pi_{ij} = \lambda_{i},\ \forall i,\ \ \sum_{i=1}^{T} \pi_{ij} = \lambda'_{j},\ \forall j \right\} \right)^{1/2}, \tag{7}$$
where the transport cost $c_{ij}^{K}$ is taken to be the square of the 2-Wasserstein distance in (3) between the $i$-th component $K^{(i)}(\cdot)$ and the $j$-th component $K^{(j)}(\cdot)$, that is, $c_{ij}^{K} := W_{2}^{2}\left(K^{(i)}(x), K^{(j)}(x)\right)$ and $K^{(i)}(x) := K_{h}(x - X_{i})$. In particular, for the Gaussian kernel function $\phi(x) = \left(1/\sqrt{2\pi}\right) e^{-x^{2}/2}$, the transport cost $c_{ij}^{K}$ in (7) admits a closed expression:
$$c_{ij}^{\phi} = \|X_{i} - X_{j}\|^{2}, \quad i, j = 1, \ldots, T. \tag{8}$$
Let $\mathcal{K}_{\mathbb{R}}$ be the finite-dimensional space of probability distribution functions spanned by (6).
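Proposition 1.
Under Assumptions 1–3, $d(\cdot, \cdot)$ in (7) is a metric on $\mathcal{K}_{\mathbb{R}}$.
Proof. 
It is easy to verify that $d(\hat{p}_{\lambda}, \hat{p}_{\lambda'}) \ge 0$ and $d(\hat{p}_{\lambda}, \hat{p}_{\lambda'}) = d(\hat{p}_{\lambda'}, \hat{p}_{\lambda})$ for all $\hat{p}_{\lambda}, \hat{p}_{\lambda'} \in \mathcal{K}_{\mathbb{R}}$, and that $d(\hat{p}_{\lambda}, \hat{p}_{\lambda'}) = 0$ if and only if $\hat{p}_{\lambda} = \hat{p}_{\lambda'}$. We prove the triangle inequality $d(\hat{p}_{\lambda}, \hat{p}_{\lambda'}) \le d(\hat{p}_{\lambda}, \hat{p}_{\mu}) + d(\hat{p}_{\mu}, \hat{p}_{\lambda'})$ for all $\hat{p}_{\lambda}, \hat{p}_{\mu}, \hat{p}_{\lambda'} \in \mathcal{K}_{\mathbb{R}}$. Let $\Pi^{\lambda\mu}$ ($\Pi^{\mu\lambda'}$) be the solution of problem (7) with marginals $\hat{p}_{\lambda}, \hat{p}_{\mu}$ ($\hat{p}_{\mu}, \hat{p}_{\lambda'}$) and define $\Pi^{\lambda\lambda'}$ by $\pi_{ik}^{\lambda\lambda'} := \sum_{j=1}^{T} \pi_{ij}^{\lambda\mu} \pi_{jk}^{\mu\lambda'} / \mu_{j}$, where terms with $\mu_{j} = 0$ are omitted. For given $i = 1, \ldots, T$, we have
$$\sum_{k=1}^{T} \pi_{ik}^{\lambda\lambda'} = \sum_{k=1}^{T} \sum_{j=1}^{T} \frac{\pi_{ij}^{\lambda\mu} \pi_{jk}^{\mu\lambda'}}{\mu_{j}} = \sum_{j=1}^{T} \frac{\pi_{ij}^{\lambda\mu}}{\mu_{j}} \sum_{k=1}^{T} \pi_{jk}^{\mu\lambda'} = \sum_{j=1}^{T} \pi_{ij}^{\lambda\mu} = \lambda_{i}.$$
Similarly, for given $k = 1, \ldots, T$, we have $\sum_{i=1}^{T} \pi_{ik}^{\lambda\lambda'} = \lambda'_{k}$. Thus, $\Pi^{\lambda\lambda'}$ is a joint probability distribution with marginals $\lambda$ and $\lambda'$. Therefore,
$$\begin{aligned} d\left(\hat{p}_{\lambda}, \hat{p}_{\lambda'}\right) &\le \left( \sum_{i,k} \pi_{ik}^{\lambda\lambda'}\, W_{2}\left(K^{(i)}, K^{(k)}\right)^{2} \right)^{1/2} = \left( \sum_{i,j,k} \frac{\pi_{ij}^{\lambda\mu} \pi_{jk}^{\mu\lambda'}}{\mu_{j}}\, W_{2}\left(K^{(i)}, K^{(k)}\right)^{2} \right)^{1/2} \\ &\le \left( \sum_{i,j,k} \frac{\pi_{ij}^{\lambda\mu} \pi_{jk}^{\mu\lambda'}}{\mu_{j}} \left[ W_{2}\left(K^{(i)}, K^{(j)}\right) + W_{2}\left(K^{(j)}, K^{(k)}\right) \right]^{2} \right)^{1/2} \\ &\le \left( \sum_{i,j,k} \frac{\pi_{ij}^{\lambda\mu} \pi_{jk}^{\mu\lambda'}}{\mu_{j}}\, W_{2}\left(K^{(i)}, K^{(j)}\right)^{2} \right)^{1/2} + \left( \sum_{i,j,k} \frac{\pi_{ij}^{\lambda\mu} \pi_{jk}^{\mu\lambda'}}{\mu_{j}}\, W_{2}\left(K^{(j)}, K^{(k)}\right)^{2} \right)^{1/2} \\ &= \left( \sum_{i,j} \pi_{ij}^{\lambda\mu}\, W_{2}\left(K^{(i)}, K^{(j)}\right)^{2} \right)^{1/2} + \left( \sum_{j,k} \pi_{jk}^{\mu\lambda'}\, W_{2}\left(K^{(j)}, K^{(k)}\right)^{2} \right)^{1/2} \\ &= d\left(\hat{p}_{\lambda}, \hat{p}_{\mu}\right) + d\left(\hat{p}_{\mu}, \hat{p}_{\lambda'}\right), \end{aligned}$$
where the second inequality holds because $W_{2}(\cdot, \cdot)$ is a metric, and the third inequality follows from the Minkowski inequality. □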
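For the Gaussian kernel, the cost (8) is the squared distance between sample points, so the KDE-Wasserstein distance (7) coincides with the 2-Wasserstein distance between the discrete measures that put weights $\lambda$ and $\lambda'$ on the points $X_{i}$. On the real line with a convex cost, the monotone (sorted) coupling is optimal, which avoids a general linear programming solver. The sketch below relies on that one-dimensional fact; the function name and tolerance are illustrative assumptions.

```python
import math


def kde_wasserstein_gaussian(points, lam, lam_prime, tol=1e-12):
    """d_K(lam, lam') for the Gaussian-kernel cost c_ij = (X_i - X_j)^2.

    Sweeps the sorted support once and moves mass greedily from left to
    right, which is the monotone coupling (optimal for convex costs in 1-D).
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    xs = [points[i] for i in order]
    a = [lam[i] for i in order]
    b = [lam_prime[i] for i in order]
    n = len(xs)
    i = j = 0
    rem_a, rem_b = a[0], b[0]          # mass still to ship from x_i / to receive at x_j
    cost = 0.0
    while i < n and j < n:
        moved = min(rem_a, rem_b)
        cost += moved * (xs[i] - xs[j]) ** 2
        rem_a -= moved
        rem_b -= moved
        if rem_a <= tol:               # source point exhausted
            i += 1
            if i < n:
                rem_a = a[i]
        if rem_b <= tol:               # target point filled
            j += 1
            if j < n:
                rem_b = b[j]
    return math.sqrt(cost)
```

Evaluating the function on three weight vectors gives a quick numerical check of the triangle inequality in Proposition 1.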
Using the KDE–Wasserstein distance (7), we obtain the KDE-Wasserstein ambiguity set of the univariate random variable $X$ associated with the samples $X_{1}, \ldots, X_{T}$.
Definition 4
(KDE-Wasserstein ambiguity set). For the given sample set $\{X_{1}, \ldots, X_{T}\}$, the KDE-Wasserstein ambiguity set of $X$ is defined as follows.
$$\mathcal{P}_{d_{K}}^{\tau} := \left\{ \hat{p}_{\lambda} = \sum_{i=1}^{T} \lambda_{i} K_{h}(x - X_{i}) \ \middle|\ \lambda \in \Lambda_{d_{K}}^{\tau} \right\}, \tag{9}$$
where $\Lambda_{d_{K}}^{\tau} := \{ \lambda \mid \lambda \in \Lambda,\ d_{K}(\lambda, \lambda_{0}) \le \tau \}$ and $\lambda_{0} := (1/T)\mathbf{1}$. The radius $\tau \ge 0$ of the ambiguity set controls the degree of distributional uncertainty.
We estimate the univariate PDF of the random portfolio loss with the univariate weighted KDE (6), rather than estimating the multivariate PDF of the random asset returns with a multivariate weighted KDE, for two reasons: (i) from a statistical point of view, the multivariate KDE for the continuous PDF of the asset returns gives rise to the “curse of dimensionality” (see, for example, [42]); (ii) from an optimization point of view, a closed formulation of the HMCR measure in the objective function of problem (2) cannot be obtained with the multivariate weighted KDE, since the multiple integral of a positive-part function with respect to the multivariate weighted KDE is hard to calculate. For the given sample set $\{\xi_{1}, \ldots, \xi_{T}\}$ of random asset returns $\xi \in \Omega$, typically assumed to be drawn i.i.d. from the probability distribution $P$ of $\xi$, the corresponding sample set of the portfolio loss $\{-w^{T}\xi_{1}, \ldots, -w^{T}\xi_{T}\}$ is obtained. For any fixed portfolio decision $w \in W$, the KDE-Wasserstein ambiguity set of the portfolio loss $-w^{T}\xi$ is
$$\mathcal{P}_{d_{K}}^{\tau} = \left\{ \hat{p}_{\lambda} = \sum_{i=1}^{T} \lambda_{i} K_{h}\left(x + w^{T}\xi_{i}\right) \ \middle|\ \lambda \in \Lambda_{d_{K}}^{\tau} \right\}, \tag{10}$$
where $\Lambda_{d_{K}}^{\tau} = \{ \lambda \mid \lambda \in \Lambda,\ d_{K}(\lambda, \lambda_{0}) \le \tau \}$ and $\lambda_{0} = (1/T)\mathbf{1}_{T}$. For the Gaussian kernel function $\phi(x) = \left(1/\sqrt{2\pi}\right) e^{-x^{2}/2}$, the transport cost $c_{ij}^{K}$ in (10) admits a closed expression by (8):
$$c_{ij}^{\phi} = c_{ij}^{\phi}(w) = \left\| w^{T}\xi_{i} - w^{T}\xi_{j} \right\|^{2}, \quad i, j = 1, \ldots, T. \tag{11}$$
By replacing the Wasserstein ambiguity set P in (4) with the KDE-Wasserstein ambiguity set P d K τ in (10), we propose the Wasserstein distributionally robust portfolio optimization model based on KDE and mean-HMCR.
$$\min_{w \in W,\, \alpha \in \mathbb{R}}\ \max_{P \in \mathcal{P}_{d_{K}}^{\tau}} F(w, \alpha; P) = \rho\alpha + \mathbb{E}_{P}\left[-w^{T}\xi\right] + \frac{\rho}{1-\gamma}\left( \mathbb{E}_{P}\left[-w^{T}\xi - \alpha\right]_{+}^{p} \right)^{1/p}, \tag{12}$$
where the confidence level $\gamma \in [0, 1)$, risk coefficient $\rho > 0$, kernel function $K(\cdot)$, bandwidth $h > 0$, the weights $\lambda \in \Lambda$ and $p \ge 1$ are given. It follows from (10) that the objective function of (12) admits a closed expression. Hence problem (12) is equivalent to:
$$\min_{w \in W,\, \alpha \in \mathbb{R}}\ \max_{\lambda \in \Lambda_{d_{K}}^{\tau}} \hat{F}_{K}(w, \alpha; \lambda) = \rho\alpha - \sum_{i=1}^{T} \lambda_{i} w^{T}\xi_{i} + \frac{\rho}{1-\gamma}\left( \sum_{i=1}^{T} \lambda_{i}\, \Phi_{p}^{p}\left(-w^{T}\xi_{i} - \alpha, h\right) \right)^{1/p}, \tag{13}$$
where $\Phi_{p}(x, h) = (\varphi_{p} h)(x) := h\,\varphi_{p}(h^{-1}x)$, $\varphi_{p}(x) = G_{p}(x)^{1/p}$ and $G_{p}(x) = \int_{-\infty}^{x} (x - t)^{p} K(t)\,dt$ for $p \ge 1$. Here $(\varphi_{p} h)(x)$ is called a perspective function or an operation of right multiplication (see [51] (pp. 34–35)). $\Phi_{p}(x, h)$ is jointly convex and satisfies $\Phi_{p}(x, h) \ge 0$, $[x]_{+}$ is the recession function $(\varphi_{p} 0^{+})(x)$ of $\varphi_{p}(x)$ (see [51] (pp. 66–67)), and $\Phi_{p}(x, h) = [x]_{+} + o(h)$ for $x \ne 0$ and $\Phi_{p}(0, h) = O(h)$ (see [40]).
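For the Gaussian kernel, the integral $G_{p}$ can be evaluated by direct integration for small integer $p$. The expressions used below, $G_{1}(x) = x\,N(x) + \phi(x)$ and $G_{2}(x) = (x^{2} + 1)\,N(x) + x\,\phi(x)$ with $N$ the standard normal distribution function, are worked out for this sketch rather than quoted from the paper, and the function names are illustrative.

```python
import math


def _pdf(u):
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def _cdf(u):
    """Standard normal distribution function N(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def smoothed_plus(x, h, p=1):
    """Phi_p(x, h) = h * G_p(x / h)^(1/p) for the Gaussian kernel and p in {1, 2}.

    For h = 0 the recession function [x]_+ is returned.
    """
    if h < 0:
        raise ValueError("the bandwidth must be non-negative")
    if h == 0:
        return max(x, 0.0)
    u = x / h
    if p == 1:
        return h * (u * _cdf(u) + _pdf(u))
    if p == 2:
        g2 = (u * u + 1.0) * _cdf(u) + u * _pdf(u)
        return h * math.sqrt(max(g2, 0.0))   # guard against round-off in the far left tail
    raise ValueError("only p = 1 and p = 2 are implemented in this sketch")
```

At $x = 0$ the function returns $h/\sqrt{2\pi}$ for $p = 1$, in line with $\Phi_{p}(0, h) = O(h)$, and for $x \ne 0$ it approaches $[x]_{+}$ as $h$ shrinks.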
We prove the tractability result for the Wasserstein distributionally robust portfolio optimization model (12) based on KDE and mean-$\mathrm{HMCR}_{\gamma}^{p}$, starting from the following Lemma 1 from [30]. The relative interior of a set $U$ and the set $\{ x \in \mathbb{R}^{N} : x > 0 \}$ are denoted by $\mathrm{ri}(U)$ and $\mathbb{R}_{++}^{N}$, respectively.
Lemma 1.
([30] (Theorem 1)) Let $f : \mathbb{R}^{M} \times \mathbb{R}_{+}^{N} \to \mathbb{R}$ be a function such that $f(w, \cdot)$ is closed and concave for each $w \in \mathbb{R}^{M}$. Consider a constraint of the form:
$$f(w, p) \le \beta, \quad \forall p \in U, \tag{14}$$
where $U$ is such that:
$$\mathrm{ri}(U) \cap \mathbb{R}_{++}^{N} \ne \emptyset.$$
Then constraint (14) holds for a given $w$ if and only if:
$$\exists\, v \in \mathbb{R}^{N} : \ \delta^{*}(v \mid U) - f_{*}(w, v) \le \beta,$$
where $\delta^{*}(v \mid U) := \sup_{p \in U} v^{T} p$ is the support function of $U$, and $f_{*}(w, v) := \inf_{p \ge 0} \{ v^{T} p - f(w, p) \}$ is the concave conjugate function of $f(w, p)$ with respect to its second argument.
We then derive the tractable reformulation of the Wasserstein distributionally robust portfolio optimization model (12) based on KDE and mean-$\mathrm{HMCR}_{\gamma}^{p}$ with given $\xi_{i}$, $i = 1, \ldots, T$, $\gamma \in [0, 1)$, $\rho > 0$, $h > 0$, $p > 1$ and $\tau > 0$.
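Theorem 1.
Problem (12) is equivalent to the following optimization problem:
$$\begin{aligned} \min\ \ & \tau^{2}\zeta + \eta + \beta^{T}\lambda_{0} + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y \\ \text{s.t.}\ \ & \frac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} - c_{ij}^{K}(w)\,\zeta \le \eta + \beta_{j}, \quad i, j = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \eta \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \beta \in \mathbb{R}^{T}. \end{aligned} \tag{15}$$
Furthermore, (15) can be formulated as:
$$\min_{w \in W,\, \alpha \in \mathbb{R},\, \zeta \in \mathbb{R}_{+},\, y \in \mathbb{R}_{+}} \ \tau^{2}\zeta + \frac{1}{T} \sum_{j=1}^{T} f_{j}^{K}(w, \alpha, \zeta, y) + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y, \tag{16}$$
where $f_{j}^{K}(w, \alpha, \zeta, y) = \max_{i = 1, \ldots, T} \left\{ \dfrac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} - c_{ij}^{K}(w)\,\zeta \right\}$.
In particular, for the Gaussian kernel function $\phi(x) = \left(1/\sqrt{2\pi}\right) e^{-x^{2}/2}$, problem (16) is equivalent to the following DCP problem:
$$\min_{w \in W,\, \alpha \in \mathbb{R},\, \zeta \in \mathbb{R}_{+},\, y \in \mathbb{R}_{+}} \ \frac{\tau^{2}}{\zeta} + \frac{1}{T} \sum_{j=1}^{T} f_{j}^{\phi}(w, \alpha, \zeta, y) + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y, \tag{17}$$
where $f_{j}^{\phi}(w, \alpha, \zeta, y) = \max_{i = 1, \ldots, T} \left\{ \dfrac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} - \dfrac{c_{ij}^{\phi}(w)}{\zeta} \right\}$ and $c_{ij}^{\phi}(w) = \left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}$.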
Proof. 
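By (13) and introducing the epigraph variable $\nu \in \mathbb{R}$, problem (12) is equivalent to:
$$\min_{w \in W,\, \alpha \in \mathbb{R},\, \nu \in \mathbb{R}} \nu, \quad \text{s.t.}\ \ \hat{F}_{K}(w, \alpha; \lambda) \le \nu, \quad \forall \lambda \in \Lambda_{d_{K}}^{\tau}. \tag{18}$$
According to Lemma 1, for any given $w$, $\alpha$ and $\nu$, the constraint in (18) holds if and only if:
$$\exists\, v \in \mathbb{R}^{T} : \ \delta^{*}\left(v \mid \Lambda_{d_{K}}^{\tau}\right) - (\hat{F}_{K})_{*}(w, \alpha; v) \le \nu. \tag{19}$$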
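In order to obtain the computationally tractable reformulation of problem (12), we need to derive closed expressions for the support function of the uncertainty set $\Lambda_{d_{K}}^{\tau}$ in (9) and for the concave conjugate function of $\hat{F}_{K}(w, \alpha; \cdot)$ in (19).
On the one hand, let $\tilde{\Lambda}_{d_{K}}^{\tau}$ be the extended version of $\Lambda_{d_{K}}^{\tau}$,
$$\tilde{\Lambda}_{d_{K}}^{\tau} := \left\{ \tilde{\lambda} \ \middle|\ \tilde{\lambda} \ge 0,\ \mathbf{1}^{T}\lambda = 1,\ \sum_{i,j} c_{ij}^{K}(w)\,\pi_{ij} \le \tau^{2},\ \sum_{j} \pi_{ij} - \lambda_{i} = 0,\ i = 1, \ldots, T,\ \sum_{i} \pi_{ij} = \lambda_{j}^{0},\ j = 1, \ldots, T \right\},$$
where $\tilde{\lambda} := \left( \lambda^{T}, \pi_{11}, \pi_{12}, \ldots, \pi_{1T}, \ldots, \pi_{T1}, \pi_{T2}, \ldots, \pi_{TT} \right)^{T} \in \mathbb{R}^{T(1+T)}$. From the duality theorem of linear programming, we have:
$$\begin{aligned} \delta^{*}\left(v \mid \tilde{\Lambda}_{d_{K}}^{\tau}\right) &= \max_{\lambda \ge 0,\ \Pi \ge 0} \left\{ v^{T}\lambda \ \middle|\ \begin{array}{l} \mathbf{1}^{T}\lambda = 1,\ \ \sum_{i=1}^{T}\sum_{j=1}^{T} c_{ij}^{K}(w)\,\pi_{ij} \le \tau^{2}, \\ \sum_{j=1}^{T} \pi_{ij} - \lambda_{i} = 0,\ i = 1, \ldots, T, \\ \sum_{i=1}^{T} \pi_{ij} = \lambda_{j}^{0},\ j = 1, \ldots, T \end{array} \right\} \\ &= \min \left\{ \eta + \tau^{2}\zeta + \beta^{T}\lambda_{0} \ \middle|\ \begin{array}{l} \eta - \theta_{i} \ge v_{i},\ i = 1, \ldots, T, \\ c_{ij}^{K}(w)\,\zeta + \theta_{i} + \beta_{j} \ge 0,\ i, j = 1, \ldots, T, \\ \eta \in \mathbb{R},\ \zeta \ge 0,\ \theta \in \mathbb{R}^{T},\ \beta \in \mathbb{R}^{T} \end{array} \right\}. \end{aligned} \tag{20}$$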
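On the other hand, it is straightforward to show that the concave conjugate function of $f(x) = x^{p}/p$, $x \ge 0$, $0 < p < 1$, has the closed form $f_{*}(y) = y^{q}/q$, $y \ge 0$, where $1/p + 1/q = 1$ (see [52]). From [51] (Theorem 16.3) we obtain the concave conjugate function of $\hat{F}_{K}(w, \alpha; \cdot)$ in (19):
$$(\hat{F}_{K})_{*}(w, \alpha; v) = \sup_{z \ge 0} \left\{ -\rho\alpha + \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, z^{\frac{1}{1-p}} \ :\ z \begin{pmatrix} \Phi_{p}\left(-w^{T}\xi_{1} - \alpha, h\right)^{p} \\ \vdots \\ \Phi_{p}\left(-w^{T}\xi_{T} - \alpha, h\right)^{p} \end{pmatrix} = \begin{pmatrix} v_{1} + w^{T}\xi_{1} \\ \vdots \\ v_{T} + w^{T}\xi_{T} \end{pmatrix} \right\}. \tag{21}$$
Replacing $1/z$ in (21) with the auxiliary variable $s$ gives:
$$(\hat{F}_{K})_{*}(w, \alpha; v) = \sup_{s \ge 0} \left\{ -\rho\alpha + \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, s^{\frac{1}{p-1}} \ :\ s^{-1}\,\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p} = v_{i} + w^{T}\xi_{i},\ i = 1, \ldots, T \right\}.$$
Furthermore, we have
$$\begin{aligned} (\hat{F}_{K})_{*}(w, \alpha; v) &= \sup_{s \ge 0} \left\{ -\rho\alpha + \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, s^{\frac{1}{p-1}} \ :\ s^{-1}\,\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p} \le v_{i} + w^{T}\xi_{i},\ i = 1, \ldots, T \right\} \\ &= \sup_{y \ge 0} \left\{ -\rho\alpha + \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y \ :\ \frac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} \le v_{i} + w^{T}\xi_{i},\ i = 1, \ldots, T \right\}, \end{aligned}$$
where the first equality holds because the functions of $s$ in the objective and on the left-hand side of the constraints are all decreasing with respect to $s$, and the second equality holds by replacing $s^{\frac{1}{p-1}}$ with $y$. Combining (20) with (18), it follows from Lemma 1 that: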
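$$\begin{aligned} & \eta + \tau^{2}\zeta + \beta^{T}\lambda_{0} + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y \le \nu, \\ & \frac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} \le v_{i}, \quad i = 1, \ldots, T, \\ & \eta - \theta_{i} \ge v_{i}, \quad i = 1, \ldots, T, \\ & c_{ij}^{K}(w)\,\zeta + \theta_{i} + \beta_{j} \ge 0, \quad i, j = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \nu \in \mathbb{R},\ \eta \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \theta \in \mathbb{R}^{T},\ \beta \in \mathbb{R}^{T},\ v \in \mathbb{R}^{T}. \end{aligned} \tag{22}$$
It is easy to verify that the second group of constraints in (22) is jointly convex in $w$, $\alpha$, $y$ and $v_{i}$, $i = 1, \ldots, T$, and can be handled by geometric programming (see [53]). Eliminating $v$ from (22), we get:
$$\begin{aligned} & \eta + \tau^{2}\zeta + \beta^{T}\lambda_{0} + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y \le \nu, \\ & \frac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} \le \eta - \theta_{i}, \quad i = 1, \ldots, T, \\ & c_{ij}^{K}(w)\,\zeta + \beta_{j} \ge -\theta_{i}, \quad i, j = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \nu \in \mathbb{R},\ \eta \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \theta \in \mathbb{R}^{T},\ \beta \in \mathbb{R}^{T}. \end{aligned} \tag{23}$$
Furthermore, eliminating $\theta$ from (23), we get:
$$\begin{aligned} & \eta + \tau^{2}\zeta + \beta^{T}\lambda_{0} + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y \le \nu, \\ & \frac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} - c_{ij}^{K}(w)\,\zeta \le \eta + \beta_{j}, \quad i, j = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \nu \in \mathbb{R},\ \eta \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \beta \in \mathbb{R}^{T}, \end{aligned} \tag{24}$$
which leads to the tractable reformulation (15) of problem (12).
Denoting $\max_{i = 1, \ldots, T} \left\{ \dfrac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i} - c_{ij}^{K}(w)\,\zeta \right\}$ by $f_{j}^{K}(w, \alpha, \zeta, y)$, $j = 1, \ldots, T$, and eliminating $\eta$ and $\beta$ from (24), we have:
$$\tau^{2}\zeta + \frac{1}{T} \sum_{j=1}^{T} f_{j}^{K}(w, \alpha, \zeta, y) + \rho\alpha - \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (1 - p)\, y \le \nu, \quad w \in W,\ \alpha \in \mathbb{R},\ \nu \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+}, \tag{25}$$
which yields (16), a more concise version of (15). For the Gaussian kernel function $\phi(x) = \left(1/\sqrt{2\pi}\right) e^{-x^{2}/2}$, it follows from (11) that $c_{ij}^{\phi}(w) = \left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}$. Then, by the change of variable $\zeta' = 1/\zeta$, $\zeta' \in \mathbb{R}_{+}$, in (16) (and writing $\zeta$ again for $\zeta'$), problem (16) is reformulated as the DCP problem (17): the quadratic-over-linear function $\left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2} / \zeta$ is jointly convex (see [54,55]), so each $f_{j}^{\phi}(w, \alpha, \zeta, y)$, $j = 1, \ldots, T$, involves the difference of the convex functions $\dfrac{\Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right)^{p}}{y^{p-1}} - w^{T}\xi_{i}$ and $\dfrac{\left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}}{\zeta}$. Thus, the conclusion holds. □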
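Corollary 1.
For the Gaussian kernel $\phi(\cdot)$ and $p = t/s$, where $t, s \in \mathbb{Z}_{+}$, $t \ge s$, problem (17) can be reformulated as the following DCP problem with geometric mean cone constraints.
$$\begin{aligned} \min\ \ & \tau^{2}/\zeta + \eta + \rho\alpha + \beta^{T}\lambda_{0} + \left( \frac{\rho}{p(1-\gamma)} \right)^{\frac{p}{p-1}} (p - 1)\, y \\ \text{s.t.}\ \ & \Big( \big( \underbrace{y, \ldots, y}_{t-s},\ \underbrace{w^{T}\xi_{i} + c_{ij}^{\phi}(w)/\zeta + \eta + \beta_{j},\ \ldots,\ w^{T}\xi_{i} + c_{ij}^{\phi}(w)/\zeta + \eta + \beta_{j}}_{s} \big),\ z_{i} \Big) \in \mathcal{C}_{\mathrm{GE}}^{t}, \quad i, j = 1, \ldots, T, \\ & \Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right) \le z_{i}, \quad i = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \eta \in \mathbb{R},\ \beta \in \mathbb{R}^{T},\ z \in \mathbb{R}^{T}, \end{aligned} \tag{26}$$
where $\mathcal{C}_{\mathrm{GE}}^{t} = \left\{ (x, y) \in \mathbb{R}_{+}^{t} \times \mathbb{R}_{+} : \left( \prod_{i=1}^{t} x_{i} \right)^{1/t} \ge y \right\}$ is the $t$-dimensional geometric mean cone (see [56]).
Proof. 
For the Gaussian kernel $\phi(\cdot)$ and $p = t/s$, where $t, s \in \mathbb{Z}_{+}$, $t \ge s$, the first group of constraints in problem (15) can be reformulated as:
$$\begin{aligned} & z_{i}^{t} \le y^{t-s} \left( w^{T}\xi_{i} + c_{ij}^{\phi}(w)/\zeta + \eta + \beta_{j} \right)^{s}, \quad i, j = 1, \ldots, T, \\ & \Phi_{p}\left(-w^{T}\xi_{i} - \alpha, h\right) \le z_{i}, \quad i = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \eta \in \mathbb{R},\ \beta \in \mathbb{R}^{T},\ z \in \mathbb{R}^{T}. \end{aligned}$$
Thus, the conclusion holds. □
According to Corollary 1, the tractable reformulations of problem (26) with $p = 1, 2, 3$ are listed as follows.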
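Example 1.
For $p = 1$, problem (26) can be reformulated as the following DCP problem.
$$\begin{aligned} \min\ \ & \tau^{2}/\zeta + \eta + \rho\alpha + \beta^{T}\lambda_{0} \\ \text{s.t.}\ \ & \frac{\rho}{1-\gamma}\,\Phi_{1}\left(-w^{T}\xi_{i} - \alpha, h\right) - w^{T}\xi_{i} - c_{ij}^{\phi}(w)/\zeta \le \eta + \beta_{j}, \quad i, j = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \eta \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ \beta \in \mathbb{R}^{T}, \end{aligned}$$
where $c_{ij}^{\phi}(w) = \left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}$.
Example 2.
For $p = 2$, problem (26) is equivalent to the following smooth optimization problem with second-order conic constraints.
$$\begin{aligned} \min\ \ & \frac{\tau^{2}}{\zeta} + \eta + \rho\alpha + \beta^{T}\lambda_{0} + \frac{1}{4} \left( \frac{\rho}{1-\gamma} \right)^{2} y \\ \text{s.t.}\ \ & \left( z_{i},\ \frac{w^{T}\xi_{i} + \frac{c_{ij}^{\phi}(w)}{\zeta} + \eta + \beta_{j} - y}{2},\ \frac{w^{T}\xi_{i} + \frac{c_{ij}^{\phi}(w)}{\zeta} + \eta + \beta_{j} + y}{2} \right) \in \mathcal{C}_{\mathrm{SOC}}^{2}, \quad i, j = 1, \ldots, T, \\ & \Phi_{2}\left(-w^{T}\xi_{i} - \alpha, h\right) \le z_{i}, \quad i = 1, \ldots, T, \\ & w^{T}\xi_{i} + \frac{c_{ij}^{\phi}(w)}{\zeta} + \eta + \beta_{j} \ge 0, \quad i, j = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \eta \in \mathbb{R},\ \beta \in \mathbb{R}^{T},\ z \in \mathbb{R}^{T}, \end{aligned}$$
where $c_{ij}^{\phi}(w) = \left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}$ and $\mathcal{C}_{\mathrm{SOC}}^{2} = \{ (x, y) \in \mathbb{R}^{2} \times \mathbb{R} : \|x\|_{2} \le y \}$ is the 2-dimensional second-order cone (see [57]).
Example 3.
For $p = 3$, problem (26) is equivalent to the following smooth optimization problem with geometric mean conic constraints.
$$\begin{aligned} \min\ \ & \frac{\tau^{2}}{\zeta} + \eta + \rho\alpha + \beta^{T}\lambda_{0} + 2 \left( \frac{1}{3} \right)^{\frac{3}{2}} \left( \frac{\rho}{1-\gamma} \right)^{\frac{3}{2}} y \\ \text{s.t.}\ \ & \left( \left( y,\ y,\ w^{T}\xi_{i} + \frac{c_{ij}^{\phi}(w)}{\zeta} + \eta + \beta_{j} \right),\ z_{i} \right) \in \mathcal{C}_{\mathrm{GE}}^{3}, \quad i, j = 1, \ldots, T, \\ & \Phi_{3}\left(-w^{T}\xi_{i} - \alpha, h\right) \le z_{i}, \quad i = 1, \ldots, T, \\ & w \in W,\ \alpha \in \mathbb{R},\ \zeta \in \mathbb{R}_{+},\ y \in \mathbb{R}_{+},\ \eta \in \mathbb{R},\ \beta \in \mathbb{R}^{T},\ z \in \mathbb{R}^{T}, \end{aligned}$$
where $c_{ij}^{\phi}(w) = \left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}$ and $\mathcal{C}_{\mathrm{GE}}^{3} = \left\{ (x, y) \in \mathbb{R}_{+}^{3} \times \mathbb{R}_{+} : (x_{1} x_{2} x_{3})^{1/3} \ge y \right\}$ is the 3-dimensional geometric mean cone (see [56]).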
Remark 1.
When $\tau = 0$, problem (12) is equivalent to
$$\min_{w \in W,\, \alpha \in \mathbb{R}} \hat{F}_{K}(w, \alpha; \lambda_{0}) = \rho\alpha - \frac{1}{T} \sum_{i=1}^{T} w^{T}\xi_{i} + \frac{\rho}{(1-\gamma)\,T^{1/p}} \left( \sum_{i=1}^{T} \Phi_{p}^{p}\left(-w^{T}\xi_{i} - \alpha, h\right) \right)^{1/p},$$
which is called the nominal portfolio optimization problem based on KDE and mean-HMCR. Furthermore, when $\tau = 0$ and $h = 0$, problem (12) is equivalent to the following SAA version of problem (2) (see [11]):
$$\min_{w \in W,\, \alpha \in \mathbb{R}} \tilde{F}_{T}(w, \alpha) := \rho\alpha - \frac{1}{T} \sum_{i=1}^{T} w^{T}\xi_{i} + \frac{\rho}{(1-\gamma)\,T^{1/p}} \left( \sum_{i=1}^{T} \left[ -w^{T}\xi_{i} - \alpha \right]_{+}^{p} \right)^{1/p}. \tag{30}$$
Remark 2.
From the statistical perspective, bandwidth selection is a key issue for kernel density estimation. The Rule-of-Thumb bandwidth $h = 1.06\,\hat{\sigma}\,T^{-1/5}$, where $\hat{\sigma}$ is the sample standard deviation, is a popular choice (see [58,59] for more details of bandwidth selection). Note that the PDF of the portfolio loss is estimated by the weighted KDE in (6). Then, the corresponding Rule-of-Thumb bandwidth admits a closed expression of the form:
$$h = h(w) = c\,\|M R w\|, \tag{31}$$
where $c = 1.06\,T^{-1/5} / \sqrt{T - 1}$, $M = I - (1/T)\mathbf{1}\mathbf{1}^{T} \in \mathbb{R}^{T \times T}$ ($I$ is the identity matrix) and $R = [\xi_{1}, \ldots, \xi_{T}]^{T}$ (see [40]).
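The closed expression in Remark 2 follows because $M R w$ is the vector of centred portfolio returns, so $\|M R w\| / \sqrt{T - 1}$ is their sample standard deviation. The sketch below, with illustrative function names and returns stored as a list of rows, computes the bandwidth both ways so that the two can be compared.

```python
import math
import statistics


def portfolio_bandwidth(returns, w):
    """h(w) = c * ||M R w|| with c = 1.06 * T^(-1/5) / sqrt(T - 1).

    returns: list of T rows, each holding the n asset returns of one period (the matrix R).
    w: list of n portfolio weights.
    """
    T = len(returns)
    port = [sum(wk * rk for wk, rk in zip(w, row)) for row in returns]   # R w
    mean = sum(port) / T
    centred_norm = math.sqrt(sum((x - mean) ** 2 for x in port))         # ||M R w||
    c = 1.06 * T ** (-0.2) / math.sqrt(T - 1)
    return c * centred_norm


def rule_of_thumb(sample):
    """Reference value 1.06 * sigma_hat * T^(-1/5) for a univariate sample."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
```

Since the norm is positively homogeneous, $h(w)$ scales linearly with $w$ and is a convex function of the portfolio weights.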

3. Numerical Studies

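We conduct numerical experiments on the portfolio selection problem in order to demonstrate the effectiveness of the Wasserstein distributionally robust portfolio optimization model (12) based on KDE and mean-$\mathrm{HMCR}_{\gamma}^{p}$ (denoted “$\mathrm{WRKH}_{p}$, $p = 1, 2, 3$”) with $\phi(x) = \left(1/\sqrt{2\pi}\right) e^{-x^{2}/2}$ and $h = c\,\|M R w\|$. The three models are listed as follows.
  • The Wasserstein distributionally robust portfolio optimization model based on KDE and mean-$\mathrm{HMCR}_{\gamma}^{1}$ (denoted “$\mathrm{WRKH}_{1}$”):
    $$\min_{w \in W,\, \alpha \in \mathbb{R},\, \zeta \in \mathbb{R}_{+}} \ \frac{\tau^{2}}{\zeta} + \rho\alpha + \frac{1}{T} \sum_{j=1}^{T} \max_{i = 1, \ldots, T} \left\{ \frac{\rho}{1-\gamma}\,\Phi_{1}\left(-w^{T}\xi_{i} - \alpha,\ c\,\|M R w\|\right) - w^{T}\xi_{i} - \frac{\left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}}{\zeta} \right\}. \tag{32}$$
  • The Wasserstein distributionally robust portfolio optimization model based on KDE and mean-$\mathrm{HMCR}_{\gamma}^{2}$ (denoted “$\mathrm{WRKH}_{2}$”):
    $$\min_{w \in W,\, \alpha \in \mathbb{R},\, \zeta \in \mathbb{R}_{+},\, y \in \mathbb{R}_{+}} \ \frac{\tau^{2}}{\zeta} + \rho\alpha + \frac{1}{4} \left( \frac{\rho}{1-\gamma} \right)^{2} y + \frac{1}{T} \sum_{j=1}^{T} \max_{i = 1, \ldots, T} \left\{ \frac{\Phi_{2}^{2}\left(-w^{T}\xi_{i} - \alpha,\ c\,\|M R w\|\right)}{y} - w^{T}\xi_{i} - \frac{\left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}}{\zeta} \right\}. \tag{33}$$
  • The Wasserstein distributionally robust portfolio optimization model based on KDE and mean-$\mathrm{HMCR}_{\gamma}^{3}$ (denoted “$\mathrm{WRKH}_{3}$”):
    $$\min_{w \in W,\, \alpha \in \mathbb{R},\, \zeta \in \mathbb{R}_{+},\, y \in \mathbb{R}_{+}} \ \frac{\tau^{2}}{\zeta} + \rho\alpha + 2 \left( \frac{1}{3} \right)^{\frac{3}{2}} \left( \frac{\rho}{1-\gamma} \right)^{\frac{3}{2}} y + \frac{1}{T} \sum_{j=1}^{T} \max_{i = 1, \ldots, T} \left\{ \frac{\Phi_{3}^{3}\left(-w^{T}\xi_{i} - \alpha,\ c\,\|M R w\|\right)}{y^{2}} - w^{T}\xi_{i} - \frac{\left( w^{T}\xi_{i} - w^{T}\xi_{j} \right)^{2}}{\zeta} \right\}. \tag{34}$$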
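To show how the objective of the first model above can be evaluated for fixed decision variables, the following sketch computes the $\mathrm{WRKH}_{1}$ objective with the Gaussian kernel. It uses the closed form $\Phi_{1}(x, h) = x\,N(x/h) + h\,\phi(x/h)$, which is derived for this sketch by direct integration rather than quoted from the paper; the function names are illustrative, and the default values $\tau = 0.01$, $\rho = 10$, $\gamma = 0.95$ follow the experimental settings reported in Section 3.1. It only evaluates the objective and does not perform the minimization.

```python
import math


def _phi1(x, h):
    """Gaussian-kernel Phi_1(x, h); falls back to [x]_+ when h is zero."""
    if h <= 0:
        return max(x, 0.0)
    u = x / h
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return x * cdf + h * pdf


def _loss_terms(returns, w, alpha, rho, gamma):
    """Portfolio returns r_i and g_i = rho/(1-gamma) * Phi_1(-r_i - alpha, h(w)) - r_i."""
    T = len(returns)
    r = [sum(wk * rk for wk, rk in zip(w, row)) for row in returns]
    mean = sum(r) / T
    h = 1.06 * T ** (-0.2) * math.sqrt(sum((x - mean) ** 2 for x in r) / (T - 1))
    k = rho / (1.0 - gamma)
    return r, [k * _phi1(-ri - alpha, h) - ri for ri in r]


def nominal_objective(returns, w, alpha, rho=10.0, gamma=0.95):
    """Nominal (tau = 0) KDE mean-HMCR objective for p = 1."""
    r, g = _loss_terms(returns, w, alpha, rho, gamma)
    return rho * alpha + sum(g) / len(g)


def wrkh1_objective(returns, w, alpha, zeta, tau=0.01, rho=10.0, gamma=0.95):
    """tau^2/zeta + rho*alpha + (1/T) sum_j max_i { g_i - (r_i - r_j)^2 / zeta }, zeta > 0."""
    r, g = _loss_terms(returns, w, alpha, rho, gamma)
    T = len(r)
    inner = sum(
        max(g[i] - (r[i] - r[j]) ** 2 / zeta for i in range(T)) for j in range(T)
    ) / T
    return tau ** 2 / zeta + rho * alpha + inner
```

Taking $i = j$ inside the maximum shows that the robust objective is never below the nominal one for any $\zeta > 0$, which gives a simple consistency check for an implementation.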
Our numerical experiments consist of two parts: (i) a rolling horizon analysis of “$\mathrm{WRKH}_{p}$, $p = 1, 2, 3$” compared with the SAA-based mean-$\mathrm{HMCR}_{\gamma}^{p}$, $p = 1, 2, 3$, portfolio optimization model (30) (denoted “$\mathrm{SAAH}_{p}$, $p = 1, 2, 3$”) and (ii) a sensitivity analysis of “$\mathrm{WRKH}_{p}$, $p = 1, 2, 3$” on $\tau$ and $\rho$. To ensure the reproducibility of the numerical experiments, we use the popular academic benchmarks known as the Fama and French (FF) datasets, which are public and readily available to anyone (http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, accessed on 15 October 2021). We use four datasets (FF6, FF30, FF48 and FF100) that contain 750 daily returns spanning 9 October 2018 to 30 September 2021. All results were produced on a PC (Intel® Core™ i5-4590, 3.30 GHz, 4.00 GB), using “fmincon” in MATLAB R2014b.

3.1. Rolling Horizon Analysis

To demonstrate the effectiveness of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$”, we conduct a rolling horizon experiment in a setting similar to that of [60]. The historical asset returns of the previous $T$ days are used to obtain the optimal portfolio weights by solving problems (32)–(34) (the $\mathrm{WRKH}_p$ models) and problem (30) with $p = 1, 2, 3$. We then apply these weights to the current asset returns to compute the current portfolio return. This process is repeated by adding the returns for the next period in the dataset and dropping the earliest returns, until the end of the dataset is reached (see [40,60]). “$\mathrm{SAAH}_p$, $p = 1, 2, 3$” is evaluated by the same rolling horizon procedure for comparison. We set $T = 500$, $\gamma = 0.95$, $\lambda_0 = (1/T)\mathbf{1}$, $\rho = 10$ and $\tau = 0.01$. For the four datasets, the corresponding evolutions of the accumulated wealth of the six strategies (“$\mathrm{WRKH}_p$” and “$\mathrm{SAAH}_p$” with $p = 1, 2, 3$) are plotted in Figure 1.
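The rolling horizon procedure described above can be sketched in a few lines of plain Python. This is an illustration only: `rolling_horizon`, `equal_weights` and `wealth_curve` are hypothetical names, the solver hook is a placeholder using the 1/N rule rather than any of the optimization models in this paper, and compounding the daily returns to obtain accumulated wealth is an assumption.

```python
def rolling_horizon(returns, window, solve_weights):
    """Out-of-sample portfolio returns from a rolling estimation window.

    returns       : list of rows, one list of asset returns per day
    window        : number of past days T used to fit the weights
    solve_weights : callable(history) -> list of weights (hypothetical hook
                    where a WRKH_p or SAAH_p solver would be plugged in)
    """
    realized, weight_path = [], []
    for t in range(window, len(returns)):
        history = returns[t - window:t]      # previous T days only
        w = solve_weights(history)           # fit on history, not on day t
        weight_path.append(w)
        realized.append(sum(wi * ri for wi, ri in zip(w, returns[t])))
    return realized, weight_path

def equal_weights(history):
    """Placeholder solver (1/N rule), NOT the paper's optimization model."""
    n = len(history[0])
    return [1.0 / n] * n

def wealth_curve(realized, start=1.0):
    """Accumulated wealth from compounding realized daily returns (assumption)."""
    curve, wealth = [], start
    for r in realized:
        wealth *= 1.0 + r
        curve.append(wealth)
    return curve

# Toy data, made up for illustration: 5 days, 2 assets, window of 3 days
data = [[0.01, 0.00], [0.02, -0.01], [-0.01, 0.01], [0.00, 0.02], [0.03, 0.01]]
realized, path = rolling_horizon(data, window=3, solve_weights=equal_weights)
curve = wealth_curve(realized)
```

With 750 daily returns and $T = 500$, a loop of this form would yield 250 out-of-sample portfolio returns per strategy.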
Four performance criteria are adopted to assess the quality of a portfolio selection strategy (see [40]): (i) the mean of the daily returns (denoted “MR”), (ii) the Sharpe ratio (denoted “SR”), (iii) the average turnover (denoted “TO”) and (iv) the maximum drawdown (denoted “MDD”). The turnover indicates the volume of rebalancing. A high turnover is undesirable, since it incurs high trading costs that reduce the portfolio return (see [35,60]). The maximum drawdown is an indicator of downside risk over a specified time period, measured as the largest drop from a peak (see [61]). The performance of the six strategies under these four criteria for the four datasets is shown in Table 1. For each $p = 1, 2, 3$, the better-performing strategy under each criterion is marked in bold.
From Figure 1 and Table 1, we can observe the following: (i) For FF6, the wealth curve of “$\mathrm{WRKH}_1$” dominates those of the others most of the time. “MR”, “SR” and “MDD” of “$\mathrm{WRKH}_p$” are better than those of “$\mathrm{SAAH}_p$” for each of $p = 1, 2, 3$. (ii) For FF30, the wealth curves of “$\mathrm{WRKH}_p$, $p = 2, 3$” overlap with those of “$\mathrm{SAAH}_p$, $p = 2, 3$” and dominate those of “$\mathrm{WRKH}_1$” and “$\mathrm{SAAH}_1$” most of the time. “$\mathrm{WRKH}_2$” has the best “MR” and “SR”. (iii) For FF48, the wealth curve of “$\mathrm{WRKH}_2$” overlaps with those of the others during the first half of the period and dominates them later. “MR”, “SR” and “MDD” of “$\mathrm{WRKH}_p$” are better than those of “$\mathrm{SAAH}_p$” for each of $p = 1, 2, 3$. (iv) For FF100, the wealth curve of “$\mathrm{WRKH}_1$” dominates those of the others most of the time. “MR”, “SR” and “MDD” of “$\mathrm{WRKH}_p$” are better than those of “$\mathrm{SAAH}_p$” for $p = 1, 3$.
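The four criteria can be computed from a series of realized daily returns and the sequence of portfolio weights. The sketch below uses common textbook definitions and should be read as an assumption rather than the exact formulas behind Table 1: the Sharpe ratio is taken as mean over sample standard deviation with a zero risk-free rate, the turnover is simplified to the average absolute change in weights between consecutive rebalancing dates, and the drawdown is measured on compounded wealth. All function names are hypothetical.

```python
import math

def mean_return(r):
    """MR: arithmetic mean of the daily returns."""
    return sum(r) / len(r)

def sharpe_ratio(r):
    """SR: mean over sample standard deviation (risk-free rate assumed 0)."""
    m = mean_return(r)
    var = sum((x - m) ** 2 for x in r) / (len(r) - 1)
    return m / math.sqrt(var)

def average_turnover(weight_path):
    """TO: average L1 change in weights between consecutive rebalancings.

    Simplified: ignores the drift of weights caused by price moves.
    """
    changes = [sum(abs(a - b) for a, b in zip(w_new, w_old))
               for w_old, w_new in zip(weight_path, weight_path[1:])]
    return sum(changes) / len(changes)

def max_drawdown(r):
    """MDD: largest relative drop of compounded wealth from a running peak."""
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for x in r:
        wealth *= 1.0 + x
        peak = max(peak, wealth)
        mdd = max(mdd, (peak - wealth) / peak)
    return mdd

# Toy inputs, made up for illustration
daily = [0.10, -0.50, 0.20]
weights = [[0.5, 0.5], [0.6, 0.4], [0.6, 0.4]]
mr = mean_return(daily)
sr = sharpe_ratio(daily)
to = average_turnover(weights)
mdd = max_drawdown(daily)
```

Under these definitions a larger “MR” or “SR” is better, while a smaller “TO” or “MDD” is better, which matches how the criteria are compared in the discussion of Table 1.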

3.2. Sensitivity Analysis

We also conducted a sensitivity analysis of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” with respect to $\tau$ and $\rho$ for FF30. The results for the other datasets are similar. The numerical results for FF30 are as follows.
Sensitivity on $\tau$. For $\tau = 0.01, 0.03, 0.05, 0.07, 0.09$, the wealth curves of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” with $\rho = 10$ are shown in Figure 2a–c. The corresponding performance statistics of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” are shown in Table 2.
From Figure 2a–c and Table 2, we can observe the following: (i) For $p = 2, 3$, the wealth curve of “$\tau = 0.05$” dominates those of the others most of the time, and “$\tau = 0.05$” has the best “MR”, “SR” and “MDD”. (ii) For $p = 1$, the wealth curve of “$\tau = 0.07$” dominates those of the others most of the time, and “$\tau = 0.07$” has the best “MR”, “SR” and “MDD”. These results suggest that, for good performance of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$”, the value of $\tau$ should be neither too large nor too small.
Sensitivity on $\rho$. For $\rho = 1, 5, 10, 15, 20$, the wealth curves of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” with $\tau = 0.01$ are shown in Figure 2d–f. The corresponding performance statistics of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” are shown in Table 3.
From Figure 2d–f and Table 3, we can observe the following: (i) For $p = 1, 3$, the wealth curve of “$\rho = 10$” dominates those of the others most of the time, and “MR” and “SR” of “$\rho = 10$” are the best. (ii) For $p = 2$, all wealth curves are intertwined most of the time. “$\rho = 5$” has the best “MR”, whereas “$\rho = 20$” has the best “SR” and “MDD”. These results suggest that, for good performance of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$”, the risk aversion level $\rho$ should be neither too large nor too small.

4. Conclusions

We propose a Wasserstein distributionally robust portfolio optimization model based on a kernel smoothing method and mean-HMCR to address three challenges: the asymmetry and the nonlinearity of HMCR, and the ambiguity of the distribution of asset returns. We obtain a tractable DCP reformulation of the corresponding DRO model. The experimental results on the FF datasets indicate that the proposed model is promising. How to calibrate the KDE-Wasserstein radius $\tau$ remains open and is left for future research.

Author Contributions

Conceptualization: W.L.; Methodology: W.L.; Formal Analysis: W.L.; Supervision: W.L.; Investigation: Y.L.; Software: Y.L.; Resources: Y.L.; Visualization: Y.L.; Writing—Original Draft: W.L.; Writing—Review and Editing: Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Teacher Research Capacity Promotion Program of Beijing Normal University at Zhuhai.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html (accessed on 15 October 2021).

Acknowledgments

The authors wish to thank the anonymous reviewers and the Editors, whose insightful comments and helpful suggestions significantly contributed to improving this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wiesemann, W.; Kuhn, D.; Sim, M. Distributionally robust convex optimization. Oper. Res. 2014, 62, 1358–1376.
  2. Knight, F.H. Risk, Uncertainty and Profit, 1st ed.; Hart, Schaffner and Marx: Boston, MA, USA, 1921.
  3. Markowitz, H. Portfolio selection. J. Financ. 1952, 7, 77–91.
  4. Krokhmal, P.A. Higher moment coherent risk measures. Quant. Financ. 2007, 7, 373–387.
  5. Krokhmal, P.A.; Zabarankin, M.; Uryasev, S. Modeling and optimization of risk. Surv. Oper. Res. Manag. Sci. 2011, 16, 49–66.
  6. Rockafellar, R.T.; Uryasev, S. Optimization of conditional value-at-risk. J. Risk 2000, 2, 21–42.
  7. Artzner, P.; Delbaen, F.; Eber, J.M.; Heath, D. Coherent measures of risk. Math. Financ. 1999, 9, 203–228.
  8. Fishburn, P.C. Mean-risk analysis with risk associated with below-target returns. Am. Econ. Rev. 1977, 67, 116–126.
  9. Rockafellar, R.T.; Uryasev, S.; Zabarankin, M. Generalized deviations in risk analysis. Financ. Stoch. 2006, 10, 51–74.
  10. Fabozzi, F.J.; Huang, D.; Zhou, G. Robust portfolios: Contributions from operations research and finance. Ann. Oper. Res. 2010, 176, 191–220.
  11. Shapiro, A.; Dentcheva, D.; Ruszczyński, A. Lectures on Stochastic Programming: Modeling and Theory, 2nd ed.; SIAM: Philadelphia, PA, USA, 2014.
  12. Ben-Tal, A.; Nemirovski, A. Robust Convex Optimization. Math. Oper. Res. 1998, 23, 769–805.
  13. Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A. Robust Optimization; Princeton University Press: Princeton, NJ, USA, 2009; Volume 28.
  14. Bertsimas, D.; Sim, M. The Price of Robustness. Oper. Res. 2004, 52, 35–53.
  15. Scarf, H. A min-max solution of an inventory problem. In Studies in the Mathematical Theory of Inventory and Production; Scarf, H., Arrow, K., Karlin, S., Eds.; Stanford University Press: Stanford, CA, USA, 1958; Volume 10, pp. 201–209.
  16. El Ghaoui, L.; Oks, M.; Oustry, F. Worst-case value-at-risk and robust portfolio optimization: A conic programming approach. Oper. Res. 2003, 51, 543–556.
  17. Bertsimas, D.; Popescu, I. Optimal Inequalities in Probability Theory: A Convex Optimization Approach. SIAM J. Optim. 2005, 15, 780–804.
  18. Popescu, I. Robust mean-covariance solutions for stochastic optimization. Oper. Res. 2007, 55, 98–112.
  19. Goh, J.; Sim, M. Distributionally robust optimization and its tractable approximations. Oper. Res. 2010, 58, 902–917.
  20. Delage, E.; Ye, Y. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 2010, 58, 595–612.
  21. Zymler, S.; Kuhn, D.; Rustem, B. Distributionally robust joint chance constraints with second-order moment information. Math. Program. 2013, 137, 167–198.
  22. Hanasusanto, G.A.; Kuhn, D. Robust Data-Driven Dynamic Programming. In Advances in Neural Information Processing Systems 26; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2013; pp. 827–835. Available online: https://proceedings.neurips.cc/paper/2013/hash/ef575e8837d065a1683c022d2077d342-Abstract.html (accessed on 15 October 2021).
  23. Erdoğan, E.; Iyengar, G. Ambiguous chance constrained problems and robust optimization. Math. Program. 2006, 107, 37–61.
  24. Bertsimas, D.; Gupta, V.; Kallus, N. Robust sample average approximation. Math. Program. 2018, 171, 217–282.
  25. Wang, Z.; Glynn, P.W.; Ye, Y. Likelihood robust optimization for data-driven problems. Comput. Manag. Sci. 2016, 13, 241–261.
  26. Ben-Tal, A.; Den Hertog, D.; De Waegenaere, A.; Melenberg, B.; Rennen, G. Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 2013, 59, 341–357.
  27. Bayraksan, G.; Love, D. Data-driven stochastic programming using phi-divergences. Tutor. Oper. Res. 2015, 1, 1–19.
  28. Jiang, R.; Guan, Y. Data-driven chance constrained stochastic program. Math. Program. 2016, 158, 291–327.
  29. Shapiro, A. Distributionally robust stochastic programming. SIAM J. Optim. 2017, 27, 2258–2275.
  30. Postek, K.; Hertog den, D.; Melenberg, B. Computationally tractable counterparts of distributionally robust constraints on risk measures. SIAM Rev. 2016, 58, 603–650.
  31. Calafiore, G.C. Ambiguous risk measures and optimal robust portfolios. SIAM J. Optim. 2007, 18, 853–877.
  32. Hu, Z.; Hong, L.J. Kullback-Leibler Divergence Constrained Distributionally Robust Optimization. 2012. Available online: http://www.optimization-online.org/DB_HTML/2012/11/3677.html (accessed on 15 October 2021).
  33. Mohajerin Esfahani, P.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166.
  34. Gao, R.; Kleywegt, A.J. Distributionally Robust Stochastic Optimization with Wasserstein Distance. arXiv 2016, arXiv:1604.02199v2.
  35. Wozabal, D. Robustifying convex risk measures for linear portfolios: A nonparametric approach. Oper. Res. 2014, 62, 1302–1315.
  36. Zhao, C.; Guan, Y. Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 2018, 46, 262–267.
  37. Mei, Y.; Chen, Z.P.; Ji, B.B.; Xu, Z.J.; Liu, J. Data-driven Stochastic Programming with Distributionally Robust Constraints Under Wasserstein Distance: Asymptotic Properties. J. Oper. Res. Soc. China 2021, 9, 525–542.
  38. Nakagawa, K.; Ito, K. Taming Tail Risk: Regularized Multiple β Worst-Case CVaR Portfolio. Symmetry 2021, 13, 922.
  39. Zhu, S.; Fukushima, M. Worst-case conditional value-at-risk with application to robust portfolio management. Oper. Res. 2009, 57, 1155–1168.
  40. Liu, W.; Yang, L.; Yu, B. KDE distributionally robust portfolio optimization with higher moment coherent risk. Ann. Oper. Res. 2021, 307, 363–397.
  41. Villani, C. Optimal Transport: Old and New; Springer: Berlin/Heidelberg, Germany, 2009.
  42. Silverman, B.W. Density Estimation for Statistics and Data Analysis, 2nd ed.; MPS-SIAM Series on Optimization; Chapman and Hall: New York, NY, USA, 1986.
  43. Li, Q.; Racine, J.S. Nonparametric Econometrics: Theory and Practice; Princeton University Press: Princeton, NJ, USA, 2007.
  44. Hazelton, M.L.; Turlach, B.A. Reweighted kernel density estimation. Comput. Stat. Data Anal. 2007, 51, 3057–3069.
  45. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
  46. Terrell, G.R.; Scott, D.W. Variable Kernel Density Estimation. Ann. Stat. 1992, 20, 1236–1265.
  47. Hall, P.; Minnotte, M.C. High order data sharpening for density estimation. J. R. Stat. Soc. Ser. B 2002, 64, 141–157.
  48. Hall, P.; Turlach, B.A. Reducing bias in curve estimation by use of weights. Comput. Stat. Data Anal. 1999, 30, 67–86.
  49. Owen, A.B. Empirical Likelihood; CRC Press: Boca Raton, FL, USA, 2001.
  50. Chen, S.X. Empirical likelihood-based kernel density estimation. Aust. J. Stat. 1997, 39, 47–56.
  51. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1997.
  52. Ben-Tal, A.; Den Hertog, D.; Vial, J.P. Deriving robust counterparts of nonlinear uncertain inequalities. Math. Program. 2015, 149, 265–299.
  53. Boyd, S.; Kim, S.; Vandenberghe, L.; Hassibi, A. A tutorial on geometric programming. Optim. Eng. 2007, 8, 67–127.
  54. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  55. Horst, R.; Thoai, N. DC Programming: Overview. J. Optim. Theory Appl. 1999, 103, 1–43.
  56. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, version 2.0 beta; September 2013. Available online: http://cvxr.com/cvx/citing/ (accessed on 15 October 2021).
  57. Lobo, M.S.; Vandenberghe, L.; Boyd, S.; Lebret, H. Applications of second-order cone programming. Linear Algebra Appl. 1998, 284, 193–228.
  58. Izenman, A.J. Recent developments in nonparametric density estimation. J. Am. Stat. Assoc. 1991, 86, 205–224.
  59. Jones, M.C.; Marron, J.S.; Sheather, S.J. A Brief Survey of Bandwidth Selection for Density Estimation. J. Am. Stat. Assoc. 1996, 91, 401–407.
  60. DeMiguel, V.; Garlappi, L.; Uppal, R. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Rev. Financ. Stud. 2009, 22, 1915–1953.
  61. Chekhlov, A.; Uryasev, S.; Zabarankin, M. Drawdown measure in portfolio optimization. Int. J. Theor. Appl. Financ. 2005, 8, 13–58.
Figure 1. Wealth curves of “$\mathrm{WRKH}_p$” and “$\mathrm{SAAH}_p$” ($p = 1, 2, 3$) for four datasets.
Figure 2. Wealth curves of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” for various values of $\tau$ (a–c) and $\rho$ (d–f).
Table 1. Performance of “$\mathrm{WRKH}_p$” and “$\mathrm{SAAH}_p$” ($p = 1, 2, 3$) for four datasets.

| Dataset | Statistic | $\mathrm{WRKH}_1$ | $\mathrm{SAAH}_1$ | $\mathrm{WRKH}_2$ | $\mathrm{SAAH}_2$ | $\mathrm{WRKH}_3$ | $\mathrm{SAAH}_3$ |
|---|---|---|---|---|---|---|---|
| FF6 | MR | 0.001197 | 0.001090 | 0.000955 | 0.000927 | 0.001015 | 0.000938 |
| | SR | 0.1323 | 0.1235 | 0.1039 | 0.1015 | 0.1137 | 0.1028 |
| | TO | 0.0561 | 0.0395 | 0.2245 | 0 | 0.0367 | 0.0104 |
| | MDD | 0.0726 | 0.0727 | 0.0855 | 0.0858 | 0.0802 | 0.0858 |
| FF30 | MR | 0.000576 | 0.000588 | 0.001435 | 0.001400 | 0.001366 | 0.001406 |
| | SR | 0.0807 | 0.0828 | 0.1540 | 0.1444 | 0.1466 | 0.1451 |
| | TO | 0.1855 | 0.0920 | 0.4710 | 0.1177 | 0.1319 | 0.1182 |
| | MDD | 0.0695 | 0.0690 | 0.0763 | 0.0835 | 0.0782 | 0.0838 |
| FF48 | MR | 0.000471 | 0.000336 | 0.000672 | 0.000511 | 0.000515 | 0.000471 |
| | SR | 0.0646 | 0.0442 | 0.0649 | 0.0527 | 0.0533 | 0.0487 |
| | TO | 0.1466 | 0.0463 | 0.2731 | 0.0408 | 0.0408 | 0.1332 |
| | MDD | 0.0693 | 0.0822 | 0.0856 | 0.0873 | 0.0872 | 0.0950 |
| FF100 | MR | 0.001349 | 0.001184 | 0.000350 | 0.000944 | 0.001003 | 0.000940 |
| | SR | 0.1442 | 0.1319 | 0.0221 | 0.0516 | 0.0981 | 0.0514 |
| | TO | 0.1663 | 0.0986 | 0.5570 | 0.0455 | 0.0359 | 0.0539 |
| | MDD | 0.0654 | 0.0697 | 0.2125 | 0.2482 | 0.0824 | 0.2482 |

Table 2. Sensitivity of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” on $\tau$.

| $p$ | Statistic | $\tau = 0.01$ | $\tau = 0.03$ | $\tau = 0.05$ | $\tau = 0.07$ | $\tau = 0.09$ |
|---|---|---|---|---|---|---|
| 1 | MR | 0.001197 | 0.001121 | 0.001335 | 0.001400 | 0.001301 |
| | SR | 0.1323 | 0.1270 | 0.1439 | 0.1501 | 0.1402 |
| | TO | 0.0561 | 0.0433 | 0.0925 | 0.0835 | 0.1046 |
| | MDD | 0.0726 | 0.0749 | 0.0693 | 0.0665 | 0.0707 |
| 2 | MR | 0.000955 | 0.001258 | 0.001283 | 0.001238 | 0.001098 |
| | SR | 0.1039 | 0.1415 | 0.1435 | 0.1390 | 0.1228 |
| | TO | 0.2245 | 0.0907 | 0.0676 | 0.0824 | 0.1359 |
| | MDD | 0.0855 | 0.0693 | 0.0684 | 0.0696 | 0.0779 |
| 3 | MR | 0.001016 | 0.001220 | 0.001236 | 0.001168 | 0.001115 |
| | SR | 0.1137 | 0.1380 | 0.1394 | 0.1322 | 0.1262 |
| | TO | 0.0367 | 0.0606 | 0.0577 | 0.0513 | 0.0451 |
| | MDD | 0.0802 | 0.0692 | 0.0691 | 0.0710 | 0.0748 |

Table 3. Sensitivity of “$\mathrm{WRKH}_p$, $p = 1, 2, 3$” on $\rho$.

| $p$ | Statistic | $\rho = 1$ | $\rho = 5$ | $\rho = 10$ | $\rho = 15$ | $\rho = 20$ |
|---|---|---|---|---|---|---|
| 1 | MR | 0.000593 | 0.000638 | 0.000670 | 0.000662 | 0.000576 |
| | SR | 0.0845 | 0.0905 | 0.0958 | 0.0945 | 0.0808 |
| | TO | 0.0587 | 0.1155 | 0.1583 | 0.1116 | 0.1855 |
| | MDD | 0.0703 | 0.0710 | 0.0698 | 0.0697 | 0.0695 |
| 2 | MR | 0.001126 | 0.001588 | 0.001435 | 0.001133 | 0.001512 |
| | SR | 0.1116 | 0.1509 | 0.1540 | 0.1212 | 0.1578 |
| | TO | 0.2883 | 0.6078 | 0.4710 | 0.5628 | 0.4568 |
| | MDD | 0.0809 | 0.0948 | 0.0763 | 0.0902 | 0.0704 |
| 3 | MR | 0.001204 | 0.001307 | 0.001366 | 0.001291 | 0.001300 |
| | SR | 0.1359 | 0.1400 | 0.1466 | 0.1415 | 0.1396 |
| | TO | 0.1263 | 0.1316 | 0.1319 | 0.1628 | 0.1417 |
| | MDD | 0.0710 | 0.0813 | 0.0782 | 0.0793 | 0.0818 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Liu, W.; Liu, Y. Worst-Case Higher Moment Coherent Risk Based on Optimal Transport with Application to Distributionally Robust Portfolio Optimization. Symmetry 2022, 14, 138. https://doi.org/10.3390/sym14010138
