Stochastic Comparisons of Some Distances between Random Variables

The aim of this paper is twofold. First, we show that the expectation of the absolute value of the difference between two copies, not necessarily independent, of a random variable is a measure of its variability in the sense of Bickel and Lehmann (1979). Moreover, if the two copies are negatively dependent through stochastic ordering, this measure is subadditive. The second purpose of this paper is to provide sufficient conditions for comparing several distances between pairs of random variables (with possibly different distribution functions) in terms of various stochastic orderings. Applications in actuarial and financial risk management are given.


Introduction
Given a bivariate random vector X = (X 1 , X 2 ) with joint distribution function K X (x 1 , x 2 ) = P(X 1 ≤ x 1 , X 2 ≤ x 2 ) and marginal distribution functions F 1 (x) = P[X 1 ≤ x] and F 2 (x) = P[X 2 ≤ x], the random variable |X 1 − X 2 | describes the distance between X 1 and X 2 in a sense that depends on the dependence structure of the vector. Different structures assign different meanings to this random variable and, obviously, lead to different ways of computing the expectation E(|X 1 − X 2 |). The distance can be applied to random variables with identical and non-identical distribution functions, and we consider both cases. If F 1 = F 2 = F, then X 1 and X 2 are copies of the same random variable X with distribution function F(x), and E(|X 1 − X 2 |) reveals information about X. An example is the case of independent and identically distributed random variables, in which E(|X 1 − X 2 |) is the Gini mean difference of X 1 , a well-known measure of variability (see, for example, [1]). We show in this work that, when X 1 and X 2 are dependent copies of the same random variable X, E(|X 1 − X 2 |) is still a measure of variability of X. One purpose of this paper is to study the properties of this functional in a general setting, where X 1 and X 2 are not necessarily independent.
If X 1 and X 2 are independent (or, more generally, if they are linked by a symmetric dependence structure), |X 1 − X 2 | treats the events X 1 < X 2 and X 2 < X 1 symmetrically. However, it is sometimes convenient to use a characteristic of proximity that treats them differently (in finance, for example, an investor evaluates gains and losses differently). The random excess of X 1 over X 2 , (X 1 − X 2 ) + , where x + = max{x, 0} denotes the positive part of x, is useful if we are interested in measuring the extent to which one random variable exceeds the other, rather than the distance between them in a bidirectional sense. Note that the absolute value |X 1 − X 2 | can be split into two terms, each describing the excess of one random variable over the other, as follows:

|X 1 − X 2 | = (X 1 − X 2 ) + + (X 2 − X 1 ) + .

If X 1 and X 2 are copies of the same variable X, then E((X 1 − X 2 ) + ) also reveals information about X. For example, if X 1 and X 2 are independent, E((X 1 − X 2 ) + ) is Gini's mean semidifference.
In general, the functional E(φ(|X 1 − X 2 |)), where φ is a non-negative real function, has been widely studied in the mathematical literature, mainly in the context of the Monge-Kantorovich problem (see [2] and references therein). The interest in this and other functionals used to measure the degree of difference between two random quantities goes back at least to the 1930s and the important contributions by Gini (see [3]) and Hoeffding [4]. Given two random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ), another purpose of this paper is to find conditions under which

E(φ(|X 1 − X 2 |)) ≤ E(φ(|Y 1 − Y 2 |)) for all φ ∈ Ω, (1)

where Ω is a subset of increasing real functions. Different choices of Ω give rise to different stochastic orderings between |X 1 − X 2 | and |Y 1 − Y 2 |. This problem was addressed in [5][6][7] for the case where X and Y have independent components with the same marginal distribution functions (see Section 2 below for details). Here, we are concerned with two random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ), whose components are not necessarily independent nor are they required to have identical distribution functions. In this case, we explore conditions under which

|X 1 − X 2 | ≤ st |Y 1 − Y 2 | and |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |, (2)

where ≤ st and ≤ icx are the usual stochastic order and the increasing convex order, respectively (these orders will be defined in Section 2 below). This work is organized as follows. Section 2 contains preliminaries, such as definitions and background for the stochastic orders and dependence notions used in this paper, as well as a review of the properties that a variability measure should satisfy. In Section 3, given a random variable X with distribution function F, we show that any functional of the form ν(X) = E(|X 1 − X 2 |), where X 1 and X 2 are two copies of X with any type of dependence structure, is a measure of variability of X. More generally, the distribution function F 1 of X 1 is allowed to be a distortion of F (we will explain the meaning of this below).
In Section 4, given two random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ), we obtain conditions (both in terms of the marginals and the copulas) to make comparisons of the form (2). Section 5 contains two applications. In Section 5.1, we define a general class of premium principles based on the class of variability measures studied in Section 3. In Section 5.2, in the context of portfolio risk management, we assess the inclusion of a new asset in a portfolio by using the results obtained in Section 4. Finally, Section 6 contains conclusions.
Throughout this paper, given two random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ), we denote by F 1 , F 2 and G 1 , G 2 the respective marginal distribution functions. Given any other random variable Z, we denote by F Z its distribution function.

Preliminaries
Let X = (X 1 , X 2 ) be a random vector with joint distribution function K X and marginal distribution functions F 1 and F 2 , respectively. According to Sklar's theorem, the joint distribution K X can be written as

K X (x 1 , x 2 ) = C(F 1 (x 1 ), F 2 (x 2 )),

where C is the copula of the random vector (X 1 , X 2 ), that is, the joint distribution function of the vector (F 1 (X 1 ), F 2 (X 2 )) (see [8]). If F 1 and F 2 are continuous, then C is unique. The copula contains the information about the dependence structure of the random vector (X 1 , X 2 ). For every copula C and every (u, v) in [0, 1] 2 , it is well-known that

W(u, v) ≤ C(u, v) ≤ M(u, v), (3)

where the copulas W(u, v) = max(u + v − 1, 0) and M(u, v) = min(u, v) are the Fréchet-Hoeffding bounds. Random variables with copula M are called comonotonic and random variables with copula W are called countermonotonic. The motivation for the study of the properties of E(|X 1 − X 2 |), where X 1 and X 2 are not necessarily independent, comes from the fact that some probability metrics and measures of variability that are better known under other expressions take this form for different copulas linking X 1 and X 2 . To give some examples, note that

E(|X 1 − X 2 |) = ∫ [F 1 (x) + F 2 (x) − 2C(F 1 (x), F 2 (x))] dx. (4)

If X 1 and X 2 are two copies of a random variable X with distribution function F(x), (4) becomes

E(|X 1 − X 2 |) = 2 ∫ [F(x) − C(F(x), F(x))] dx. (5)

If X 1 and X 2 are two independent copies of X, then

E(|X 1 − X 2 |) = 2 ∫ F(x)(1 − F(x)) dx

is the Gini mean difference (GMD) of X, a well-known index of variability (see, for example, [1]). If X 1 and X 2 are comonotonic, then (see [9] or [10])

E(|X 1 − X 2 |) = ∫ |F 1 (x) − F 2 (x)| dx, (6)

which is the Wasserstein distance between F 1 and F 2 , a well-known characteristic of proximity of two random variables (see [11]). If X 1 and X 2 are countermonotonic, then

E(|X 1 − X 2 |) = ∫ [F 1 (x) + F 2 (x) − 2 max(F 1 (x) + F 2 (x) − 1, 0)] dx (7)

(see, for example, [2]). It is easy to see that, if X 1 and X 2 are two copies of X, (7) can be rewritten as

E(|X 1 − X 2 |) = 2E(|X − m X |),

where m X is the median of X. This measure is twice the median absolute deviation (MAD), another popular measure of variability.
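As a quick numerical illustration (not part of the original text), the following Python sketch estimates E(|X 1 − X 2 |) for copies of a standard normal variable under the three extreme copulas discussed above: independence (giving the GMD, which equals 2/√π for N(0,1)), the upper bound M (distance 0 for identically distributed copies), and the lower bound W (giving twice the mean absolute deviation about the median):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200_000
u = rng.uniform(size=n)

# Copies of X ~ N(0,1) via the quantile transform X = F^{-1}(U).
x = norm.ppf(u)

# Independent copies: E|X1 - X2| is the Gini mean difference, 2/sqrt(pi) here.
x_ind = norm.ppf(rng.uniform(size=n))
gmd = np.mean(np.abs(x - x_ind))

# Comonotonic copies (copula M): X1 = X2 almost surely, so the distance is 0.
dist_M = np.mean(np.abs(x - x))

# Countermonotonic copies (copula W): X2 = F^{-1}(1 - U) = -X1 for N(0,1),
# and E|X1 - X2| equals 2 E|X - m_X| with median m_X = 0.
dist_W = np.mean(np.abs(x - norm.ppf(1 - u)))
mad2 = 2 * np.mean(np.abs(x))

print(gmd, dist_M, dist_W, mad2)
```

The sample size and seed are arbitrary; the point is only that the three copulas give visibly different values of the same functional.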
In view of the above examples, it is natural to ask whether E(|X 1 − X 2 |), where X 1 and X 2 are two copies of X with a copula C, fulfills the requirements to be considered as a measure of variability of X. Recall that a measure of variability ν is a map from a set of random variables to R such that, given a random variable X, ν(X) quantifies the variability of X. Next, we list a set of properties that a measure of variability should reasonably satisfy (see, for example, [12] and references therein):
(P0) Law invariance: if X and Y have the same distribution, then ν(X) = ν(Y).
(P1) Translation invariance: ν(X + k) = ν(X) for all X and all constants k.
(P2) Positive homogeneity: ν(0) = 0 and ν(λX) = λν(X) for all X and all λ > 0.
(P3) Non-negativity: ν(X) ≥ 0 for all X, with ν(X) = 0 if X is degenerate at some c ∈ R.
Bickel and Lehmann [13] also require ν(X) to be consistent with the dispersive order. Recall that two random variables X and Y are ordered in the dispersive order if the difference between any two quantiles of X is smaller than the difference between the corresponding quantiles of Y, where the quantile function of a random variable X with distribution function F is defined by F −1 (α) = inf{x : F(x) ≥ α}, α ∈ (0, 1). The formal definition is as follows.

Definition 1. Given two random variables X and Y with distribution functions F and G, respectively, we say that X is smaller than Y in the dispersive order (denoted by X ≤ disp Y) if

F −1 (β) − F −1 (α) ≤ G −1 (β) − G −1 (α) for all 0 < α ≤ β < 1.

A functional ν satisfying properties (P0) to (P3) is said to be a measure of variability or spread in the sense of Bickel and Lehmann if it satisfies in addition (see [13]):
(P4) Consistency with the dispersive order: X ≤ disp Y implies ν(X) ≤ ν(Y).
A measure of variability in the sense of Bickel and Lehmann considers the variability or spread of a random variable throughout its distribution. Sometimes, however, there is an interest in measuring only the variability of X along the right tail of its distribution (in risk theory, for example, some popular measures focus on the variability of a risk X beyond the value at risk). When this is the case, the requirement on ν to be consistent with the dispersive order is too strong. A natural weaker requirement is consistency with the excess wealth order (see [14]), which is defined as follows.

Definition 2.
Given two random variables X and Y with distribution functions F and G, respectively, we say that X is smaller than Y in the excess wealth order (denoted by X ≤ ew Y) if

∫ F −1 (p) +∞ (1 − F(x)) dx ≤ ∫ G −1 (p) +∞ (1 − G(x)) dx for all p ∈ (0, 1).

This allows us to consider the following property.
(P5) Consistency with the excess wealth order: X ≤ ew Y implies ν(X) ≤ ν(Y).
Measures of variability have received great attention in the actuarial and financial literature (see [12,15-18], among others). In actuarial science, for example, a variability measure is sometimes combined with a location measure to build a premium principle (see [19]). For particular applications in this context, we may wish ν to satisfy the following properties:
(P6) Comonotonic additivity: ν(X + Y) = ν(X) + ν(Y) whenever X and Y are comonotonic.
(P7) Subadditivity: ν(X + Y) ≤ ν(X) + ν(Y) for all X and Y.
Furman et al. [12] say that ν is a coherent measure of variability if it satisfies (P0)-(P3) and (P7).
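The quantile-spread characterization of the dispersive order can be checked numerically. The sketch below (an illustration, not from the paper) verifies on a grid that X ~ N(0, 1) is smaller than Y ~ N(5, 2) in the dispersive order; the location shift is irrelevant, since only differences of quantiles enter the comparison:

```python
import numpy as np
from scipy.stats import norm

# Every quantile spread of Y ~ N(5,2) is twice the corresponding spread of
# X ~ N(0,1), so X <=_disp Y regardless of the difference in location.
alphas = np.linspace(0.01, 0.98, 98)
betas = alphas + 0.01
spread_x = norm.ppf(betas) - norm.ppf(alphas)
spread_y = norm.ppf(betas, loc=5, scale=2) - norm.ppf(alphas, loc=5, scale=2)

print(bool(np.all(spread_x <= spread_y)))
```

For normal distributions the dispersive order is equivalent to ordering the standard deviations, which the exact factor of two in the spreads reflects.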
Next, we recall some other notions used in this paper. The sequence of inequalities (3) induces the following definition (see [8]).

Definition 3.
Given two copulas C and C ′ , we say that C is smaller than C ′ in the concordance order (and write C ≺ C ′ ) if C(u, v) ≤ C ′ (u, v) for all u, v ∈ (0, 1).
Obviously, W ≺ C ≺ M for every copula C. The name of this order is due to the fact that some measures of concordance, such as Kendall's tau and Spearman's rho, are increasing with respect to ≺ .
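The monotonicity of concordance measures along ≺ is easy to see empirically. The following sketch (an illustration added here, with arbitrary sample size and seed) computes Kendall's tau for samples from the two Fréchet-Hoeffding bounds and from the independence copula:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
u = rng.uniform(size=2000)
v = rng.uniform(size=2000)

tau_M, _ = kendalltau(u, u)      # comonotonic sample (copula M): tau = 1
tau_W, _ = kendalltau(u, 1 - u)  # countermonotonic sample (copula W): tau = -1
tau_P, _ = kendalltau(u, v)      # independent sample (product copula): tau ~ 0

print(tau_W, tau_P, tau_M)
```

Since W ≺ Π ≺ M (with Π the independence copula), the three estimates come out ordered as −1 < tau_P < 1, as expected.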
In Sections 4 and 5, we will make use of the following stochastic orders. The reader may consult the books [20][21][22] for properties and applications.

Definition 4. Let X and Y be two random variables with distribution functions F and G and finite expectations µ X and µ Y , respectively. Then, X is said to be smaller than Y:
(i) in the usual stochastic order (denoted by X ≤ st Y) if F(x) ≥ G(x) for all x;
(ii) in the convex order (denoted by X ≤ cx Y) if µ X = µ Y and E((X − t) + ) ≤ E((Y − t) + ) for all t;
(iii) in the increasing convex order (denoted by X ≤ icx Y) if E((X − t) + ) ≤ E((Y − t) + ) for all t;
(iv) in the increasing concave order (denoted by X ≤ icv Y) if E(min(X, t)) ≤ E(min(Y, t)) for all t.

It can be shown that X ≤ st Y (respectively ≤ cx , ≤ icx , ≤ icv ) if and only if E(φ(X)) ≤ E(φ(Y)) for all increasing (respectively convex, increasing convex, increasing concave) functions φ for which the expectations exist. When X 2 and Y 2 are independent copies of X 1 and Y 1 , respectively, it is well-known (see [5] and [6]) that X 1 ≤ disp Y 1 implies |X 1 − X 2 | ≤ st |Y 1 − Y 2 | and that X 1 ≤ cx Y 1 implies |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |. The result for the convex order was extended to the so-called s-convex order in [7]. In Sections 3 and 4, we extend these results to the case where X 2 and Y 2 are not necessarily independent from (nor are they required to have identical distribution functions as) X 1 and Y 1 , respectively. For this, we need the following notions (see [23,24]).

Definition 5. A random vector X = (X 1 , X 2 ) is said to be positively dependent through stochastic ordering (PDS) if the conditional random variable (X i | X j = x) is stochastically increasing in x, for i, j ∈ {1, 2}, i ≠ j.
Intuitively, if X = (X 1 , X 2 ) is PDS, then its components are more likely to take large values simultaneously, compared with a vector of independent random variables with the same marginal distributions. For relationships between this and other dependence notions see, for example, Table 2 in [25]. The negative dependence analog of Definition 5 is as follows (see [24]).

Definition 6. A random vector X = (X 1 , X 2 ) is said to be negatively dependent through stochastic ordering (NDS) if the conditional random variable (X i | X j = x) is stochastically decreasing in x, for i, j ∈ {1, 2}, i ≠ j.
Intuitively, if X = (X 1 , X 2 ) is NDS, one component of the vector will tend to be large when the other component is small. It is easy to see that a random vector X with continuous marginals is PDS (respectively, NDS) if and only if C(u, v) is componentwise concave (respectively, convex). It is also well-known (see [26]) that a continuous random vector X with copula C is PDS (respectively, NDS) if and only if its copula C is PDS (respectively, NDS).
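The componentwise convexity/concavity criterion can be verified numerically for the two Fréchet-Hoeffding bounds. The sketch below (illustrative, with an arbitrary grid and step) checks second differences in the first argument; since W and M are exchangeable, this suffices for both arguments:

```python
import numpy as np

def W(u, v):  # Fréchet-Hoeffding lower bound (countermonotonicity)
    return np.maximum(u + v - 1.0, 0.0)

def M(u, v):  # Fréchet-Hoeffding upper bound (comonotonicity)
    return np.minimum(u, v)

grid = np.linspace(0.05, 0.95, 19)
h = 0.01
convex_W = True   # W should be convex in each argument  -> NDS
concave_M = True  # M should be concave in each argument -> PDS
for v in grid:
    d2W = W(grid + h, v) - 2 * W(grid, v) + W(grid - h, v)
    d2M = M(grid + h, v) - 2 * M(grid, v) + M(grid - h, v)
    convex_W &= bool(np.all(d2W >= -1e-12))
    concave_M &= bool(np.all(d2M <= 1e-12))

print(convex_W, concave_M)
```

Consistently with the text, the lower bound W turns out componentwise convex (NDS) and the upper bound M componentwise concave (PDS).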

A Family of Measures of Variability
Let X 1 and X 2 be two random variables with respective continuous distribution functions F 1 and F 2 and finite expectations. If X 1 and X 2 are two independent copies of X, it is well-known (see [13]) that ν(X) = E(|X 1 − X 2 |) is a measure of variability in the sense of Bickel and Lehmann (that is, it satisfies properties (P0) to (P4)). Let h be a distortion function, that is, a non-decreasing function from [0, 1] to [0, 1] such that h(0) = 0 and h(1) = 1 (given two distribution functions F and G, if G = h • F we say that G is a distortion of F via h). In this section, we show that any functional of the form ν(X) = E(|X 1 − X 2 |), where X 1 and X 2 have distribution functions F 1 = h • F and F 2 = F, respectively, is a measure of variability in the sense of Bickel and Lehmann. Moreover, if h is the identity (h(t) = t for all t ∈ [0, 1]) and X 1 and X 2 have a NDS copula, this measure satisfies all the properties (P0) to (P7) listed above.
The following theorem extends a result of [5], stated as Theorem 3.B.42 in the book [20], in two directions: first, we consider two random vectors with the same copula instead of two random vectors with independent components; and, second, we allow the first marginal of each vector to be a distortion of the other (via the same h) instead of taking two copies of the same random variable.
Theorem 7. Let X and Y be two random variables with distribution functions F and G, respectively, and let h be a distortion function. Let X = (X 1 , X 2 ) be a random vector with respective marginal distribution functions F 1 = h • F and F 2 = F. Similarly, let Y = (Y 1 , Y 2 ) be a random vector with marginal distribution functions G 1 = h • G and G 2 = G, respectively. Suppose that X and Y have the same copula C. If X ≤ disp Y, then |X 1 − X 2 | ≤ st |Y 1 − Y 2 |.
Proof. Since the dispersive order is preserved by distortion functions (Theorem 13 in [27]), the assumption X ≤ disp Y yields X 1 ≤ disp Y 1 and X 2 ≤ disp Y 2 . Since X and Y have the same copula, it follows from Definition 2.1 in [28] and Theorem 1 in [29] that the map Φ = (Φ 1 , Φ 2 ), with Φ 1 = G 1 −1 • F 1 and Φ 2 = G 2 −1 • F 2 , maps X stochastically onto Y, that is, Φ(X 1 , X 2 ) and (Y 1 , Y 2 ) are equal in distribution. It follows from the assumptions that

Φ i (x) − x is increasing in x, i = 1, 2, (8)

and that Φ 2 (·) = Φ 1 (·). Therefore,

|Y 1 − Y 2 | = st |Φ 1 (X 1 ) − Φ 2 (X 2 )| = |Φ 1 (X 1 ) − Φ 1 (X 2 )| ≥ |X 1 − X 2 |, (9)

where the first and second equalities in (9) follow from the fact that Φ(X 1 , X 2 ) = st (Y 1 , Y 2 ) and Φ 2 (·) = Φ 1 (·), respectively. The inequality follows from (8) by using Theorem 1.A.1 in [20].
By taking h(x) = x in Theorem 7, we have the following corollary.
Corollary 8. Let X 2 and Y 2 be two copies of X 1 and Y 1 , respectively, such that X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) have the same copula C. If X 1 ≤ disp Y 1 , then |X 1 − X 2 | ≤ st |Y 1 − Y 2 |.

Remark 9. Given two random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) with the same copula, the condition X i ≤ disp Y i , i = 1, 2, is equivalent to saying that the bivariate random vectors X and Y are ordered in a multivariate dispersion sense (see [30]).
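As a sanity check (an illustration added here, not from the paper), the next sketch builds two vectors sharing a Gaussian copula with correlation 0.5, whose marginals N(0,1) and N(0,2) are ordered in the dispersive order; the scaling makes the ordering of the distances hold even pathwise:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# Bivariate normal driver with correlation 0.5: its (Gaussian) copula is
# shared by the vectors X and Y constructed below.
z1 = rng.standard_normal(n)
z2 = 0.5 * z1 + np.sqrt(1 - 0.5 ** 2) * rng.standard_normal(n)

x1, x2 = z1, z2          # X has N(0,1) marginals
y1, y2 = 2 * z1, 2 * z2  # Y has N(0,2) marginals, so X_i <=_disp Y_i

d_x = np.abs(x1 - x2)
d_y = np.abs(y1 - y2)
print(d_x.mean(), d_y.mean())
```

Here |Y 1 − Y 2 | = 2|X 1 − X 2 | sample point by sample point, which is a (particularly transparent) instance of the stochastic ordering of the distances asserted by Corollary 8.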
Now, we can prove the following result.
Theorem 10. Let X be a random variable with strictly increasing distribution function F and let h be a strictly increasing distortion function. Let X 1 and X 2 be two random variables with copula C and marginal distribution functions F 1 and F 2 , respectively. Let ν C (X) = E(|X 1 − X 2 |).
(i) If F 1 = h • F and F 2 = F, then ν C (X) is a comonotonic additive measure of variability in the sense of Bickel and Lehmann, that is, it satisfies properties (P0)-(P4) and (P6).
(ii) If F 1 = F 2 = F and the copula C is NDS, then ν C (X) satisfies all the properties (P0) to (P7).
Proof. We first prove (i). From (4), we have

ν C (X) = ∫ [h(F(x)) + F(x) − 2C(h(F(x)), F(x))] dx.

Clearly, ν C (X) = 0 if X is degenerate at c ∈ R. This, together with the fact that ν C (X + k) = ν C (X) for all constants k and ν C (λX) = λν C (X) for all λ > 0 (see [31]), ensures that ν C (X) satisfies properties (P0) to (P3). Since, given two random variables Z 1 and Z 2 , the condition Z 1 ≤ st Z 2 implies that E(Z 1 ) ≤ E(Z 2 ), property (P4) (consistency of ν C (X) with respect to the dispersive order) is a direct consequence of Theorem 7. Property (P6) follows from the fact that, if Z 1 and Z 2 are comonotonic, then F Z 1 +Z 2 −1 (u) = F Z 1 −1 (u) + F Z 2 −1 (u) for all u ∈ (0, 1) (see [32]). Under the assumptions in (ii), we have

ν C (X) = 2 ∫ [F(x) − C(F(x), F(x))] dx.

Standard arguments show that lim u→i F −1 (u)(u − C(u, u)) = 0, i = 0, 1.
Example 11. Two functionals satisfying the assumptions of part (i) are ν C 1 (X) = GMD(X) and ν C 2 (X) = ∫ |F(x) − h(F(x))| dx, which is the Wasserstein distance between F and its distortion F h = h • F, a variability measure introduced by [34]. Note that ν C 2 (X) = E(|X 1 − X 2 |), where F 1 = h • F, F 2 = F and C 2 is the Fréchet-Hoeffding upper bound copula (see (6)).

Example 12. By (7), it follows from Theorem 10 (ii) that ν C (X) = E(|X − m X |), where m X is the median of X, satisfies all the properties (P0) to (P7) listed above. This measure can be written in the form 1/2 E(|X 1 − X 2 |), where F 1 = F 2 = F and C is the Fréchet-Hoeffding lower bound copula (see (7) and the paragraph below it), which is an example of a NDS copula (see [35] for this and other examples of NDS copulas).
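The identity E(|X − m X |) = 1/2 E(|X 1 − X 2 |) under the countermonotonic copula can be checked by Monte Carlo. The sketch below (illustrative; sample size and distribution chosen for convenience) uses X ~ Exp(1), for which both sides equal log 2:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
u = rng.uniform(size=n)

# X ~ Exp(1): F^{-1}(u) = -log(1 - u), median m_X = log 2.
x1 = -np.log(1 - u)   # X1 = F^{-1}(U)
x2 = -np.log(u)       # X2 = F^{-1}(1 - U): countermonotonic copy of X1

half_dist = 0.5 * np.mean(np.abs(x1 - x2))  # (1/2) E|X1 - X2|
mad = np.mean(np.abs(x1 - np.log(2)))       # E|X - m_X|, which equals log 2
print(half_dist, mad)
```

For Exp(1), E((X − log 2) + ) = e^{−log 2} = 1/2 and E((log 2 − X) + ) = log 2 − 1/2, so E(|X − m X |) = log 2, which the two estimates should both approximate.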

Other Stochastic Comparisons
To begin this section, we consider two random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) with the same marginals. Denote by R(F 1 , F 2 ) the space of bidimensional random vectors with marginal distribution functions F 1 and F 2 .

Remark 14.
An alternative proof of Theorem 13 can be given by using Theorem 1 in [37], which provides conditions ensuring, under the above assumptions, that E(k(X 1 , X 2 )) ≥ E(k(Y 1 , Y 2 )) for certain classes of functions k. The proof consists of showing that the functions k(x, y) = φ(|x − y|) and k(x, y) = φ((x − y) + ), with φ increasing and convex, satisfy those conditions.
A more general type of comparison can be made between two random vectors with possibly different (but stochastically ordered) marginals. The following two results provide conditions to compare two random excesses. The first result is given in terms of the usual stochastic order and the second result in terms of the increasing convex order.
Theorem 15. Let X = (X 1 , X 2 ) be a random vector with respective marginal distribution functions F 1 and F 2 . Similarly, let Y = (Y 1 , Y 2 ) be a random vector with marginal distribution functions G 1 and G 2 , respectively. If X and Y have the same copula C, X 1 ≤ st Y 1 and X 2 ≥ st Y 2 , then

(X 1 − X 2 ) + ≤ st (Y 1 − Y 2 ) + .

Proof. Let (U, V) be a random vector with distribution function C, so that (X 1 , X 2 ) = st (F 1 −1 (U), F 2 −1 (V)). Therefore, given x ≥ 0,

F̄ (X 1 −X 2 ) + (x) = P(X 1 − X 2 > x) = ∫ 0 1 [1 − ∂ 2 C(F 1 (x + F 2 −1 (p)), p)] dp.

Since X 1 ≤ st Y 1 and X 2 ≥ st Y 2 , we have F 1 (x + F 2 −1 (p)) ≥ G 1 (x + G 2 −1 (p)) for all p ∈ (0, 1), and ∂ 2 C increases in the first argument (since ∂ 2 C(u, p) is the distribution function of the random variable (U|V = p)). Therefore, F̄ (X 1 −X 2 ) + (x) ≤ Ḡ (Y 1 −Y 2 ) + (x) for all x ≥ 0, which ends the proof.

Theorem 16. Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two random vectors with copulas C and C ′ , and marginal distribution functions F 1 , F 2 and G 1 , G 2 , respectively. If C is NDS, C ′ ≺ C, X 1 ≤ icx Y 1 and X 2 ≥ icv Y 2 , then (X 1 − X 2 ) + ≤ icx (Y 1 − Y 2 ) + .

Proof. Let us consider a random vector Y ∗ = (Y ∗ 1 , Y ∗ 2 ) with copula C and marginal distribution functions G 1 and G 2 . Since X and Y ∗ have the same copula C, X 1 ≤ icx Y ∗ 1 and X 2 ≥ icv Y ∗ 2 , arguments analogous to those in the proof of Theorem 15 yield (X 1 − X 2 ) + ≤ icx (Y ∗ 1 − Y ∗ 2 ) + (see [20]). Moreover, since C is NDS (that is, componentwise convex), the copula Ĉ(u, v) = u − C(u, 1 − v) of the vector (Y ∗ 1 , −Y ∗ 2 ) is PDS (that is, componentwise concave). It follows from Corollary 2.7 in [38] that Y ∗ 1 − Y ∗ 2 ≤ icx Y 1 − Y 2 . The result follows by using the fact that the increasing convex order is transitive and is preserved by the increasing convex transformation φ(t) = t + (see Theorem 4.A.8(a) in [20]).
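Theorem 15 can be illustrated by simulation (an example added here, with arbitrary normal marginals). Sharing uniforms gives both vectors the same (independence) copula, and the marginal shifts realize X 1 ≤ st Y 1 and X 2 ≥ st Y 2 :

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 100_000
# Shared uniforms => X and Y have the same (independence) copula.
u, v = rng.uniform(size=n), rng.uniform(size=n)

x1 = norm.ppf(u, loc=0.0)  # X1 ~ N(0,1)
y1 = norm.ppf(u, loc=0.5)  # Y1 ~ N(0.5,1), so X1 <=_st Y1
x2 = norm.ppf(v, loc=0.5)  # X2 ~ N(0.5,1)
y2 = norm.ppf(v, loc=0.0)  # Y2 ~ N(0,1),  so X2 >=_st Y2

ex_x = np.maximum(x1 - x2, 0.0)  # (X1 - X2)+
ex_y = np.maximum(y1 - y2, 0.0)  # (Y1 - Y2)+
# Here Y1 - Y2 = (X1 - X2) + 1 pathwise, so the excesses are ordered
# sample point by sample point, in line with the <=_st conclusion.
print(ex_x.mean(), ex_y.mean())
```

In this construction the ordering of the excesses even holds almost surely, which is the strongest way the stochastic ordering of Theorem 15 can materialize.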

Remark 17.
In particular, Theorem 16 holds when X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) have the same NDS copula C. In this case, the conditions X 1 ≤ icx Y 1 and X 2 ≥ icv Y 2 alone imply (X 1 − X 2 ) + ≤ icx (Y 1 − Y 2 ) + .

Lemma 18. Let X and Y be two random variables that are symmetric about 0. Then:
(i) X + ≤ st Y + implies |X| ≤ st |Y|;
(ii) X + ≤ icx Y + implies |X| ≤ icx |Y|.

Proof. Let F̄ X + and F̄ |X| be the tail functions of X + and |X|, respectively. If X and Y are symmetric about 0, it is easy to see that F̄ |X| (t) = h(F̄ X + (t)) for all t ≥ 0, where h is the concave distortion function h(u) = min{2u, 1}. Now, (i) and (ii) follow, respectively, from Theorem 2.6 (i) and Theorem 2.6 (v) in [39].
The following result follows immediately from Theorem 15 and Lemma 18.

Corollary 19.
Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two random vectors with the same copula C and with marginal distribution functions F 1 , F 2 and G 1 , G 2 , respectively. If X 1 ≤ st Y 1 , X 2 ≥ st Y 2 and the differences X 1 − X 2 and Y 1 − Y 2 are symmetric about 0, then |X 1 − X 2 | ≤ st |Y 1 − Y 2 |.

The following result is also an immediate corollary of Theorem 16 and Lemma 18.

Corollary 20. Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two random vectors with symmetric copulas C and C ′ , and marginal distribution functions F 1 , F 2 and G 1 , G 2 , respectively. If C is NDS, C ′ ≺ C, X 1 ≤ icx Y 1 , X 2 ≥ icv Y 2 and the differences X 1 − X 2 and Y 1 − Y 2 are symmetric about 0, then |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |.

Remark 21.
In particular, Corollary 20 holds when X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) have the same symmetric NDS copula C. When this is the case, the marginal conditions X 1 ≤ icx Y 1 and X 2 ≥ icv Y 2 , together with the symmetry of the differences, imply |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |. Since the independence copula is both NDS and PDS, a particular case of Corollaries 19 and 20 is the following.

Corollary 22.
Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two random vectors with independent components and with marginal distribution functions F 1 , F 2 and G 1 , G 2 , respectively. If X 1 ≤ st Y 1 , X 2 ≥ st Y 2 and the differences X 1 − X 2 and Y 1 − Y 2 are symmetric about 0, then |X 1 − X 2 | ≤ st |Y 1 − Y 2 |; if X 1 ≤ icx Y 1 and X 2 ≥ icv Y 2 , under the same symmetry assumption, then |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |.
The following corollaries extend Lemma 2.2 in [6] from the case of two random vectors with independent components to the case of two random vectors with the same symmetric NDS copula.

Corollary 23. Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two random vectors with the same symmetric NDS copula C and with marginal distribution functions F 1 , F 2 and G 1 , G 2 , respectively. If X i ≤ cx Y i , i = 1, 2, and the differences X 1 − X 2 and Y 1 − Y 2 are symmetric about 0, then |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |.

Proof. Since X 2 ≤ cx Y 2 and the convex order is closed under the transformation x ↦ −x (see [20]), it follows that −X 2 ≤ icx −Y 2 . This is equivalent to writing X 2 ≥ icv Y 2 ; therefore, the result follows from Corollary 20.
Corollary 24. Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two random vectors with the same symmetric NDS copula, such that X 2 = st X 1 and Y 1 = st Y 2 , all variables having finite means. If X 1 ≤ cx Y 1 , then |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |.

Proof. Since X 2 = st X 1 , Y 1 = st Y 2 and the copula is symmetric, the vectors X and Y are exchangeable, so the differences X 1 − X 2 and Y 1 − Y 2 are symmetric about 0. It is well-known (see (3.C.7) in [20]) that, under these assumptions, X 1 ≤ cx Y 1 implies X i ≤ cx Y i , i = 1, 2. The result follows from Corollary 23.

An Application in Actuarial Science
In actuarial theory, a premium principle is a decision rule used by the insurer to determine the price of a risk to be insured. More formally, given a random variable X describing an insurance risk, a premium principle T assigns to X a number T(X), which is the premium to be charged for accepting the risk X (see [19] for an overview). The simplest premium principle is the net premium T(X) = E(X), which does not load for risk. More general premium principles are obtained by adding to the net premium a load that reflects the danger associated with the risk. Since the danger is often interpreted in terms of variability, a number of premium principles are obtained by adding to the net premium a risk load proportional to a specific measure of variability. Examples include the following:

T 1 (X) = E(X) + p σ(X) (standard deviation premium principle),
T 2 (X) = E(X) + p GMD(X) (Gini premium principle),
T 3 (X) = E(X) + p E(|X − m X |) (mean absolute deviation premium principle),
T 4 (X) = E(X) + p ∫ |F(x) − h(F(x))| dx,

where p ≥ 0 and h is a distortion function (see [34,40]).
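The following Python sketch (an illustration added here; the loading p = 0.5 and the Exp(1) risk are arbitrary choices) estimates three variability-loaded premiums of the kind just described; for Exp(1) the mean, the standard deviation and the GMD all equal 1, and the median is log 2:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000
x = rng.exponential(1.0, size=n)      # risk X ~ Exp(1): E(X) = sd(X) = 1
x_ind = rng.exponential(1.0, size=n)  # independent copy of X

p = 0.5
T1 = x.mean() + p * x.std()                         # standard deviation principle
T2 = x.mean() + p * np.mean(np.abs(x - x_ind))      # Gini principle (GMD = 1 here)
T3 = x.mean() + p * np.mean(np.abs(x - np.log(2)))  # MAD principle (m_X = log 2)
print(T1, T2, T3)
```

Both T1 and T2 should come out near 1.5, while the MAD-based premium is smaller (about 1 + 0.5 log 2), reflecting that the median absolute deviation is a smaller variability measure than the GMD for this risk.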
Other examples can be found in [41]. Following this schema, we define a general class of premium principles based on distances between random variables.

Definition 25. Given a risk X with distribution function F, let F be the family of premium principles of the form

T C (X) = E(X) + λE(|X 1 − X 2 |), λ > 0,

where X 1 and X 2 are two copies of X such that X 1 and X 2 have copula C.
From the results in Section 2, we see that T 2 , T 3 and T 4 (for p = 1) are premium principles that belong to the family F for different choices of the copula C. Moreover, it follows from Theorem 10 that a premium principle T C ∈ F satisfies the following properties: (a) Risk loading: T C (X) ≥ E(X).
A premium principle satisfying the above properties that does not follow the schema T(X) = E(X) + λD(X), where D is a measure of variability of X, is the distortion premium principle [32], defined by

T h (X) = ∫ −∞ 0 [h(F̄ X (x)) − 1] dx + ∫ 0 +∞ h(F̄ X (x)) dx,

where F̄ X = 1 − F X and h is a concave distortion function. Our next result is related to a property of T h . Recall that, given a random variable Z with distribution function F Z , the family Π Z = {X : X = µ + σZ, µ ∈ R, σ > 0} is called a location-scale family of random variables. It is shown in [42] that the distortion premium principle T h reduces to T 1 (the standard deviation premium principle) for location-scale families of distributions. Next, we give a similar result involving the premium principle T C given in Definition 25.
Theorem 26. Consider a location-scale family Π Z and let T C ∈ F. Then, T C reduces to T 1 (the standard deviation premium principle) or, equivalently, to T h (the distortion premium principle) on Π Z .
Proof. Since T 1 is a special case of T h on location-scale families of distributions [42], it suffices to prove that T C reduces to T h on Π Z . Let X ∈ Π Z , so that X = µ + σZ and F X (x) = F Z ((x − µ)/σ) for all x. If X 1 and X 2 are two copies of X with copula C, we have E(|X 1 − X 2 |) = σE(|Z 1 − Z 2 |), where Z 1 and Z 2 are two copies of Z with copula C (here, we have used that copulas are invariant under strictly increasing transformations of the random variables). Therefore, T C (X) = µ + σE(Z) + λσE(|Z 1 − Z 2 |). Now, we equate this expression with T h (X) to obtain

λ = (T h (Z) − E(Z)) / E(|Z 1 − Z 2 |),

where we have used that T h (X) = µ + σT h (Z), because T h is positively homogeneous and translation equivariant. Observe that λ is independent of µ and σ; therefore, we conclude that T C reduces to T h on Π Z .
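The key step of the proof, E(|X 1 − X 2 |) = σE(|Z 1 − Z 2 |) for X = µ + σZ with the copula unchanged, can be checked numerically. The sketch below (illustrative; the countermonotonic copula and the values µ = 3, σ = 2 are arbitrary choices) verifies that the ratio of the two expected distances is exactly σ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 200_000
u = rng.uniform(size=n)

# Z ~ N(0,1) with countermonotonic copies (copula W) as the example copula C.
z1, z2 = norm.ppf(u), norm.ppf(1 - u)

mu, sigma = 3.0, 2.0
x1, x2 = mu + sigma * z1, mu + sigma * z2  # copies of X = mu + sigma Z, same copula

ratio = np.mean(np.abs(x1 - x2)) / np.mean(np.abs(z1 - z2))
print(ratio)  # equals sigma, whatever mu is
```

Since the location parameter cancels in the difference, only σ survives, which is exactly why the calibration constant λ in the proof does not depend on µ or σ.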

An Application in Portfolio Risk Management
In portfolio risk management, investors diversify portfolios to reduce market risk. A portfolio with several assets exhibits, unless the assets are perfectly correlated, less variability in returns than a portfolio with only one asset. To illustrate the results in Section 4, let us consider an investor whose portfolio has only one asset A with log-return X 2 . During the diversification process, the investor is concerned with the risk of two assets B and C with log-returns X 1 and Y 1 , respectively, that might be included in her/his portfolio. Here, the risk of these two assets B and C must be considered in relation to the asset A, which acts as a hedge. One way to assess the impact of each of these two assets is by comparing the distances |X 1 − X 2 | and |Y 1 − X 2 | in some stochastic sense. A smaller distance suggests a higher degree of similarity between the log-returns of the assets when evaluated jointly. If the assets B and C move in the opposite direction as the hedge A, a higher degree of similarity with A intuitively reduces the risk of the diversified portfolio. An alternative method to assess the impact of B and C on the portfolio is by using measures of contagion (see Section 4 in [43]).
Recall that the log-return of an asset at week t is defined by r t = log(p t /p t−1 ), where p t is the price of the asset at week t. For our empirical example, we work with log-returns of three stocks included in the Nasdaq Composite index: Zoom Video Communications (X 1 ), Moderna (Y 1 ) and Booking Holdings Inc (X 2 = Y 2 ); we have selected three companies that were affected differently at the beginning of the financial crisis. The study is based on samples of size n = 64 for each company ({x 1i }, {y 1i } and {x 2i = y 2i }, for i = 1, ..., 64), measuring the share value from 23 December 2019 until 15 March 2021. Data were gathered from the public website http://es.finance.yahoo.com, accessed on 22 March 2021, and refer to the weekly close of trading to mitigate time-dependence effects. Suppose that, initially, the whole portfolio of our investor consists of stocks in only one company: Booking Holdings, Inc. To reduce risk, the investor plans to invest either in Zoom or Moderna and faces the problem of which of them should be chosen. A method that helps to make a decision, as explained above, is to compare the distances between the components of the random vectors X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ). Figure 1 plots the empirical distribution function of the sample absolute differences |x 1i − x 2i | between Zoom and Booking (green curve) and |y 1i − y 2i | between Moderna and Booking (blue curve), for i = 1, ..., 64. The blue curve starts above; at some point around x = 0.005, it crosses from above to below the green curve and, after that point, it seems to be everywhere below. The graphic is consistent with a model where |X 1 − X 2 | and |Y 1 − Y 2 | are ordered in the increasing convex order. Next, we give statistical significance to this conclusion.
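The data preparation just described can be sketched as follows (the price series below are synthetic stand-ins generated for illustration; the paper's actual Yahoo Finance data are not reproduced here):

```python
import numpy as np

# Synthetic weekly closing prices as stand-ins for the three stocks
# (the paper uses real Yahoo Finance data, not reproduced here).
rng = np.random.default_rng(7)
p_zoom = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.05, 65)))
p_mrna = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.09, 65)))
p_bkng = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.05, 65)))

def log_returns(p):
    # r_t = log(p_t / p_{t-1})
    return np.diff(np.log(p))

x1, y1 = log_returns(p_zoom), log_returns(p_mrna)
x2 = log_returns(p_bkng)        # shared component: X2 = Y2

d_x = np.sort(np.abs(x1 - x2))  # distances |x_{1i} - x_{2i}|
d_y = np.sort(np.abs(y1 - x2))  # distances |y_{1i} - y_{2i}|

def ecdf(sorted_sample, t):
    # empirical distribution function at t
    return np.searchsorted(sorted_sample, t, side="right") / sorted_sample.size

print(d_x.size, ecdf(d_x, 0.05), ecdf(d_y, 0.05))
```

Note that 65 weekly closes yield exactly n = 64 log-returns, matching the sample size of the study; the two sorted distance samples are what the empirical distribution functions in Figure 1 are built from.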
We first perform some tests to study the marginal distributions of X 1 , X 2 and Y 1 . In order to check randomness, the classical runs test is performed with p-values 0.6143, 0.3134 and 0.2077, respectively. Symmetry is tested using the symmetry test by [44], obtaining p-values 0.388, 0.81, and 0.174, respectively. The Kolmogorov-Smirnov test for normality gives, respectively, the p-values 0.5424, 0.9154, and 0.3486. Therefore, there is no significant evidence to reject the hypothesis that the three log-return distributions are random, symmetric, and normal.
A one-sided F-test for paired data, performed for testing the hypothesis of equality of variances against σ X 1 < σ Y 1 , gives a p-value of 0.000493, showing significant evidence that σ X 1 < σ Y 1 . The p-value of the t-test for testing µ X 1 = µ Y 1 against µ X 1 ≠ µ Y 1 is 0.7124, so we cannot reject the equality of means. From the assumptions of normality, µ X 1 = µ Y 1 and σ X 1 < σ Y 1 , it follows X 1 ≤ icx Y 1 (see Table 2.2 in [22]). The copulas C and C ′ are fitted by using the goodness-of-fit test based on Kendall's process [45,46]. Considering a bivariate normal (BN) copula, we obtain p-values 0.74 and 0.79, respectively; therefore, there is no statistical evidence to reject that C and C ′ are BN. Since the bivariate normal copula parameter is the Pearson correlation coefficient ρ X , we perform Williams's test (two-sided) [47,48] for testing the hypothesis ρ X = ρ Y against ρ X ≠ ρ Y when the vectors share one component (X 2 = Y 2 ). The p-value, 0.4622, indicates that we cannot reject the equality, which leads us to admit that C = C ′ ∼ BN(ρ). Since the sample estimate of the Pearson correlation coefficient is negative, we test ρ = 0 against ρ < 0 by running the test for association between paired samples using Pearson's correlation coefficient for the vector X. The p-value, 0.0442, suggests that ρ < 0, which means that the copulas are NDS (see Example 4.1 in [24]).
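The battery of tests above can be mimicked with standard scipy routines. The sketch below runs on synthetic log-returns (only the sample size n = 64 matches the study; the dependence strength and volatilities are made-up values chosen so that the tests behave as in the paper):

```python
import numpy as np
from scipy import stats

# Synthetic log-returns standing in for the real samples (illustration only).
rng = np.random.default_rng(8)
n = 64
x2 = rng.normal(0.0, 0.04, n)              # shared component (Booking)
x1 = -0.8 * x2 + rng.normal(0.0, 0.03, n)  # smaller variance, negatively dependent
y1 = -0.8 * x2 + rng.normal(0.0, 0.10, n)  # larger variance

# One-sided F-test of H0: var(X1) = var(Y1) against var(X1) < var(Y1).
F = np.var(x1, ddof=1) / np.var(y1, ddof=1)
p_var = stats.f.cdf(F, n - 1, n - 1)

# Two-sided t-test for equality of means.
_, p_mean = stats.ttest_ind(x1, y1)

# One-sided test for negative correlation between X1 and X2.
r, p_two = stats.pearsonr(x1, x2)
p_neg = p_two / 2 if r < 0 else 1 - p_two / 2
print(p_var, p_mean, r, p_neg)
```

With these synthetic inputs, the variance test and the negative-correlation test come out significant while the equality of means is not rejected, reproducing the qualitative pattern (σ X 1 < σ Y 1 , µ X 1 = µ Y 1 , ρ < 0) used in the paper's argument. The runs, symmetry and copula goodness-of-fit tests of the original study are not reproduced here, as they rely on procedures outside scipy.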
To conclude: the assumptions that (1) X 1 ≤ icx Y 1 and that (2) the copulas C and C ′ are equal and NDS are supported by the data. It follows from Corollary 20 that |X 1 − X 2 | ≤ icx |Y 1 − Y 2 |, which indicates that Zoom leads to a less risky portfolio than Moderna.

Conclusions
Given two random variables X 1 and X 2 that are not necessarily independent, we have provided several results concerning the distances |X 1 − X 2 |, (X 1 − X 2 ) + and their expectations. The most remarkable results of this study can be summarized as follows: (a) If X is a random variable with strictly increasing distribution function F and X 1 and X 2 are two random variables with an NDS (negatively dependent through stochastic ordering) copula C and with marginal distribution functions F 1 = F 2 = F, then ν(X) = E(|X 1 − X 2 |) is a variability measure satisfying the following properties: law invariance, translation invariance, positive homogeneity, non-negativity, consistency with the dispersive order, consistency with the excess wealth order, comonotonic additivity, and subadditivity. An example is the median absolute deviation ν(X) = E(|X − m X |), where m X is the median of X, which can be written in the form 1/2 E(|X 1 − X 2 |), where C is the Fréchet-Hoeffding lower bound copula. (b) Given two random vectors (X 1 , X 2 ) and (Y 1 , Y 2 ) with possibly different marginals and copulas, we have given conditions, in terms of several stochastic orders, under which |X 1 − X 2 | ≤ st,icx |Y 1 − Y 2 | and (X 1 − X 2 ) + ≤ st,icx (Y 1 − Y 2 ) + .
Two applications have been provided. In actuarial science, given a risk X, we have proposed a general class of premium principles of the form T C (X) = E(X) + λE(|X 1 − X 2 |), for some λ > 0, where X 1 and X 2 are two copies of X with copula C. In portfolio risk management, we have assessed the inclusion of a new asset in a portfolio by comparing absolute values of differences. It is a question of future research to determine the circumstances under which the criterion used in Section 5.2 to include a new asset in a portfolio gives rise to a portfolio with smaller realized variance.
Finally, it is interesting to note that the random excess (X 1 − X 2 ) + also has an appealing role in the context of risk management and quantitative finance. If X 1 (t) and X 2 (t) are the prices of two risky assets at time t, the payoff of the option that gives the buyer the right to exchange the second asset for the first at the expiry time t (called an exchange option) is (X 1 (t) − X 2 (t)) + . In this context, the results in this paper can be used to compare different payoffs in a similar manner as in Section 5.2.
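The effect of dependence on the exchange-option payoff can be illustrated by Monte Carlo (an example added here; the lognormal model, price level 100, volatility 0.2 and correlations ±0.8 are arbitrary choices). Less concordant assets produce a larger expected excess, in line with the comparisons of Section 4:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000

def exchange_value(rho):
    # Monte Carlo mean of the exchange payoff (S1 - S2)+ at expiry for two
    # lognormal prices with identical marginals and Gaussian dependence rho.
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    s1 = 100.0 * np.exp(-0.02 + 0.2 * z1)  # E(S_i) = 100, volatility 0.2
    s2 = 100.0 * np.exp(-0.02 + 0.2 * z2)
    return np.mean(np.maximum(s1 - s2, 0.0))

v_pos = exchange_value(0.8)   # strongly concordant assets
v_neg = exchange_value(-0.8)  # strongly discordant assets
print(v_pos, v_neg)
```

Only the copula changes between the two runs, the marginal price distributions being identical, so the gap between the two values isolates the dependence effect on E((X 1 (t) − X 2 (t)) + ).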