#### 2.1. Basics of Elliptical Distributions

As the term skew-elliptical distributions suggests, these are obtained as extensions of elliptically contoured (EC) distributions, briefly called ‘elliptical distributions’. To start with, we establish the notation and recall some basic facts.

We shall only deal with absolutely continuous EC distributions, which represent the only case relevant for the developments to follow. Consider a function $\tilde{p}$ from $\mathbb{R}^{+}$ to $\mathbb{R}^{+}$ such that, for a positive integer $d$,

$$k_d=\int_0^{\infty} r^{d-1}\,\tilde{p}(r^2)\,\mathrm{d}r<\infty. \tag{1}$$

Then, a $d$-dimensional continuous random variable $X$ is said to have an elliptical distribution, with density generator $\tilde{p}$, if its density is of the form

$$f(x)=c_d\,\det(\Sigma)^{-1/2}\,\tilde{p}\{(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\},\qquad x\in\mathbb{R}^d, \tag{2}$$

where $\mu\in\mathbb{R}^{d}$ is a location parameter, $\Sigma$ is a symmetric positive-definite $d\times d$ scale matrix and $c_d=\Gamma(d/2)/(2\,\pi^{d/2}\,k_d)$. In this case, we write $X\sim\mathrm{EC}_d(\mu,\Sigma,\tilde{p})$.
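As a numerical sanity check of how the normalizing constant arises, the sketch below approximates $k_d=\int_0^\infty r^{d-1}\tilde p(r^2)\,\mathrm{d}r$ with a midpoint rule truncated at an arbitrary `r_max`, and confirms that the normal generator $\tilde p(u)=\exp(-u/2)$ gives $c_d=(2\pi)^{-d/2}$. It assumes NumPy; `k_d`, `c_d`, `ec_density` and `normal_gen` are hypothetical helper names.

```python
import math
import numpy as np


def k_d(p_tilde, d, r_max=50.0, n=200_000):
    """Midpoint-rule approximation of k_d = int_0^inf r^(d-1) p_tilde(r^2) dr, truncated at r_max."""
    h = r_max / n
    r = (np.arange(n) + 0.5) * h  # midpoints of the n sub-intervals of [0, r_max]
    return float(np.sum(r ** (d - 1) * p_tilde(r ** 2)) * h)


def c_d(p_tilde, d):
    """Normalizing constant c_d = Gamma(d/2) / (2 pi^(d/2) k_d)."""
    return math.gamma(d / 2) / (2 * math.pi ** (d / 2) * k_d(p_tilde, d))


def ec_density(x, mu, Sigma, p_tilde):
    """EC_d(mu, Sigma, p_tilde) density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mu
    # quadratic forms (x - mu)^T Sigma^{-1} (x - mu), one per row
    q = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma, diff.T).T)
    return c_d(p_tilde, d) * np.linalg.det(Sigma) ** -0.5 * p_tilde(q)


def normal_gen(u):
    """Density generator of the multivariate normal distribution."""
    return np.exp(-u / 2)


# With the normal generator, c_d should match (2 pi)^(-d/2)
for d in (1, 2, 3):
    print(d, c_d(normal_gen, d), (2 * math.pi) ** (-d / 2))
```

Any other generator can be plugged in, provided the truncated integral is a good approximation of $k_d$; for heavy-tailed generators `r_max` would need to be much larger.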

Since $(x-\mu)^{\top}\Sigma^{-1}(x-\mu)=\mathrm{constant}$ is the equation of an ellipsoid centered at point $\mu$, the level curves of density (2) constitute ellipsoids, which explains the name ‘elliptically contoured distributions’.

The EC class enjoys a remarkable number of attractive formal properties, including closure under marginalization, affine transformations, and conditioning on the values taken on by a subset of the $d$ components. For a precise statement of these facts, as well as many others, we refer to standard accounts, such as the book by Fang et al. [8].

The most prominent representative of the EC class is the multivariate normal distribution, which occurs when $\tilde{p}(u)=\exp(-u/2)$. The normal family plays an important role within the EC class, as it generates the subclass of scale mixtures of normal variates, defined as follows. If $X\sim\mathrm{N}_d(0,\Sigma)$ and $S$ is an independent positive variable which plays the role of ‘random scale factor’, it is easy to check that the scale mixture

$$Y=\mu+S\,X \tag{3}$$

has distribution of EC type. In the important instance where

$$S=(V/\nu)^{-1/2},\qquad V\sim\chi^2_{\nu}, \tag{4}$$

we obtain the multivariate Student's $t$ distribution on $\nu$ degrees of freedom; in this case, we write $Y\sim t_d(\mu,\Sigma,\nu)$ in an obvious notation.
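A minimal simulation sketch of the scale-mixture construction for the Student's $t$ case, assuming NumPy and the parameterization $Y=\mu+S\,X$ with $S=(V/\nu)^{-1/2}$, $V\sim\chi^2_\nu$; the function name `rmvt_scale_mixture` and the parameter values are illustrative only. For $\nu>2$ the sample covariance should approach $\nu/(\nu-2)\,\Sigma$.

```python
import numpy as np


def rmvt_scale_mixture(n, mu, Sigma, nu, rng):
    """Draw n variates Y = mu + S X with X ~ N_d(0, Sigma), S = (V/nu)^(-1/2), V ~ chi^2_nu."""
    d = len(mu)
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    V = rng.chisquare(nu, size=n)
    S = (V / nu) ** -0.5  # one random scale factor per draw, shared by all d components
    return mu + S[:, None] * X


rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
nu = 6.0
Y = rmvt_scale_mixture(200_000, mu, Sigma, nu, rng)

# For nu > 2, the covariance matrix of t_d(mu, Sigma, nu) is nu / (nu - 2) * Sigma
cov_theory = nu / (nu - 2) * Sigma
cov_mc = np.cov(Y, rowvar=False)
```

A single draw of $S$ multiplies all $d$ components; this shared scale is what makes the result elliptical rather than a vector of independent $t$ variates.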

For future reference, it is appropriate to recall a less well-known fact about the EC class and the scale mixtures of normals. While the $p$-dimensional marginal of a $d$-dimensional EC variable is still of EC type, as already recalled, this does not ensure that its density belongs to the same parametric family as the original density.

Kano [9] has studied necessary and sufficient conditions ensuring that a marginalization operation keeps the distribution within the same parametric class, which he calls the ‘consistency property’. In operational terms, the required condition is that the EC distribution allows a stochastic representation of type (3) where the distribution of the mixing variable $S$ does not depend on $d$. As a key example, this condition holds for $S$ in (4); hence, the marginals of a Student's $t$ distribution are still of the same type. There are instead several EC families where, although a representation of type (3) exists, the distribution of $S$ depends on $d$; hence, marginalization consistency does not hold. The motivating instance of this sort in Kano [9] is the class of multivariate exponential power distributions, which fails to be consistent under marginalization.

#### 2.2. The Multivariate Skew-Normal Distribution

Just as the multivariate normal distribution is at the core of the EC class, the multivariate skew-normal (SN) distribution is at the core of the SEC class and of many other related formulations. We present the SN distribution following essentially the constructive route of Azzalini and Dalla Valle [10], also in view of its relevance for the subsequent discussion.

Start from independent variables $U_0=(U_{01},\dots,U_{0d})^{\top}\sim\mathrm{N}_d(0,\bar{\Psi})$ and $U_1\sim\mathrm{N}(0,1)$, where $\bar{\Psi}$ is a positive-definite $d\times d$ correlation matrix. Next, for any vector $\delta=(\delta_1,\dots,\delta_d)^{\top}\in(-1,1)^d$, define $Z=(Z_1,\dots,Z_d)^{\top}$ where

$$Z_j=\delta_j\,|U_1|+(1-\delta_j^2)^{1/2}\,U_{0j} \tag{5}$$

for $j=1,\dots,d$. Equivalently, write in a more compact form

$$Z=\delta\,|U_1|+D_\delta\,U_0, \tag{6}$$

where $D_\delta=(I_d-\mathrm{diag}(\delta)^2)^{1/2}$. For future use, define also the term-by-term transform

$$\lambda_j=\frac{\delta_j}{(1-\delta_j^2)^{1/2}},\qquad j=1,\dots,d, \tag{7}$$

which is invertible via $\delta_j=(1+\lambda_j^2)^{-1/2}\lambda_j$. Some algebraic work yields the density function of $Z$, which is

$$f(x)=2\,\phi_d(x;\bar{\Omega})\,\Phi(\alpha^{\top}x),\qquad x\in\mathbb{R}^d, \tag{8}$$

where $\phi_d(\cdot;\bar{\Omega})$ denotes the $\mathrm{N}_d(0,\bar{\Omega})$ density, $\Phi(\cdot)$ the $\mathrm{N}(0,1)$ distribution function, and

$$\bar{\Omega}=D_\delta(\bar{\Psi}+\lambda\lambda^{\top})D_\delta, \tag{9}$$

$$\alpha=(1+\lambda^{\top}\bar{\Psi}^{-1}\lambda)^{-1/2}\,D_\delta^{-1}\bar{\Psi}^{-1}\lambda. \tag{10}$$
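A minimal sketch of the additive construction (6), assuming NumPy; `sn_params` and `rsn_additive` are hypothetical helper names and the parameter values are arbitrary. `sn_params` computes $\bar\Omega=D_\delta(\bar\Psi+\lambda\lambda^\top)D_\delta$ and $\alpha=(1+\lambda^\top\bar\Psi^{-1}\lambda)^{-1/2}D_\delta^{-1}\bar\Psi^{-1}\lambda$; the block also evaluates the algebraically equivalent expression $\alpha=(1-\delta^\top\bar\Omega^{-1}\delta)^{-1/2}\bar\Omega^{-1}\delta$ (it follows from the Sherman–Morrison formula) as a cross-check. The simulated mean should approach $(2/\pi)^{1/2}\delta$ and the simulated covariance $\bar\Omega-(2/\pi)\delta\delta^\top$.

```python
import numpy as np


def sn_params(Psi_bar, delta):
    """Map (Psi_bar, delta) to (Omega_bar, alpha) via the lambda transform and (10)."""
    delta = np.asarray(delta, dtype=float)
    D = np.diag(np.sqrt(1 - delta ** 2))  # D_delta
    lam = delta / np.sqrt(1 - delta ** 2)  # lambda, term by term
    Omega_bar = D @ (Psi_bar + np.outer(lam, lam)) @ D
    Psi_inv_lam = np.linalg.solve(Psi_bar, lam)
    alpha = np.linalg.solve(D, Psi_inv_lam) / np.sqrt(1 + lam @ Psi_inv_lam)
    return Omega_bar, alpha


def rsn_additive(n, Psi_bar, delta, rng):
    """Sample n copies of Z = delta |U_1| + D_delta U_0, as in (6)."""
    delta = np.asarray(delta, dtype=float)
    U0 = rng.multivariate_normal(np.zeros(len(delta)), Psi_bar, size=n)
    U1 = np.abs(rng.standard_normal(n))
    # U0 * sqrt(1 - delta^2) scales column j by (1 - delta_j^2)^(1/2), i.e. applies D_delta row-wise
    return U1[:, None] * delta + U0 * np.sqrt(1 - delta ** 2)


Psi_bar = np.array([[1.0, 0.3], [0.3, 1.0]])
delta = np.array([0.8, -0.5])
Omega_bar, alpha = sn_params(Psi_bar, delta)

# Equivalent expression of alpha in terms of (Omega_bar, delta)
Oinv_delta = np.linalg.solve(Omega_bar, delta)
alpha_alt = Oinv_delta / np.sqrt(1 - delta @ Oinv_delta)

Z = rsn_additive(200_000, Psi_bar, delta, np.random.default_rng(2))
mu_z = np.sqrt(2 / np.pi) * delta  # theoretical E{Z}
```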

Clearly, the distribution of $Z$ is regulated by the pair $(\bar{\Psi},\delta)$, or equivalently by $(\bar{\Psi},\lambda)$. If $\delta=0=\lambda$, then $Z$ coincides with the normal variable $U_0$; otherwise, the non-linear transformation of $U_1$ in (6) produces a non-normal distribution of $Z$. Correspondingly, in the latter case, the vector $\alpha$ in (10) is non-zero, so that the density (8) is, apart from the constant 2, the product of the EC factor $\phi_d$ and a perturbation factor $\Phi(\alpha^{\top}x)$. The result is a density function with skew-elliptical contour level curves. The contour level plots in Figure 1 display two examples of SN densities (8) with $d=2$.
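The following sketch evaluates the SN density $2\phi_d(x;\bar\Omega)\Phi(\alpha^\top x)$ and checks, with a midpoint rule on an arbitrarily chosen grid, that it integrates to 1 for $d=2$. It assumes NumPy and builds $\Phi$ from `math.erf`; `sn_density` is a hypothetical name and the parameter values are illustrative.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    """Phi(z) computed elementwise from math.erf."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def sn_density(x, Omega_bar, alpha):
    """SN density (8), 2 phi_d(x; Omega_bar) Phi(alpha^T x), at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    q = np.einsum("ij,ij->i", x, np.linalg.solve(Omega_bar, x.T).T)
    phi_d = np.exp(-q / 2) / math.sqrt((2 * math.pi) ** d * np.linalg.det(Omega_bar))
    return 2 * phi_d * std_normal_cdf(x @ alpha)


# Midpoint-rule check that (8) integrates to 1 for d = 2 (grid range and step are arbitrary)
Omega_bar = np.array([[1.0, 0.5], [0.5, 1.0]])
alpha = np.array([3.0, -1.0])
h = 0.02
g = -8 + h / 2 + h * np.arange(800)  # midpoints covering [-8, 8]
X1, X2 = np.meshgrid(g, g)
pts = np.column_stack([X1.ravel(), X2.ravel()])
mass = sn_density(pts, Omega_bar, alpha).sum() * h * h
```

At the origin $\Phi(0)=1/2$, so the density there reduces to $\phi_d(0;\bar\Omega)$ whatever the value of $\alpha$.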

It was later shown by Azzalini and Capitanio [11] that the pair $(\bar{\Omega},\alpha)$ uniquely identifies $(\bar{\Psi},\delta)$, for any choice of the correlation matrix $\bar{\Omega}$ and any $d$-vector $\alpha$. So $(\bar{\Omega},\alpha)$ can equally be adopted as a legitimate parameterization of the family. Since in applied work we need to regulate location and scale, introduce the additional transformation

$$Y=\xi+\omega Z, \tag{11}$$

where $\xi\in\mathbb{R}^{d}$ and $\omega=\mathrm{diag}(\omega_1,\dots,\omega_d)$ has positive diagonal elements. We correspondingly write $Y\sim\mathrm{SN}_d(\xi,\Omega,\alpha)$, where $\Omega=\omega\,\bar{\Omega}\,\omega$.

In a broad sense, the role of the various regulating parameters is as follows: $\xi$ controls location, $\omega$ scale, $\bar{\Omega}$ the dependence structure, $\alpha$ departure from normality. The phrase ‘in a broad sense’ is a caution that commonly used summary measures are actually influenced by all terms. For instance, the mean value of $Y$ is not just $\xi$, but a function of all component parameters. This is why the symbol $\xi$ has been adopted, instead of the more popular $\mu$. However, for a given choice of $(\Omega,\alpha)$, an addition $+u$, say, to the vector $\xi$ produces precisely the same addition $+u$ on $\mathbb{E}\{Y\}$, which justifies the name ‘location parameter’ for $\xi$.

Besides the stochastic representation (6) of additive form, the SN distribution allows other representations. An especially important one is the following, based on a conditioning mechanism. Consider the multivariate normal variable

$$X=\begin{pmatrix}X_0\\ X_1\end{pmatrix}\sim\mathrm{N}_{1+d}(0,\Omega^*),\qquad \Omega^*=\begin{pmatrix}\bar{\Omega}&\delta\\ \delta^{\top}&1\end{pmatrix}, \tag{12}$$

where $\Omega^*$ is a positive-definite correlation matrix, $X_0$ is $d$-dimensional and $X_1$ is scalar. Then, both variables

$$Z=(X_0\mid X_1>0),\qquad Z'=\begin{cases}X_0 & \text{if } X_1>0,\\ -X_0 & \text{otherwise},\end{cases} \tag{13}$$

have density function (8). Strictly speaking, the equality sign following the random variable $Z'$ should be the one of equality in distribution, but the simpler notation matches the one for $Z$. In addition, this notation expresses how the result is going to be employed in practice, for simulation purposes, to sample pseudo-random numbers from the target distribution.
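To illustrate the simulation use of the second form of (13), the sketch below draws $(X_0^\top,X_1)^\top$ from a $(1+d)$-dimensional normal distribution with correlation matrix assembled from $\bar\Omega$ and $\delta$, and flips the sign of $X_0$ whenever $X_1\le 0$. It assumes NumPy; `rsn_conditioning` and the parameter values are illustrative, and the chosen $\delta$ must keep that matrix positive-definite, i.e. $\delta^\top\bar\Omega^{-1}\delta<1$.

```python
import numpy as np


def rsn_conditioning(n, Omega_bar, delta, rng):
    """Sample n SN variates via the second form of (13): Z' = X_0 if X_1 > 0, else -X_0."""
    delta = np.asarray(delta, dtype=float)
    d = len(delta)
    Omega_star = np.block([[Omega_bar, delta[:, None]],
                           [delta[None, :], np.ones((1, 1))]])
    X = rng.multivariate_normal(np.zeros(d + 1), Omega_star, size=n)
    X0, X1 = X[:, :d], X[:, d]
    return np.where(X1[:, None] > 0, X0, -X0)  # sign flip: every draw is kept


Omega_bar = np.array([[1.0, 0.5], [0.5, 1.0]])
delta = np.array([0.7, 0.2])  # here delta^T Omega_bar^{-1} delta = 0.52 < 1
Z = rsn_conditioning(200_000, Omega_bar, delta, np.random.default_rng(3))
```

Unlike the rejection form, no draw is discarded, which is why the sign-flip variant is the natural choice for random number generation.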

The additive stochastic representation (6) and the two variants of the conditioning representation (13) establish direct connections between the SN family and various subject-matter motivated constructions, as elucidated later on in more detail. A further type of stochastic representation is presented on pp. 129–130 of Azzalini and Capitanio [7].

A direct implication of the second variant of representation (13) is the property of perturbation invariance: any even transformation of $Z$ has the same distribution as the same transformation applied to $X_0\sim\mathrm{N}_d(0,\bar{\Omega})$. Some noteworthy implications are

$$Z^{\top}\bar{\Omega}^{-1}Z\sim\chi^2_d,\qquad (Y-\xi)^{\top}\Omega^{-1}(Y-\xi)\sim\chi^2_d. \tag{14}$$

The perturbation invariance property of the SN family is a special instance of a more general statement presented in the next subsection.
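A Monte Carlo sketch of the first implication in (14) for $d=2$, where the $\chi^2_2$ distribution function has the closed form $1-e^{-q/2}$. It assumes NumPy, samples $Z$ through the additive representation (6), and uses arbitrary parameter values. The odd function $x\mapsto x_1$ is included as a contrast: its mean stays near $(2/\pi)^{1/2}\delta_1$ rather than 0.

```python
import numpy as np

rng = np.random.default_rng(4)
Psi_bar = np.array([[1.0, -0.4], [-0.4, 1.0]])
delta = np.array([0.9, 0.6])
D = np.diag(np.sqrt(1 - delta ** 2))
Omega_bar = D @ Psi_bar @ D + np.outer(delta, delta)  # same matrix as D(Psi_bar + lam lam^T)D

# Sample Z through the additive representation (6)
n = 200_000
U0 = rng.multivariate_normal(np.zeros(2), Psi_bar, size=n)
Z = np.abs(rng.standard_normal(n))[:, None] * delta + U0 @ D

# Even transformation x -> x^T Omega_bar^{-1} x; by (14) it should be chi^2_2
Q = np.einsum("ij,ij->i", Z, np.linalg.solve(Omega_bar, Z.T).T)
grid = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
ecdf = (Q[:, None] <= grid).mean(axis=0)
chi2_cdf = 1 - np.exp(-grid / 2)  # closed-form chi^2_2 distribution function

# Contrast: the odd function x -> x_1 is not invariant, its mean is far from 0
mean_first = Z[:, 0].mean()
```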

The SN distribution enjoys a high level of mathematical tractability, which approaches that of the multivariate normal distribution. Among its many formal properties, a special mention is due for closure of the family with respect to affine transformations. Specifically, if $Y\sim\mathrm{SN}_d(\xi,\Omega,\alpha)$, then

$$a+A^{\top}Y\sim\mathrm{SN}_q\bigl(a+A^{\top}\xi,\;A^{\top}\Omega A,\;\tilde{\alpha}\bigr)$$

for a $q$-vector $a$ and a full-rank $d\times q$ matrix $A$; here $\tilde{\alpha}$ is a $q$-vector whose explicit expression is given, for instance, on p. 133 of Azzalini and Capitanio [7]. A direct implication is that any $q$-dimensional marginal distribution of $Y$ still has SN distribution. The moment generating function of $Y$ takes the simple form

$$M(t)=2\exp\!\left(\xi^{\top}t+\tfrac12\,t^{\top}\Omega t\right)\Phi(\delta^{\top}\omega t),\qquad t\in\mathbb{R}^d.$$

Then, from the cumulant generating function $K(t)=\log M(t)$, we obtain

$$\mathbb{E}\{Y\}=\xi+\omega\mu_z,\qquad \mathrm{var}\{Y\}=\Omega-\omega\mu_z\mu_z^{\top}\omega,$$

where $\mu_z=\mathbb{E}\{Z\}=(2/\pi)^{1/2}\delta$.
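The moment formulas $\mathbb{E}\{Y\}=\xi+\omega\mu_z$ and $\mathrm{var}\{Y\}=\Omega-\omega\mu_z\mu_z^\top\omega$ can be cross-checked by simulation. The sketch below assumes NumPy; `sn_mean_var` is a hypothetical helper, the parameter values are arbitrary, and the last line illustrates that shifting $\xi$ by $u$ shifts the mean by exactly $u$.

```python
import numpy as np


def sn_mean_var(xi, omega_diag, Omega_bar, delta):
    """E{Y} and var{Y} for Y = xi + omega Z, with mu_z = (2/pi)^(1/2) delta."""
    omega = np.diag(omega_diag)
    mu_z = np.sqrt(2 / np.pi) * np.asarray(delta, dtype=float)
    Omega = omega @ Omega_bar @ omega
    return xi + omega @ mu_z, Omega - omega @ np.outer(mu_z, mu_z) @ omega


# Monte Carlo cross-check, sampling Z through the additive representation (6)
rng = np.random.default_rng(5)
Psi_bar = np.array([[1.0, 0.2], [0.2, 1.0]])
delta = np.array([-0.6, 0.95])
D = np.diag(np.sqrt(1 - delta ** 2))
Omega_bar = D @ Psi_bar @ D + np.outer(delta, delta)
xi = np.array([10.0, -3.0])
omega_diag = np.array([2.0, 0.5])

n = 200_000
U0 = rng.multivariate_normal(np.zeros(2), Psi_bar, size=n)
Z = np.abs(rng.standard_normal(n))[:, None] * delta + U0 @ D
Y = xi + Z * omega_diag  # row-wise version of Y = xi + omega Z

mean_th, var_th = sn_mean_var(xi, omega_diag, Omega_bar, delta)

u = np.array([1.0, 2.0])
mean_shifted, _ = sn_mean_var(xi + u, omega_diag, Omega_bar, delta)  # equals mean_th + u
```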

One of the few limitations of the multivariate SN family is the lack of closure under conditioning, that is, if $Y=(Y_1,Y_2)$ has an SN distribution, then the distribution of $Y_1$ conditionally on the value taken on by $Y_2$ is not of SN type. This property can be achieved by considering a simple extension of the family, denoted ‘extended skew-normal’, which includes an extra parameter, $\tau\in\mathbb{R}$. The density function corresponding to (8) is now

$$f(x)=\phi_d(x;\bar{\Omega})\,\frac{\Phi(\alpha_0+\alpha^{\top}x)}{\Phi(\tau)},\qquad x\in\mathbb{R}^d,$$

where

$$\alpha_0=\tau\,(1+\alpha^{\top}\bar{\Omega}\alpha)^{1/2}.$$

When $Y$ is defined as in (11), the corresponding density function at $x\in\mathbb{R}^{d}$ is

$$\phi_d(x-\xi;\Omega)\,\frac{\Phi\{\alpha_0+\alpha^{\top}\omega^{-1}(x-\xi)\}}{\Phi(\tau)}$$

and we write $Y\sim\mathrm{SN}_d(\xi,\Omega,\alpha,\tau)$. Clearly, if $\tau=0$, we return to the original SN distribution. The corresponding moment generating function is

$$M(t)=\exp\!\left(\xi^{\top}t+\tfrac12\,t^{\top}\Omega t\right)\frac{\Phi(\tau+\delta^{\top}\omega t)}{\Phi(\tau)},\qquad t\in\mathbb{R}^d,$$

which shows that the moments of the distribution of $Y$ are non-linear functions of $\tau$.
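For $d=1$ (so $\bar\Omega=1$) the extended SN density reduces to $\phi(x)\Phi(\alpha_0+\alpha x)/\Phi(\tau)$ with $\alpha_0=\tau(1+\alpha^2)^{1/2}$. The sketch below, assuming NumPy and illustrative values, checks numerically that it integrates to 1 and that its mean matches $\delta\,\phi(\tau)/\Phi(\tau)$, the value obtained by differentiating $\log M(t)$ at $t=0$ with $\xi=0$, $\omega=1$; `esn_density_1d` is a hypothetical name.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def std_normal_pdf(z):
    return np.exp(-np.asarray(z, dtype=float) ** 2 / 2) / math.sqrt(2 * math.pi)


def esn_density_1d(x, alpha, tau):
    """Extended SN density for d = 1: phi(x) Phi(alpha0 + alpha x) / Phi(tau)."""
    alpha0 = tau * math.sqrt(1 + alpha ** 2)
    return std_normal_pdf(x) * std_normal_cdf(alpha0 + alpha * x) / std_normal_cdf(tau)


h = 0.001
x = -12 + h / 2 + h * np.arange(24_000)  # midpoints covering [-12, 12]
alpha = 2.0
delta = alpha / math.sqrt(1 + alpha ** 2)  # inverse of the lambda transform
for tau in (-1.5, 0.0, 2.0):
    f = esn_density_1d(x, alpha, tau)
    mass, mean = f.sum() * h, (x * f).sum() * h
    # mean implied by the mgf: delta * phi(tau) / Phi(tau)
    print(tau, mass, mean, delta * std_normal_pdf(tau) / std_normal_cdf(tau))
```

For $\tau=0$ the function coincides with $2\phi(x)\Phi(\alpha x)$, the univariate SN density.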

The extended skew-normal distribution arises under the constructive route described above if the term $|U_1|$ in (5) is replaced by one having distribution $\mathrm{N}(-\tau,1)$ truncated below 0. Similarly, the distribution arises from a conditioning mechanism similar to the first expression of (13) if the condition $X_1>0$ is replaced by $X_1+\tau>0$.
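The conditioning route can be sketched by plain rejection: draw $(X_0^\top,X_1)^\top$ from the normal distribution underlying (13) and keep $X_0$ whenever $X_1+\tau>0$. The sketch assumes NumPy and illustrative parameter values. The acceptance rate estimates $\Phi(\tau)$ and the sample mean should approach $\delta\,\phi(\tau)/\Phi(\tau)$. The value `second_th`, $\mathbb{E}\{Z_1^2\}=1+\delta_1^2(-\tau)\phi(\tau)/\Phi(\tau)$, comes from a standard truncated-normal moment calculation not given in the text; it differs from 1 when $\tau\neq0$, in line with the loss of perturbation invariance.

```python
import math
import numpy as np


def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


rng = np.random.default_rng(6)
Omega_bar = np.array([[1.0, 0.5], [0.5, 1.0]])
delta = np.array([0.7, 0.2])
tau = -1.0
Omega_star = np.block([[Omega_bar, delta[:, None]], [delta[None, :], np.ones((1, 1))]])

# Rejection version of the first form of (13), with X_1 > 0 replaced by X_1 + tau > 0
X = rng.multivariate_normal(np.zeros(3), Omega_star, size=500_000)
Z = X[X[:, 2] + tau > 0, :2]
acceptance = len(Z) / len(X)  # should be close to Phi(tau)

mean_th = delta * phi(tau) / Phi(tau)
# E{Z_1^2} differs from 1 when tau != 0, so x -> x_1^2 is no longer invariant
second_th = 1 + delta[0] ** 2 * (-tau) * phi(tau) / Phi(tau)
```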

The introduction of the additional parameter $\tau$ widens the ranges of the measures of asymmetry and kurtosis with respect to the original SN family. However, the increase of flexibility in this sense is not substantial, as visible from Figure 2.5 of Azzalini and Capitanio [7] for the univariate case. The main advantages of the extended SN family are arguably two others: (a) the more general generating mechanisms indicated in the previous paragraph, which make the distribution a more plausible ‘physically motivated’ model in a number of applications; (b) as anticipated, the extended family is closed under a conditioning operation, a property which turns out to be useful in some applications; explicit expressions of the corresponding parameters after conditioning are available, for instance, on p. 151 of Azzalini and Capitanio [7].

The price to pay for these attractive formal properties is the lack of an analogue of the second stochastic representation in (13). In turn, this removes the property of perturbation invariance, so that (14) and similar facts no longer hold.

Many other formal results can be obtained on the multivariate SN distribution, often with relatively little effort. A self-contained account of the main results, built by collecting a number of contributions from the literature, is presented in Chapter 5 of Azzalini and Capitanio [7]. See, specifically, the results of Genton et al. [12], Capitanio [13], Balakrishnan and Scarpa [14], and Balakrishnan et al. [15], among others.

#### 2.3. A General Result

The overall target of the rest of our presentation is to introduce a number of variants and extensions, of various degrees of generality, of the SN distribution. Many of these other constructions replace the normal density appearing in (8) by some alternative, but other directions are also considered. A substantial fraction of these extensions can be embraced by the following result presented by Azzalini and Capitanio [11] (p. 599). Here, and in the following, the phrase ‘$X$ is symmetric about 0’ referring to a random variable $X$ is shorthand for ‘$X$ is symmetrically distributed about 0’.

**Proposition** **1.** Denote by $T$ a continuous real-valued random variable with distribution function $G_0$, symmetric about 0, and by $Z_0$ a $d$-dimensional variable with density function $f_0$, independent of $T$, such that the real-valued variable $W=w(Z_0)$ is symmetric about 0. Then,

$$f(x)=2\,f_0(x)\,G_0\{w(x)\},\qquad x\in\mathbb{R}^d, \tag{18}$$

is a density function.

**Proof.** Note that the random variable $T-W$ is symmetric about 0 and, since $T$ is continuous, $\mathbb{P}\{T-W=0\}=0$. Expand $\mathbb{P}\{T-W\le 0\}$ as the expected value of conditional probabilities, leading to

$$\frac12=\mathbb{P}\{T\le W\}=\mathbb{E}_{Z_0}\bigl\{\mathbb{P}\{T\le w(Z_0)\mid Z_0\}\bigr\}=\int_{\mathbb{R}^d}G_0\{w(x)\}\,f_0(x)\,\mathrm{d}x.$$

□

An interesting fact is that the re-normalizing constant in (18) is universally 2. In addition, in light of the prominent role played in the subsequent discussion by the condition of symmetry for $f_0$, it is worth emphasizing that $f_0$ here can be *any* density function.
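A small sketch makes concrete that Proposition 1 does not require $f_0$ to be symmetric. Assuming NumPy, take $f_0$ to be the Exp(1) density, $G_0=\Phi$, and the hypothetical choice $w(z)=\log(e^z-1)$, for which $w(Z_0)$ has the standard logistic distribution and is therefore symmetric about 0. The sketch estimates $\mathbb{P}\{T\le w(Z_0)\}$, which should be 1/2, and checks that $2f_0(x)\Phi\{w(x)\}$ integrates to 1.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def w(x):
    """Hypothetical w(z) = log(e^z - 1): if Z_0 ~ Exp(1), then w(Z_0) is standard logistic."""
    return np.log(np.expm1(x))


# f_0 = Exp(1) density, which is not symmetric; G_0 = Phi, i.e. T ~ N(0, 1)
rng = np.random.default_rng(7)
n = 400_000
Z0 = rng.exponential(1.0, size=n)
T = rng.standard_normal(n)
p_accept = np.mean(T <= w(Z0))  # Monte Carlo estimate of P{T - W <= 0}, should be near 1/2

# Deterministic check: f(x) = 2 exp(-x) Phi(w(x)) integrates to 1 over (0, inf)
h = 0.001
x = h / 2 + h * np.arange(40_000)  # midpoints covering (0, 40]
mass = np.sum(2 * np.exp(-x) * std_normal_cdf(w(x))) * h
```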

Implicit in the proof of Proposition 1, there is an acceptance-rejection argument which, combined with symmetry of $w(Z_0)$, leads to the following stochastic representations. If $Z_0$ and $T$ are random variables as in Proposition 1, then both variables

$$Z=\bigl(Z_0\mid T\le w(Z_0)\bigr),\qquad Z'=\begin{cases}Z_0 & \text{if } T\le w(Z_0),\\ -Z_0 & \text{otherwise},\end{cases} \tag{19}$$

have density function (18). A corollary of the second representation in (19) is the property of *perturbation invariance* stated next.

**Proposition** **2.** Under the assumptions of Proposition 1, denote by $Z_0$ a random variable with density function $f_0$ and by $Z$ a random variable with density $f$. Then, for any function $t(\cdot)$ taking values in $\mathbb{R}^q$ ($q\ge 1$), such that $t(x)=t(-x)$ for all $x\in\mathbb{R}^d$, the transformed variables $t(Z)$ and $t(Z_0)$ have the same distribution.

In the univariate case, densities (18) can be viewed as weighted forms of $f_0$, in the sense examined by Rao [16]. This particular sub-class of weighted distributions enjoys special features, namely the stochastic representations (19) and the implied property of perturbation invariance.

#### 2.4. Symmetry-Modulated Distributions

After the publication of the Azzalini and Dalla Valle [10] paper, subsequent contributions in the literature have generated a number of extensions at progressively higher levels of generality. While we shall focus mostly on the stream labeled ‘SEC distributions’, it is appropriate to delineate at least the main traits of the broader picture, for a more comprehensive view. We set our exposition at a level which encompasses the majority of proposed formulations, especially those more directly connected to applied work, while keeping the mathematics relatively simple.

Even more general formulations, moving along two different paths, are those of Arellano-Valle et al. [17] and Jupp et al. [18]. However, these treatments also encompass many situations which, for technical reasons, cannot be pursued operationally. The discussion below covers the vast majority of the practically workable constructions, as known at the time of writing.

Proposition 3 below, obtained by Azzalini and Capitanio [19], represents an easy-to-use restricted version of Proposition 1, since under conditions (20) it is immediate that the condition of symmetry of $w(Z_0)$ required in Proposition 1 is fulfilled. The result provides a simple method to build variant forms of a *baseline density*, $f_0$ say, which is required to be centrally symmetric, that is, such that $f_0(-x)=f_0(x)$. Multiplication of $f_0$ by a modulation factor produces *perturbed* or *modulated* versions of $f_0$.

**Proposition** **3.** Denote by $f_0$ a probability density function on $\mathbb{R}^d$, by $G_0(\cdot)$ a continuous distribution function on the real line, and by $w(\cdot)$ a real-valued function on $\mathbb{R}^d$, such that

$$f_0(-x)=f_0(x),\qquad G_0(-y)=1-G_0(y),\qquad w(-x)=-w(x) \tag{20}$$

for all $x\in\mathbb{R}^d$, $y\in\mathbb{R}$. Then, (18) is a density function on $\mathbb{R}^d$.

In independent work, Wang et al. [20] have obtained an essentially equivalent formulation, which can be expressed as follows. On setting $G(x)=G_0\{w(x)\}$, we can rewrite conditions (20) as

$$f_0(-x)=f_0(x),\qquad 0\le G(x)\le 1,\qquad G(x)+G(-x)=1 \tag{21}$$

for all $x\in\mathbb{R}^d$, and write the density as

$$f(x)=2\,f_0(x)\,G(x),\qquad x\in\mathbb{R}^d.$$

Not only do $w$ and $G_0$ satisfying (20) yield a function $G(x)=G_0\{w(x)\}$ satisfying (21), but an essentially converse statement can also be shown to hold: for any function $G(x)$ satisfying (21), there exist functions $w,G_0$ satisfying (20) leading to the same $G$, and hence the same $f$; in fact, there is a multitude of them.

The conclusion is that, for any centrally symmetric density $f_0(x)$, Proposition 3 and the variant form based on (21) generate the same set of distributions. In this sense, they are equivalent formulations. Which of the two forms to adopt is a matter of convenience and taste. Version (21) is more suitable for mathematical work, but the actual construction of functions $G$ with the required properties is more naturally approached using form (20).
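The sketch below builds a modulated density $2f_0(x)G(x)$ with $f_0$ the standard bivariate normal, $G_0=\Phi$ and the hypothetical odd function $w(x)=2\sin(2x_1+x_2)$, which is not taken from the source or from Figure 2. It assumes NumPy, checks the requirements on $G$ in (21) over a grid, and verifies that the result integrates to 1.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def f0(x):
    """Standard bivariate normal density with independent components (centrally symmetric)."""
    return np.exp(-0.5 * np.sum(x ** 2, axis=1)) / (2 * math.pi)


def w(x):
    """Hypothetical odd, strongly non-linear function: w(-x) = -w(x)."""
    return 2 * np.sin(2 * x[:, 0] + x[:, 1])


def G(x):
    """G(x) = G_0{w(x)} with G_0 = Phi; satisfies (21) because w is odd and Phi(-y) = 1 - Phi(y)."""
    return std_normal_cdf(w(x))


def f(x):
    return 2 * f0(x) * G(x)


h = 0.02
g = -8 + h / 2 + h * np.arange(800)  # midpoints covering [-8, 8], symmetric about 0
X1, X2 = np.meshgrid(g, g)
pts = np.column_stack([X1.ravel(), X2.ravel()])
mass = f(pts).sum() * h * h
```

On a grid symmetric about the origin the Riemann sum of $f$ equals that of $f_0$, because $G(x)+G(-x)=1$; this mirrors the symmetry argument behind Proposition 3.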

Clearly, the SN density (8) is of type (18) with $f_0=\phi_d(\cdot;\bar{\Omega})$, $w(x)=\alpha^{\top}x$, $G_0=\Phi$. The vast majority of the distributions to be discussed later are of type (18) or some ‘extended’ version of them. The term *skew-symmetric distributions* is often used to identify these constructions, but we prefer the term *symmetry-modulated* for the following reason: skewness is typically the form of departure from symmetry associated with linear or mildly non-linear choices of $w(x)$, as was the case in earlier constructions of this type, but more elaborate functions $w(x)$ lead to distributions where skewness is not the most prominent feature.

Figure 2 illustrates this point by using strongly non-linear functions $w(x)$ to modulate, via $\Phi(w(x))$, a standard bivariate normal density function with independent components.

The extension of the stochastic representations in (13), after orthogonalization to independent variables, is as follows. Under the conditions of Proposition 3, denote by $X_0$ a random variable with density $f_0$ and by $T$ an independent variable with distribution function $G_0$. Then, both variables

$$Z=\bigl(X_0\mid T\le w(X_0)\bigr),\qquad Z'=\begin{cases}X_0 & \text{if } T\le w(X_0),\\ -X_0 & \text{otherwise},\end{cases} \tag{22}$$

have density (18). The second form in (22) is more convenient for random number generation, since no rejection of a generated outcome $X_0$ can occur; the first form allows us to draw connections with a number of formulations in applied domains where a selection mechanism of sample values occurs, regulated by a condition of type $T\le w(X_0)$.
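Both forms of (22) can be sketched in a few lines, assuming NumPy, a standard bivariate normal $X_0$, $T\sim\mathrm{N}(0,1)$ and the hypothetical odd function $w(x)=2\sin(2x_1+x_2)$. The rejection form keeps about half of the draws, while the sign-flip form keeps all of them; both sample means should agree with $\mathbb{E}\{Z\}$ computed from density (18) by a midpoint rule on a grid.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def std_normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def w(x):
    """Hypothetical odd, strongly non-linear function: w(-x) = -w(x)."""
    return 2 * np.sin(2 * x[:, 0] + x[:, 1])


rng = np.random.default_rng(8)
n = 400_000
X0 = rng.standard_normal((n, 2))  # f_0: standard bivariate normal
T = rng.standard_normal(n)  # G_0 = Phi
keep = T <= w(X0)

Z_rejection = X0[keep]  # first form of (22): about half of the draws are discarded
Z_flip = np.where(keep[:, None], X0, -X0)  # second form of (22): no draw is discarded

# Reference value of E{Z} under density (18), by a midpoint rule on a grid
h = 0.02
g = -8 + h / 2 + h * np.arange(800)
X1, X2 = np.meshgrid(g, g)
pts = np.column_stack([X1.ravel(), X2.ravel()])
dens = 2 * np.exp(-0.5 * np.sum(pts ** 2, axis=1)) / (2 * math.pi) * std_normal_cdf(w(pts))
mean_grid = (pts * dens[:, None]).sum(axis=0) * h * h
```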

Similarly to the SN distribution and Proposition 2, the second variant of representation (22) leads to the perturbation invariance property: under the above conditions, if $X_0$ and $Z$ have density functions $f_0$ and $f$, respectively, and $t$ is an even function from $\mathbb{R}^d$ to $\mathbb{R}^q$, then $t(X_0)$ and $t(Z)$ have the same distribution. For instance, if $Z=(Z_1,Z_2)$ is a random variable whose density is depicted in any of the panels of Figure 2, then $Z_1^2+Z_2^2\sim\chi^2_2$, since this fact holds for the associated variable $X_0$.
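A sketch of this invariance statement for a modulated normal, assuming NumPy; $w(x)=2\sin(2x_1+x_2)$ is a hypothetical choice, not one of the functions of Figure 2. It compares the empirical distribution function of $Z_1^2+Z_2^2$ with the $\chi^2_2$ distribution function $1-e^{-q/2}$.

```python
import numpy as np


def w(x):
    """Hypothetical odd, strongly non-linear function: w(-x) = -w(x)."""
    return 2 * np.sin(2 * x[:, 0] + x[:, 1])


rng = np.random.default_rng(9)
n = 400_000
X0 = rng.standard_normal((n, 2))  # f_0: standard bivariate normal, independent components
T = rng.standard_normal(n)  # G_0 = Phi
keep = T <= w(X0)
Z_flip = np.where(keep[:, None], X0, -X0)  # second form of (22)
Z_rej = X0[keep]  # first form of (22)

# t(x) = x_1^2 + x_2^2 is even, so t(Z) should be chi^2_2, as t(X0) is
grid = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
chi2_cdf = 1 - np.exp(-grid / 2)
ecdf_rej = (np.sum(Z_rej ** 2, axis=1)[:, None] <= grid).mean(axis=0)
same_pathwise = np.array_equal(np.sum(Z_flip ** 2, axis=1), np.sum(X0 ** 2, axis=1))
```

With the sign-flip sample, $Z_1^2+Z_2^2$ coincides draw by draw with the same quantity computed on $X_0$, which is precisely why the invariance follows from the second form of (22); the rejection sample provides a non-trivial check.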