2.1. Basics of Elliptical Distributions
As the term skew-elliptical distributions suggests, these are obtained as extensions of elliptically contoured (EC) distributions, briefly called ‘elliptical distributions’. To start with, we establish the notation and recall some basic facts.
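We shall only deal with absolutely continuous EC distributions, which represent the only case relevant for the developments to follow. Consider a function $\tilde p$ from $\mathbb{R}^+$ to $\mathbb{R}^+$ such that, for a positive integer $d$,
$$k_d=\int_0^\infty r^{d-1}\,\tilde p(r^2)\,\mathrm{d}r<\infty. \tag{1}$$
Then, a $d$-dimensional continuous random variable $X$ is said to have an elliptical distribution, with density generator $\tilde p$, if its density is of the form
$$p(x;\mu,\Sigma)=\frac{c_d}{\det(\Sigma)^{1/2}}\,\tilde p\{(x-\mu)^\top\Sigma^{-1}(x-\mu)\},\qquad x\in\mathbb{R}^d, \tag{2}$$
where $\mu\in\mathbb{R}^d$ is a location parameter, $\Sigma$ is a symmetric positive-definite $d\times d$ scale matrix and $c_d=\Gamma(d/2)/(2\pi^{d/2}k_d)$. In this case, we write $X\sim\mathrm{EC}_d(\mu,\Sigma,\tilde p)$.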
Since $(x-\mu)^\top\Sigma^{-1}(x-\mu)=c$, for a constant $c>0$, is the equation of an ellipsoid centered at point $\mu$, the level sets of density (2) constitute ellipsoids, which explains the name ‘elliptically contoured distributions’.
The EC class enjoys a remarkable number of attractive formal properties, including closure under marginalization, affine transformations, and conditioning on the values taken on by a subset of the d components. For a precise statement of these facts, as well as many others, we refer to standard accounts, such as the book by Fang et al. [8].
The most prominent representative of the EC class is the multivariate normal distribution, which occurs when $\tilde p(u)=\exp(-u/2)$. The normal family plays an important role within the EC class, as it generates the subclass of scale mixtures of normal variates, defined as follows. If $W\sim N_d(0,\Sigma)$ and $S$ is an independent positive variable which plays the role of ‘random scale factor’, it is easy to check that the scale mixture
$$X=\mu+S\,W \tag{3}$$
has distribution of EC type. In the important instance where
$$S=V^{-1/2},\qquad V\sim\chi^2_\nu/\nu, \tag{4}$$
we obtain the multivariate Student’s $t$ distribution on $\nu$ degrees of freedom; in this case, we write $X\sim t_d(\mu,\Sigma,\nu)$ in an obvious notation.
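The scale-mixture construction lends itself directly to simulation. The following is a minimal sketch, not taken from the source, which assumes the usual form of the mixture, namely $X=\mu+S\,W$ with $W\sim N_d(0,\Sigma)$ and $S=V^{-1/2}$, $V\sim\chi^2_\nu/\nu$. The function names `cholesky` and `sample_mv_t` are illustrative, and only the Python standard library is used.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    d = len(sigma)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def sample_mv_t(mu, sigma, nu, rng):
    """One draw of X = mu + S * W (assumed form of the scale mixture)."""
    d = len(mu)
    low = cholesky(sigma)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # W = L g has covariance sigma when sigma = L L^T
    w = [sum(low[i][k] * g[k] for k in range(i + 1)) for i in range(d)]
    # V ~ chi-square_nu / nu, using chi-square_nu = Gamma(shape=nu/2, scale=2)
    v = rng.gammavariate(nu / 2.0, 2.0) / nu
    s = 1.0 / math.sqrt(v)
    return [mu[i] + s * w[i] for i in range(d)]
```

For $\nu>2$, the sample variance matrix of many such draws should be close to $\nu/(\nu-2)\,\Sigma$, which offers a simple sanity check of the sketch.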
For future reference, it is appropriate to recall a relatively less known fact about the EC class and the scale mixtures of normals. While the p-dimensional marginal of a d-dimensional EC variable is still of EC type, as already recalled, this does not ensure that its marginal density functions belong to the same parametric family as the original density.
Kano [9] has studied necessary and sufficient conditions to ensure that a marginalization operation keeps the distribution within the same parametric class, which he calls ‘consistency property’. In operational terms, the required condition is that the EC distribution allows a stochastic representation of type (3) where the distribution of the mixing variable S does not depend on d. As a key example, this condition holds for S in (4); hence, the marginals of a Student’s t distribution are still of the same type. There are instead several EC families where, although a representation of type (3) exists, the distribution of S depends on d; hence, marginalization consistency does not hold. The motivating instance of this sort in Kano [9] is the class of multivariate exponential power distributions, which fails to be consistent under marginalization.
2.2. The Multivariate Skew-Normal Distribution
Just as the multivariate normal distribution is at the core of the EC class, so the multivariate skew-normal (SN) distribution is at the core of the SEC class and of many other related formulations. We present the SN distribution following essentially the constructive route of Azzalini and Dalla Valle [10], also in view of its relevance for the subsequent discussion.
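Start from independent variables $U_0\sim N(0,1)$ and $U=(U_1,\dots,U_d)^\top\sim N_d(0,\Psi)$, where $\Psi$ is a positive-definite $d\times d$ correlation matrix. Next, for any vector $\delta=(\delta_1,\dots,\delta_d)^\top$, define
$$Z_j=\delta_j\,|U_0|+\sqrt{1-\delta_j^2}\;U_j \tag{5}$$
where $-1<\delta_j<1$ for $j=1,\dots,d$. Equivalently, write in a more compact form
$$Z=D_\delta\,U+\delta\,|U_0| \tag{6}$$
where $D_\delta=\{I_d-\mathrm{diag}(\delta)^2\}^{1/2}$. For future use, define also the term-by-term transform
$$\lambda_j=\frac{\delta_j}{\sqrt{1-\delta_j^2}},\qquad j=1,\dots,d, \tag{7}$$
which is invertible via $\delta_j=\lambda_j/\sqrt{1+\lambda_j^2}$. Some algebraic work yields the density function of $Z$, which is
$$f(z)=2\,\phi_d(z;\bar\Omega)\,\Phi(\alpha^\top z),\qquad z\in\mathbb{R}^d, \tag{8}$$
where $\phi_d(z;\bar\Omega)$ denotes the $N_d(0,\bar\Omega)$ density, $\Phi$ the $N(0,1)$ distribution function, and
$$\bar\Omega=D_\delta\,(\Psi+\lambda\lambda^\top)\,D_\delta, \tag{9}$$
$$\alpha=(1+\lambda^\top\Psi^{-1}\lambda)^{-1/2}\,D_\delta^{-1}\Psi^{-1}\lambda. \tag{10}$$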
Clearly, the distribution of $Z$ is regulated by the pair $(\Psi,\delta)$, or equivalently by $(\Psi,\lambda)$. If $\delta=0$, then $Z$ coincides with the normal variable $U$; otherwise, the non-linear transformation of $U_0$ in (6) produces a non-normal distribution of $Z$. Correspondingly, in the latter case, the vector $\alpha$ in (10) is non-zero, so that the density (8) is the product of the EC factor $\phi_d(z;\bar\Omega)$ times a perturbation factor $2\,\Phi(\alpha^\top z)$. The result is a density function with skew-elliptical contour level curves. The contour level plots in Figure 1 display two examples of SN densities (8) with $d=2$.
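The additive construction can be sketched in a few lines of code. The example below is an illustration only, and assumes the component-wise form $Z_j=\delta_j|U_0|+\sqrt{1-\delta_j^2}\,U_j$ with $U_0\sim N(0,1)$ independent of a bivariate normal $U$ having unit variances and correlation `psi`. The function name and the restriction to $d=2$ are choices made here for brevity.

```python
import math
import random


def sample_sn_additive(delta, psi, rng):
    """One bivariate draw via Z_j = delta_j*|U0| + sqrt(1 - delta_j^2)*U_j (assumed form)."""
    u0 = rng.gauss(0.0, 1.0)
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # (U1, U2) standard normal with correlation psi
    u = (g1, psi * g1 + math.sqrt(1.0 - psi * psi) * g2)
    return tuple(
        dj * abs(u0) + math.sqrt(1.0 - dj * dj) * uj
        for dj, uj in zip(delta, u)
    )
```

Under this construction each component has mean $\delta_j\sqrt{2/\pi}$ and variance $1-2\delta_j^2/\pi$, so a quick comparison of sample moments with these values is a reasonable check.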
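It was later shown by Azzalini and Capitanio [11] that the pair $(\bar\Omega,\alpha)$ uniquely identifies $(\Psi,\delta)$, for any choice of the correlation matrix $\bar\Omega$ and any $d$-vector $\alpha$. So $(\bar\Omega,\alpha)$ can equally be adopted as a legitimate parameterization of the family. Since in applied work we need to regulate location and scale, introduce the additional transformation
$$Y=\xi+\omega\,Z \tag{11}$$
where $\xi\in\mathbb{R}^d$ and $\omega=\mathrm{diag}(\omega_1,\dots,\omega_d)$ has positive diagonal elements. We correspondingly write $Y\sim\mathrm{SN}_d(\xi,\Omega,\alpha)$, where $\Omega=\omega\,\bar\Omega\,\omega$.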
In a broad sense, the role of the various regulating parameters is as follows: $\xi$ controls location, $\omega$ scale, $\bar\Omega$ the dependence structure, $\alpha$ departure from normality. The phrase ‘in a broad sense’ is there to caution on the fact that, when we come to commonly used summary measures, these are actually influenced by all terms. For instance, the mean value of $Y$ is not just $\xi$, but a function of all the component parameters. This is the reason why the symbol $\xi$ has been adopted, instead of the more popular $\mu$. However, it is true that, for a given choice of $(\Omega,\alpha)$, an addition $c$, say, to the vector $\xi$ produces precisely the same addition on $\mathbb{E}(Y)$, which justifies use of the name ‘location parameter’ for $\xi$.
Besides the stochastic representation (6) of additive form, the SN distribution allows other representations. An especially important one is the following, based on a conditioning mechanism. Consider the multivariate normal variable
$$\begin{pmatrix}X_0\\ X\end{pmatrix}\sim N_{d+1}(0,\Omega^*),\qquad \Omega^*=\begin{pmatrix}1&\delta^\top\\ \delta&\bar\Omega\end{pmatrix}, \tag{12}$$
where $\Omega^*$ is a positive-definite correlation matrix. Then, both variables
$$Z'=(X\mid X_0>0),\qquad Z=\begin{cases}X&\text{if }X_0>0,\\ -X&\text{otherwise}\end{cases} \tag{13}$$
have density function (8). Strictly speaking, the equality sign following the random variable $Z'$ should be the one of equality in distribution, but the simpler notation matches the one for $Z$. In addition, this notation expresses how the result is going to be employed in practice, for simulation purposes, to sample pseudo-random numbers from the target distribution.
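To make the two conditioning-type variants concrete, here is a minimal univariate sketch. It assumes the usual setting where $(X_0,X)$ is standard bivariate normal with correlation $\delta$; the first sampler keeps $X$ only when $X_0>0$, while the second never rejects and flips the sign of $X$ when $X_0\le 0$. All function names are illustrative.

```python
import math
import random


def draw_pair(delta, rng):
    """Draw (X0, X) standard bivariate normal with correlation delta."""
    x0 = rng.gauss(0.0, 1.0)
    x = delta * x0 + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
    return x0, x


def sn_by_conditioning(delta, rng):
    """First variant: repeat until X0 > 0, then return X (rejections occur)."""
    while True:
        x0, x = draw_pair(delta, rng)
        if x0 > 0.0:
            return x


def sn_by_sign_flip(delta, rng):
    """Second variant: return X if X0 > 0 and -X otherwise (no rejection)."""
    x0, x = draw_pair(delta, rng)
    return x if x0 > 0.0 else -x
```

On average the first sampler discards half of the generated pairs, whereas the second uses every pair, which is why the sign-flip form is the natural choice for random number generation.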
The additive stochastic representation (6) and the two variants of the conditioning (13) establish direct connections between the SN family and various subject-matter motivated constructions, as elucidated later on in more detail. A further type of stochastic representation is presented on pp. 129–130 of Azzalini and Capitanio [7].
A direct implication of the second variant of representation (13) is the property of perturbation invariance: any even transformation of Z has the same distribution as the same transformation applied to X. Some noteworthy implications are
The perturbation invariance property of the SN family is a special instance of a more general statement presented in the next subsection.
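The SN distribution enjoys a high level of mathematical tractability, which approaches that of the multivariate normal distribution. Among its many formal properties, a special mention is due for closure of the family with respect to affine transformations. Specifically, if $Y\sim\mathrm{SN}_d(\xi,\Omega,\alpha)$, then
$$a+A^\top Y\sim\mathrm{SN}_q(a+A^\top\xi,\;A^\top\Omega A,\;\tilde\alpha) \tag{15}$$
for a $q$-vector $a$ and a full-rank $d\times q$ matrix $A$; here $\tilde\alpha$ is a $q$-vector, whose explicit expression is given, for instance, on p. 133 of Azzalini and Capitanio [7]. A direct implication is that any $q$-dimensional marginal distribution of $Y$ still has SN distribution. The moment generating function of $Y$ takes the simple form
$$M(t)=2\exp\!\left(t^\top\xi+\tfrac12\,t^\top\Omega\,t\right)\Phi(\delta^\top\omega\,t),\qquad t\in\mathbb{R}^d. \tag{16}$$
Then, from the cumulant generating function $K(t)=\log M(t)$, we obtain
$$\mathbb{E}(Y)=\xi+\omega\,\mu_z,\qquad \mathrm{var}(Y)=\Omega-\omega\,\mu_z\mu_z^\top\omega \tag{17}$$
where $\mu_z=\sqrt{2/\pi}\;\delta$.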
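As a worked illustration of the mean and variance, the sketch below evaluates them from the parameters. It assumes the standard expressions $\mathbb{E}(Y)=\xi+\omega\mu_z$ and $\mathrm{var}(Y)=\Omega-\omega\mu_z\mu_z^\top\omega$, with $\mu_z=\sqrt{2/\pi}\,\delta$, $\delta=\bar\Omega\alpha/\sqrt{1+\alpha^\top\bar\Omega\alpha}$ and $\Omega=\omega\bar\Omega\omega$. The function name and the argument layout (plain lists, with `omega_diag` holding the diagonal of $\omega$) are choices made here, not part of the source.

```python
import math


def sn_mean_and_variance(xi, omega_diag, omega_bar, alpha):
    """Mean vector and variance matrix of Y under the assumed SN moment formulas."""
    d = len(xi)
    # delta = Omega_bar alpha / sqrt(1 + alpha' Omega_bar alpha)
    ob_alpha = [sum(omega_bar[i][j] * alpha[j] for j in range(d)) for i in range(d)]
    quad = sum(alpha[i] * ob_alpha[i] for i in range(d))
    delta = [v / math.sqrt(1.0 + quad) for v in ob_alpha]
    mu_z = [math.sqrt(2.0 / math.pi) * dl for dl in delta]
    mean = [xi[i] + omega_diag[i] * mu_z[i] for i in range(d)]
    # var(Y) = omega (Omega_bar - mu_z mu_z') omega
    var = [
        [omega_diag[i] * (omega_bar[i][j] - mu_z[i] * mu_z[j]) * omega_diag[j]
         for j in range(d)]
        for i in range(d)
    ]
    return mean, var
```

With $\alpha=0$ the function returns $\xi$ and $\Omega$, the normal case; any non-zero $\alpha$ moves the mean away from $\xi$ and reduces the variances below the diagonal of $\Omega$, which is the point made in the text about $\xi$ not being the mean.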
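One of the few limitations of the multivariate SN family is the lack of closure under conditioning, that is, if $Y=(Y_1^\top,Y_2^\top)^\top$ has a SN distribution, then the distribution of $Y_1$ conditionally on the value taken on by $Y_2$ is not of SN type. Closure can be achieved by considering a simple extension of the family, denoted ‘extended skew-normal’, which includes an extra parameter, $\tau\in\mathbb{R}$. The density function corresponding to (8) is now
$$f(z)=\phi_d(z;\bar\Omega)\,\frac{\Phi(\alpha_0+\alpha^\top z)}{\Phi(\tau)},\qquad z\in\mathbb{R}^d,$$
where
$$\alpha_0=\tau\,(1+\alpha^\top\bar\Omega\,\alpha)^{1/2}.$$
When $Y$ is defined as at (11), the corresponding density function at $y\in\mathbb{R}^d$ is
$$\phi_d(y-\xi;\Omega)\,\frac{\Phi\{\alpha_0+\alpha^\top\omega^{-1}(y-\xi)\}}{\Phi(\tau)}$$
and we write $Y\sim\mathrm{SN}_d(\xi,\Omega,\alpha,\tau)$. Clearly, if $\tau=0$, we return to the original SN distribution. The corresponding moment generating function is
$$M(t)=\exp\!\left(t^\top\xi+\tfrac12\,t^\top\Omega\,t\right)\frac{\Phi(\tau+\delta^\top\omega\,t)}{\Phi(\tau)},$$
which shows that the moments of the distribution of $Y$ are non-linear functions of $\tau$.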
The extended skew-normal distribution arises under the constructive route described above, if the term
in (
5) is replaced by one having distribution
truncated below 0. Similarly, the distribution arises from a conditioning mechanism similar to the first expression of (13) if the condition $X_0>0$ is replaced by $X_0+\tau>0$.
The introduction of the additional parameter $\tau$ widens the ranges of the measures of asymmetry and kurtosis with respect to the original SN family. However, the increase of flexibility in this sense is not substantial, as visible from Figure 2.5 of Azzalini and Capitanio [7] for the univariate case. The main advantages of the extended SN family are arguably two others: (a) the more general generating mechanisms indicated in the previous paragraph, which make the distribution a more plausible ‘physically motivated’ model in a number of applications; (b) as anticipated, the extended family is closed under conditioning, a property which turns out to be useful in some applications; the explicit expressions of the corresponding parameters after conditioning are available, for instance, on p. 151 of Azzalini and Capitanio [7].
The price to pay for these attractive formal properties is the lack of an analogue of the second stochastic representation in (13). In turn, this removes the property of perturbation invariance, so that (14) and similar facts do not hold any longer.
Many other formal results can be obtained on the multivariate SN distribution, many of them with relatively little effort. A self-contained account of the main results, built by collecting a number of contributions from the literature, is presented in Chapter 5 of Azzalini and Capitanio [7]. See, specifically, the results of Genton et al. [12], Capitanio [13], Balakrishnan and Scarpa [14], and Balakrishnan et al. [15], among others.
2.3. A General Result
The overall target of the rest of our presentation is to introduce a number of variants and extensions, of various degrees of generality, of the SN distribution. Many of these other constructions replace the normal density appearing in (8) by some alternative, but other directions are also considered. A substantial fraction of these extensions can be embraced in the following result presented by Azzalini and Capitanio [11] (p. 599). Here, and in the following, the phrase ‘X is symmetric about 0’ referred to a random variable X is a shorthand for ‘X is symmetrically distributed about 0’.
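Proposition 1. Denote by $T$ a continuous real-valued random variable with distribution function $G_0$, symmetric about 0, and by $Z_0$ a $d$-dimensional variable with density function $f_0$, independent of $T$, such that the real-valued variable $W=w(Z_0)$ is symmetric about 0. Then,
$$f(x)=2\,f_0(x)\,G_0\{w(x)\},\qquad x\in\mathbb{R}^d, \tag{18}$$
is a density function.

Proof. Note that the random variable $T-W$ is symmetric about 0, and expand $\mathbb{P}\{T-W\le 0\}$ as the expected value of conditional probabilities, leading to
$$\tfrac12=\mathbb{P}\{T\le W\}=\mathbb{E}_{Z_0}\big[\mathbb{P}\{T\le w(Z_0)\mid Z_0\}\big]=\int_{\mathbb{R}^d}G_0\{w(x)\}\,f_0(x)\,\mathrm{d}x.$$
□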
An interesting fact is that the re-normalizing constant in (18) is universally 2. In addition, in the light of the prominent role played in the subsequent discussion by the condition of symmetry for $f_0$, it is worth emphasizing that $f_0$ here can be any density function.
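A quick numerical check can make the role of the constant 2 tangible. The sketch below assumes the density has the form $2f_0(x)G_0\{w(x)\}$ and integrates it in two univariate cases chosen here for illustration: a symmetric baseline (standard normal, with logistic $G_0$ and $w(x)=x^3-x$) and an asymmetric one (unit exponential, with normal $G_0$ and $w(x)=\tfrac12-e^{-x}$, so that $w(Z_0)$ is uniform on $(-\tfrac12,\tfrac12)$ and hence symmetric about 0). The helper names and the midpoint rule are choices of this example.

```python
import math


def std_normal_cdf(y):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def logistic_cdf(y):
    """Logistic distribution function, written with tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * y))


def midpoint_integral(func, lower, upper, steps=100_000):
    """Composite midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


def modulated_density(f0, g0, w):
    """Return x -> 2 f0(x) g0(w(x)), the assumed form of the modulated density."""
    return lambda x: 2.0 * f0(x) * g0(w(x))


# Case 1: symmetric baseline, odd w
f_sym = modulated_density(
    lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi),
    logistic_cdf,
    lambda x: x ** 3 - x,
)
# Case 2: asymmetric baseline Exp(1), with w(Z0) uniform on (-1/2, 1/2)
f_asym = modulated_density(
    lambda x: math.exp(-x),
    std_normal_cdf,
    lambda x: 0.5 - math.exp(-x),
)

total_sym = midpoint_integral(f_sym, -10.0, 10.0)
total_asym = midpoint_integral(f_asym, 0.0, 50.0)
```

Both totals should be numerically indistinguishable from 1. The second case is the more instructive one, since the baseline there is not symmetric and only the symmetry of $w(Z_0)$ is used.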
Implicit in the proof of Proposition 1, there is an acceptance-rejection argument which, combined with symmetry of
, leads to the following stochastic representations. If
and
T are random variables as in Proposition 1, then both variables
have density function (
18). A corollary of the second representation in (
19) is the property of
perturbation invariance stated next.
Proposition 2. Under the assumptions of Proposition 1, denote bya random variable with density functionand by Z a random variable with density f. Them, for any functiontaking values in(), such thatfor all, the transformed variablesandhave the same distribution.
In the univariate case, densities (18) can be viewed as weighted forms of $f_0$, in the sense examined by Rao [16]. This particular sub-class of weighted distributions enjoys special features, namely the stochastic representations (19) and the implied property of perturbation invariance.
2.4. Symmetry-Modulated Distributions
After the publication of the Azzalini and Dalla Valle [10] paper, subsequent contributions in the literature have generated a number of extensions at progressive levels of generality. While we shall focus mostly on the stream labeled ‘SEC distributions’, it is appropriate to delineate at least the main traits of the broader picture, for a more comprehensive view. We set our exposition at a level which encompasses the majority of proposed formulations, especially those more directly connected to applied work, while retaining a relatively simple mathematical level.

Even more general formulations, moving along two different paths, are those of Arellano-Valle et al. [17] and Jupp et al. [18]. However, these treatments also encompass many situations which for technical reasons cannot be pursued operationally. The discussion below covers the vast majority of the practically workable constructions, as known at the time of writing.
Proposition 3 below, obtained by Azzalini and Capitanio [19], represents an easy-to-use restricted version of Proposition 1, since under conditions (20) it is immediate that the condition of symmetry of $W=w(Z_0)$ required in Proposition 1 is fulfilled. The result provides a simple method to build variant forms of a baseline density, $f_0$ say, which is required to be a centrally symmetric density; that is, such that $f_0(x)=f_0(-x)$ for all $x$. Multiplication of $f_0$ by a modulation factor produces perturbed or modulated versions of $f_0$.
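Proposition 3. Denote by $f_0(\cdot)$ a probability density function on $\mathbb{R}^d$, by $G_0(\cdot)$ a continuous distribution function on the real line, and by $w(\cdot)$ a real-valued function on $\mathbb{R}^d$, such that
$$f_0(-x)=f_0(x),\qquad w(-x)=-w(x),\qquad G_0(-y)=1-G_0(y) \tag{20}$$
for all $x\in\mathbb{R}^d$, $y\in\mathbb{R}$. Then, (18) is a density function on $\mathbb{R}^d$.

In independent work, Wang et al. [20] have obtained an essentially equivalent formulation, which can be expressed as follows. On setting $G(x)=G_0\{w(x)\}$, we can rewrite conditions (20) as
$$G(x)\ge 0,\qquad G(x)+G(-x)=1 \tag{21}$$
for all $x\in\mathbb{R}^d$, and write the density as
$$f(x)=2\,f_0(x)\,G(x).$$
Not only do $w$ and $G_0$ satisfying (20) yield a function $G$ satisfying (21), but an essentially converse statement can also be shown to hold: for any function $G$ satisfying (21), there exist functions $G_0$ and $w$ satisfying (20) leading to the same $G$ and hence the same $f$; in fact, there is a multitude of them.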
The conclusion is that, for any centrally symmetric density $f_0$, Proposition 3 and the variant form based on (21) generate the same set of distributions. In this sense, they are equivalent formulations. Which of the two forms to adopt is a matter of convenience and taste. Version (21) is more suitable for mathematical work, but the actual construction of functions G with the required properties is more naturally approached using form (20).
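The two-way correspondence can be illustrated numerically. The sketch below assumes that the conditions on $G$ read $G(x)\ge 0$ and $G(x)+G(-x)=1$, builds $G$ from an odd $w$ and a symmetric $G_0$, and then goes in the converse direction using one possible choice (not necessarily the one used in the cited works): $G_0$ equal to the distribution function of a uniform variable on $(-\tfrac12,\tfrac12)$ and $w(x)=G(x)-\tfrac12$. All names are illustrative.

```python
import math


def g0_logistic(y):
    """Symmetric distribution function on the real line (logistic)."""
    return 0.5 * (1.0 + math.tanh(0.5 * y))


def w_odd(x):
    """An odd real-valued function on R^2: w(-x) = -w(x)."""
    return math.sin(x[0]) + x[0] * x[1] ** 2


def big_g(x):
    """G(x) = G0{w(x)}."""
    return g0_logistic(w_odd(x))


def satisfies_g_conditions(g, points, tol=1e-12):
    """Check G(x) >= 0 and G(x) + G(-x) = 1 on the given points."""
    for x in points:
        minus_x = tuple(-v for v in x)
        if g(x) < 0.0 or abs(g(x) + g(minus_x) - 1.0) > tol:
            return False
    return True


def g0_uniform(y):
    """Distribution function of a uniform variable on (-1/2, 1/2)."""
    return min(1.0, max(0.0, y + 0.5))


def w_from_g(g):
    """Converse direction: an odd w such that g0_uniform(w(x)) = g(x)."""
    return lambda x: g(x) - 0.5


grid = [(0.5 * i, 0.5 * j) for i in range(-6, 7) for j in range(-6, 7)]
w_rebuilt = w_from_g(big_g)
```

Since a different symmetric $G_0$ paired with a suitably transformed $w$ would give the same $G$, the reconstruction shown here is only one of many.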
Clearly, the SN density (8) is of type (18) with $f_0=\phi_d(\cdot\,;\bar\Omega)$, $G_0=\Phi$, $w(x)=\alpha^\top x$. The vast majority of the distributions to be discussed later are of type (18) or some ‘extended’ version of them. The term skew-symmetric distributions is often used to identify these constructions, but we prefer the term symmetry-modulated for the following reason: skewness is typically the form of departure from symmetry associated with linear or mildly non-linear choices of $w(x)$, as was the case in earlier constructions of this kind, but more elaborate functions $w(x)$ lead to distributions where skewness is not the most prominent feature. Figure 2 illustrates this point by using strongly non-linear functions $w(x)$ to modulate, via $G_0\{w(x)\}$, a standard bivariate normal density function with independent components.
The extension of the stochastic representations in (13), after orthogonalization to independent variables, is as follows. Under the conditions of Proposition 3, denote by $Z_0$ a random variable with density $f_0$ and by $T$ an independent variable with distribution function $G_0$. Then, both variables
$$Z'=(Z_0\mid T\le w(Z_0)),\qquad Z=\begin{cases}Z_0&\text{if }T\le w(Z_0),\\ -Z_0&\text{otherwise}\end{cases} \tag{22}$$
have density (18). The second form in (22) is more convenient for random number generation, since no rejection of a generated outcome $Z_0$ can occur; the first form allows us to draw connections with a number of formulations in applied domains where a selection mechanism of sample values occurs, regulated by a condition of type $T\le w(Z_0)$.
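The rejection-free form can be sketched generically. The code below assumes the representation "$Z=Z_0$ if $T\le w(Z_0)$, and $Z=-Z_0$ otherwise", with the baseline sampler, the sampler for $T$ and the function $w$ supplied by the caller. The specific choices shown (independent bivariate normal baseline, normal $T$, linear $w$) are illustrative only.

```python
import math
import random


def sample_modulated(draw_z0, draw_t, w, rng):
    """One draw via sign flipping: Z0 if T <= w(Z0), otherwise -Z0 (assumed form)."""
    z0 = draw_z0(rng)
    t = draw_t(rng)
    return list(z0) if t <= w(z0) else [-v for v in z0]


def draw_std_normal_2d(rng):
    """Baseline: standard bivariate normal with independent components."""
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


def draw_std_normal(rng):
    """T with distribution function Phi."""
    return rng.gauss(0.0, 1.0)


def w_linear(x):
    """Linear odd function; with this choice the first component is skew-normal."""
    return 3.0 * x[0]
```

With the linear $w$ above, the first component of $Z$ is univariate skew-normal with slant 3, so the proportion of positive values should be close to $\tfrac12+\arctan(3)/\pi\approx 0.90$. Replacing `w_linear` by any other odd function gives a sampler for the corresponding modulated density without further changes.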
Similarly to the SN distribution and Proposition 2, the second variant of representation (22) leads to the perturbation invariance property: under the above conditions, if $Z_0$ and $Z$ have density function $f_0$ and $f$, respectively, and $t$ is an even function from $\mathbb{R}^d$ to $\mathbb{R}^q$, then $t(Z_0)$ and $t(Z)$ have the same distribution. For instance, if $Z=(Z_1,Z_2)^\top$ is a random variable whose density is depicted in each of the panels of Figure 2, then $Z_1^2+Z_2^2\sim\chi^2_2$, since this fact holds for the associated variable $Z_0$.
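Perturbation invariance is easy to observe by simulation. The sketch below assumes a standard bivariate normal baseline with independent components, $G_0=\Phi$, and a strongly non-linear odd function $w$ chosen here purely for illustration (it is not one of the functions used in the figure). It then looks at the even transformation $t(z)=z_1^2+z_2^2$, which for the baseline is $\chi^2_2$, with mean 2 and median $2\log 2$.

```python
import math
import random


def w_wiggly(x):
    """A strongly non-linear odd function (illustrative choice)."""
    return math.sin(x[0] ** 3 - 2.0 * x[1])


def draw_modulated(rng):
    """Sign-flip sampler: Z0 if T <= w(Z0), otherwise -Z0."""
    z0 = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    t = rng.gauss(0.0, 1.0)
    return z0 if t <= w_wiggly(z0) else (-z0[0], -z0[1])


def squared_norm(x):
    """Even transformation t(x) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


rng = random.Random(2022)
values = [squared_norm(draw_modulated(rng)) for _ in range(20000)]
sample_mean = sum(values) / len(values)
share_below_median = sum(v <= 2.0 * math.log(2.0) for v in values) / len(values)
```

However irregular the modulated density may look, the squared norm behaves as in the unperturbed normal case, because an even function cannot distinguish $Z_0$ from $-Z_0$.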