A Selective Overview of Skew-Elliptical and Related Distributions and of Their Applications

: Within the context of flexible parametric families of distributions, much work has been dedicated in recent years to the theme of skew-symmetric distributions, or symmetry-modulated distributions, as we prefer to call them. The present contribution constitutes a review of this area, with special emphasis on multivariate skew-elliptical families, which represent the subset with more immediate impact on applications. After providing background information on the distribution theory aspects, we focus on the aspects more relevant for applied work. The exposition is targeted to non-specialists in this domain, although some general knowledge of probability and multivariate statistics is assumed. Given this aim, the mathematical profile is kept to the minimum required.


The Wider Perspective
The development of parametric families of distributions and the study of their properties is a fundamental component of statistical methodology. Given the very long tradition of this area, a vast arsenal of distributions has been accumulated over time, suitable for tackling a great variety of applied problems. Consequently, one may legitimately wonder whether additional effort in this direction is really motivated by scientific requirements or is just the outcome of nice mathematical exercises providing secondary refinements of an already well-established area. Put in another way, one may question the motivation for the particularly intense activity developed on these themes in the last two decades or so, sometimes associated with the phrase 'flexible parametric [families of] distributions'.
In our view, there are at least two, connected but conceptually separate, reasons which motivate the continuing efforts in this domain. One such reason is the continuing growth in size of commonly available data records; typically, a larger sample calls for a more flexible parametric family to fit the data. The basic factor is that a sample of limited size can easily appear compatible with different assumptions on the underlying distribution, while with larger samples departures from the expected pattern cannot so easily be dismissed as the effect of random variability. This implies that either a more specific distributional assumption is required, possibly motivated by subject matter considerations in real applications, or a more flexible parametric family must be employed.
The second fact, intersecting with the previous one, is that in more recent times the dimensionality of available data has also increased, not only the number of records. In Section 4, the term 'application' refers to extensions of existing statistical methods to increase their suitability for handling real data, thanks to the increased flexibility and the more realistic assumptions on the data distributions. In Section 5, the term 'application' refers to publications where the primary motivation was some real application or a set of real applications; this does not exclude the possibility that the motivating problems lead to methodological advances. In some contributions, this separation is inevitably blurred. We have, instead, not included those publications where real data were employed as a purely numerical illustration of some eminently theoretical development.
The reader familiar with the monograph of Azzalini and Capitanio [7] can recognize that, to some extent, the present review can be regarded as an update of Chapters 7 and 8 of that work. For space reasons, we have opted to avoid extensive duplication of that material, apart from those parts which are necessary for exposition of contributions of interest. We have focused mainly on newer contributions, especially in a multivariate setting. Even with this restriction, a full account of all publications would have led to a too long exposition. This has required making a selection, which again involves some subjective judgment. In this respect, a special note is due for the theme of stochastic processes, time series and, especially, spatio-temporal processes, where there seems to persist a debate among specialists on the correctness of certain formulations; our choice has been not to include here many such publications, given the intended non-specialist audience of this review.
Our emphasis is primarily on the multivariate setting. This choice is largely due to the growing emphasis on this context, as underlined in Section 1.1. Obviously, most facts hold in the univariate case, as well, but some of them become void there. Occasionally, we shall deal with univariate distributions, especially so when the point to be highlighted carries on directly from the univariate to the multivariate setting and discussion in the univariate case facilitates exposition.

Basics of Elliptical Distributions
As the term skew-elliptical distributions suggests, these are obtained as extensions of elliptically contoured (EC) distributions, briefly called 'elliptical distributions'. To start, we establish notation and recall some basic facts.
We shall only deal with absolutely continuous EC distributions, which represent the only case relevant for the developments to follow. Consider a function p̃ from R+ to R+ such that, for a positive integer d,

0 < k_d = ∫₀^∞ u^{d−1} p̃(u²) du < ∞. (1)

Then, a d-dimensional continuous random variable X is said to have an elliptical distribution, with density generator p̃, if its density is of the form

f(x) = c_d |Σ|^{−1/2} p̃{(x − µ)ᵀ Σ⁻¹ (x − µ)}, x ∈ R^d, (2)

where µ ∈ R^d is a location parameter, Σ is a symmetric positive-definite d × d scale matrix and c_d = Γ(d/2)/(2 π^{d/2} k_d). In this case, we write X ∼ EC_d(µ, Σ, p̃).
The EC class enjoys a remarkable number of attractive formal properties, including closure under marginalization, affine transformations, and conditioning on the values taken on by a subset of the d components. For a precise statement of these facts, as well as many others, we refer to standard accounts, such as the book by Fang et al. [8].
The most prominent representative of the EC class is the multivariate normal distribution, which occurs when p̃(u) = exp(−u/2). The normal family plays an important role within the EC class, as it generates the subclass of scale mixtures of normal variates, defined as follows. If X ∼ N_d(0, Σ) and S is an independent positive variable which plays the role of 'random scale factor', it is easy to check that the scale mixture

Y = µ + S X (3)

has distribution of EC type. In the important instance where

S = (ν/V)^{1/2}, V ∼ χ²_ν, (4)

we obtain the multivariate Student's t distribution on ν degrees of freedom; in this case, we write Y ∼ t_d(µ, Σ, ν) in an obvious notation. For future reference, it is appropriate to recall a relatively lesser known fact about the EC class and the scale mixtures of normals. While the p-dimensional marginal of a d-dimensional EC variable is still of EC type, as already recalled, this does not ensure that its marginal density functions belong to the same parametric family as the original density.
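As a quick numerical sketch of this mechanism (Python with NumPy; the specific Σ and ν below are our own illustrative choices), one can simulate the mixture (3) with S as in (4) and compare the resulting marginal variance with the Student's t value ν/(ν − 2):

```python
import numpy as np

# Scale-mixture mechanism of (3)-(4): with S = sqrt(nu/V), V ~ chi^2_nu,
# the mixture Y = mu + S*X is multivariate Student's t on nu d.f.,
# whose first marginal variance is Sigma_11 * nu/(nu - 2).
rng = np.random.default_rng(0)
d, nu, n = 2, 5, 200_000
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
mu = np.zeros(d)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
V = rng.chisquare(nu, size=n)
Y = mu + np.sqrt(nu / V)[:, None] * X

emp_var = Y[:, 0].var()
print(emp_var)  # should be close to 5/3
```

The same few lines, with a different distribution for S, generate any other scale mixture of normals of the form (3).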
Kano [9] has studied necessary and sufficient conditions to ensure that a marginalization operation keeps the distribution within the same parametric class, which he calls 'consistency property'. In operational terms, the required condition is that the EC distribution allows a stochastic representation of type (3) where the distribution of the mixing variable S does not depend on d.
As a key example, this condition holds for S in (4); hence, the marginals of a Student's t distribution are still of the same type. There are instead several EC families where, although a representation of type (3) exists, the distribution of S depends on d; hence, marginalization consistency does not hold. The motivating instance of this sort in Kano [9] is the class of multivariate exponential power distributions, which fails to be consistent under marginalization.

The Multivariate Skew-Normal Distribution
Just as the multivariate normal distribution is at the core of the EC class, so the multivariate skew-normal (SN) distribution is at the core of the SEC class and many other related formulations. We present the SN distribution following essentially the constructive route of Azzalini and Dalla Valle [10], also in view of its relevance for the subsequent discussion.
Start from independent variables U_0 = (U_{01}, . . . , U_{0d}) ∼ N_d(0, Ψ̄) and U_1 ∼ N(0, 1), where Ψ̄ is a positive-definite d × d correlation matrix. Next, for any vector δ = (δ_1, . . . , δ_d) with all components in (−1, 1), define

Z_j = δ_j |U_1| + (1 − δ_j²)^{1/2} U_{0j} (5)

for j = 1, . . . , d. Equivalently, write in a more compact form

Z = δ |U_1| + D_δ U_0, (6)

where D_δ = (I_d − diag(δ)²)^{1/2}. For future use, define also the term-by-term transform

λ_j = (1 − δ_j²)^{−1/2} δ_j, (7)

which is invertible via δ_j = (1 + λ_j²)^{−1/2} λ_j. Some algebraic work yields the density function of Z, which is

f(z) = 2 ϕ_d(z; Ω̄) Φ(αᵀz), z ∈ R^d, (8)

where ϕ_d(·; Ω̄) denotes the N_d(0, Ω̄) density, Φ(·) the N(0, 1) distribution function, and

Ω̄ = δδᵀ + D_δ Ψ̄ D_δ, (9)

α = (1 − δᵀ Ω̄⁻¹ δ)^{−1/2} Ω̄⁻¹ δ. (10)

Clearly, the distribution of Z is regulated by the pair (Ψ̄, δ), or equivalently by (Ψ̄, λ). If δ = 0 = λ, then Z coincides with the normal variable U_0; otherwise, the non-linear transformation of U_1 in (6) produces a non-normal distribution of Z. Correspondingly, in the latter case, the vector α in (10) is non-zero, so that the density (8) is the product of the EC factor ϕ_d times a perturbation factor Φ(αᵀz). The result is a density function with skew-elliptical contour level curves. The contour level plots in Figure 1 display two examples of SN densities (8) with d = 2.

Figure 1. Contour level plots of two bivariate skew-normal densities given by (8); in both cases, Ω̄ is equal to the identity matrix, α is (2, 4) in the first plot, it is (2, −6) in the second plot.
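A minimal simulation sketch of the additive construction (5)-(6) in Python (NumPy only; the values of δ and Ψ̄ are our own illustrative choices), checked against the known SN mean E{Z_j} = (2/π)^{1/2} δ_j:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 400_000
delta = np.array([0.7, -0.4])                 # illustrative choice of delta
Psi_bar = np.array([[1.0, 0.5], [0.5, 1.0]])  # correlation matrix of U0

U0 = rng.multivariate_normal(np.zeros(d), Psi_bar, size=n)
U1 = rng.standard_normal(n)

# Additive construction (5)-(6): Z_j = delta_j*|U1| + sqrt(1-delta_j^2)*U0j.
Z = delta * np.abs(U1)[:, None] + np.sqrt(1 - delta**2) * U0

m = Z.mean(axis=0)
print(np.round(m, 2))  # should be close to sqrt(2/pi)*delta
```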
It was later shown by Azzalini and Capitanio [11] that the pair (Ω̄, α) uniquely identifies (Ψ̄, δ), for any choice of the correlation matrix Ω̄ and any d-vector α. So (Ω̄, α) can equally be adopted as a legitimate parameterization of the family. Since in applied work we need to regulate location and scale, introduce the additional transformation

Y = ξ + ω Z, (11)

where ξ ∈ R^d and ω = diag(ω_1, . . . , ω_d) has positive diagonal elements. We correspondingly write Y ∼ SN_d(ξ, Ω, α), where Ω = ω Ω̄ ω. In a broad sense, the role of the various regulating parameters is as follows: ξ controls location, ω scale, Ω̄ the dependence structure, α departure from normality. The phrase 'in a broad sense' is there to caution on the fact that, when we come to commonly used summary measures, these are actually influenced by all terms. For instance, the mean value of Y is not just ξ, but a function of all component parameters. This is the reason why the symbol ξ has been adopted, instead of the more popular µ. However, it is true that, for a given choice of (Ω, α), an addition +u, say, to the ξ vector produces precisely the same addition +u on E{Y}, which justifies use of the name 'location parameter' for ξ.
Besides the stochastic representation (6) of additive form, the SN distribution allows other representations. An especially important one is the following, based on a conditioning mechanism. Consider the multivariate normal variable

X = (X_0, X_1) ∼ N_{d+1}(0, Ω*), Ω* = ((Ω̄, δ), (δᵀ, 1)), (12)

where Ω* is a positive-definite correlation matrix. Then, both variables

Z = (X_0 | X_1 > 0), Z = X_0 if X_1 > 0, −X_0 otherwise, (13)

have density function (8). Strictly speaking, the equality sign following the random variable Z should be the one of equality in distribution, but the simpler notation matches the one for Z. In addition, this notation expresses how the result is going to be employed in practice, for simulation purposes, to sample pseudo-random numbers from the target distribution. The additive stochastic representation (6) and the two variants of the conditioning (13) establish direct connections between the SN family and various subject-matter motivated constructions, as elucidated later on in more detail. A further type of stochastic representation is presented at pp. 129-130 of Azzalini and Capitanio [7].
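The second variant of (13) translates directly into a rejection-free sampler; a sketch (our own δ and Ω̄ values, chosen so that Ω* is positive-definite), again checked against the SN mean (2/π)^{1/2} δ_j:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2, 400_000
delta = np.array([0.8, 0.5])                    # corr(X0j, X1), illustrative
Omega_bar = np.array([[1.0, 0.3], [0.3, 1.0]])

# Assemble the (d+1)-dimensional correlation matrix Omega* of (X0, X1).
Omega_star = np.zeros((d + 1, d + 1))
Omega_star[:d, :d] = Omega_bar
Omega_star[:d, d] = Omega_star[d, :d] = delta
Omega_star[d, d] = 1.0

X = rng.multivariate_normal(np.zeros(d + 1), Omega_star, size=n)
X0, X1 = X[:, :d], X[:, d]

# Second variant of (13): flip the sign of X0 when X1 <= 0 (no rejection).
Z = np.where(X1[:, None] > 0, X0, -X0)

m = Z.mean(axis=0)
print(np.round(m, 2))  # should be close to sqrt(2/pi)*delta
```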
A direct implication of the second variant of representation (13) is the property of perturbation invariance: any even transformation of Z has the same distribution as the same transformation applied to X_0. Some noteworthy implications are, for instance,

Zᵀ Ω̄⁻¹ Z ∼ χ²_d, (14)

and similar distributional equalities for other even functions of Z. The perturbation invariance property of the SN family is a special instance of a more general statement presented in the next subsection.
The SN distribution enjoys a high level of mathematical tractability, which approaches the one of the multivariate normal distribution. Among its many formal properties, a special mention is due for closure of the family with respect to affine transformations. Specifically, if Y ∼ SN_d(ξ, Ω, α), then

a + Aᵀ Y ∼ SN_q(a + Aᵀξ, AᵀΩA, α_A) (15)

for a q-vector a and a full-rank d × q matrix A; here α_A is a q-vector, whose explicit expression is given, for instance, on p. 133 of Azzalini and Capitanio [7]. A direct implication is that any q-dimensional marginal distribution of Y still has SN distribution. The moment generating function of Y takes the simple form

M(t) = 2 exp(ξᵀt + tᵀΩt/2) Φ(δᵀωt), t ∈ R^d.

Then, from the cumulant generating function K(t) = log M(t), we obtain

E{Y} = ξ + ω µ_z, var{Y} = Ω − ω µ_z µ_zᵀ ω,

where µ_z = E{Z} = (2/π)^{1/2} δ.
One of the few limitations of the multivariate SN family is the lack of closure under conditioning, that is, if Y = (Y_1, Y_2) has a SN distribution, then the distribution of Y_1 conditionally on the value taken on by Y_2 is not of SN type. This property can be achieved by considering a simple extension of the family, denoted 'extended skew-normal', which includes an extra parameter, τ ∈ R. The density function corresponding to (8) is now

f(z) = ϕ_d(z; Ω̄) Φ(α_0 + αᵀz) / Φ(τ), α_0 = τ (1 + αᵀ Ω̄ α)^{1/2}. (16)

When Y is defined as at (11), the corresponding density function at x ∈ R^d is

(ω_1 · · · ω_d)⁻¹ f{ω⁻¹(x − ξ)}, (17)

and we write Y ∼ SN_d(ξ, Ω, α, τ). Clearly, if τ = 0, we return to the original SN distribution. The corresponding moment generating function is

M(t) = exp(ξᵀt + tᵀΩt/2) Φ(τ + δᵀωt) / Φ(τ),

which shows that the moments of the distribution of Y are non-linear functions of τ.
The extended skew-normal distribution arises under the constructive route described above, if the term |U_1| in (5) is replaced by one having distribution N(−τ, 1) truncated below 0. Similarly, the distribution arises from a conditioning mechanism similar to the first expression of (13) if the condition X_1 > 0 is replaced by X_1 + τ > 0. The introduction of the additional parameter τ widens the ranges of the measures of asymmetry and kurtosis with respect to the original SN family. However, the increase of flexibility in this sense is not substantial, as visible from Figure 2.5 of Azzalini and Capitanio [7] for the univariate case. The main advantages of the extended SN family are arguably two others: (a) the more general generating mechanisms indicated in the previous paragraph, which make the distribution a more plausible 'physically motivated' model in a number of applications; (b) as anticipated, the extended family is closed under a conditioning operation, a property which turns out to be useful in some applications; the explicit expressions of the corresponding parameters after conditioning are available, for instance, on p. 151 of Azzalini and Capitanio [7].
The price to pay for these attractive formal properties is the lack of an analogue of the second stochastic representation in (13). In turn, this removes the property of perturbation invariance, so that (14) and similar facts no longer hold.
Many other formal results can be obtained on the multivariate SN distribution, many of them with relatively little effort. A self-contained account of the main results, built collecting a number of contributions from the literature, is presented in Chapter 5 of Azzalini and Capitanio [7]. See, specifically, the results of Genton et al. [12], Capitanio [13], Balakrishnan and Scarpa [14], and Balakrishnan et al. [15], among others.

A General Result
The overall target of the rest of our presentation is to introduce a number of variants and extensions, of various degrees of generality, of the SN distribution. Many of these other constructions replace the normal density appearing in (8) by some alternative, but other directions are also considered. A substantial fraction of these extensions can be embraced in the following result presented by Azzalini and Capitanio [11] (p. 599). Here, and in the following, the phrase 'X is symmetric about 0' referred to a random variable X is a shorthand for 'X is symmetrically distributed about 0'.

Proposition 1. Denote by T a continuous real-valued random variable with distribution function G_0, symmetric about 0, and by Z_0 a d-dimensional variable with density function f_0, independent of T, such that the real-valued variable W = w(Z_0) is symmetric about 0. Then,

f(x) = 2 f_0(x) G_0{w(x)}, x ∈ R^d, (18)

is a density function.
Proof. Note that the random variable T − W is symmetric about 0, and expand P{T − W ≤ 0} as the expected value of conditional probabilities, leading to

1/2 = P{T ≤ W} = E_{Z_0}[ P{T ≤ w(Z_0) | Z_0} ] = ∫ G_0{w(x)} f_0(x) dx,

so that (18) integrates to 1. An interesting fact is that the re-normalizing constant in (18) is universally 2. In addition, in the light of the prominent role played in the subsequent discussion by the condition of symmetry for f_0, it is worth emphasizing that f_0 here can be any density function.
Implicit in the proof of Proposition 1, there is an acceptance-rejection argument which, combined with symmetry of w(Z_0), leads to the following stochastic representations. If Z_0 and T are random variables as in Proposition 1, then both variables

Z = (Z_0 | T ≤ w(Z_0)), Z = Z_0 if T ≤ w(Z_0), −Z_0 otherwise, (19)

have density function (18). A corollary of the second representation in (19) is the property of perturbation invariance stated next.
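In the univariate SN case, the second representation in (19) gives an immediate rejection-free sampler; a sketch (the value of α is our own illustrative choice), checked against the SN mean (2/π)^{1/2} δ with δ = α/(1 + α²)^{1/2}:

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha = 500_000, 3.0

Z0 = rng.standard_normal(n)   # f0: N(0,1) density
T = rng.standard_normal(n)    # T ~ G0 = Phi, independent of Z0
w = alpha * Z0                # odd in Z0, so w(Z0) is symmetric about 0

# Second representation in (19): keep Z0 when T <= w(Z0), otherwise flip sign.
Z = np.where(T <= w, Z0, -Z0)  # resulting density: 2*phi(x)*Phi(alpha*x)

delta = alpha / np.sqrt(1 + alpha**2)
print(Z.mean())                # should be close to sqrt(2/pi)*delta
```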

Proposition 2.
Under the assumptions of Proposition 1, denote by Z_0 a random variable with density function f_0 and by Z a random variable with density f. Then, for any function t(·) taking values in R^q (q ≥ 1), such that t(x) = t(−x) for all x ∈ R^d, the transformed variables t(Z) and t(Z_0) have the same distribution.
In the univariate case, densities (18) can be viewed as weighted forms of f 0 , in the sense examined by Rao [16]. This particular sub-class of weighted distributions enjoys special features, namely the stochastic representations (19) and the implied property of perturbation invariance.

Symmetry-Modulated Distributions
After the publication of the Azzalini and Dalla Valle [10] paper, subsequent contributions in the literature have generated a number of extensions at progressively higher levels of generality. While we shall focus mostly on the stream labeled 'SEC distributions', it is appropriate to delineate at least the main traits of the broader picture, for a more comprehensive view. We set our exposition at a level which encompasses the majority of proposed formulations, especially those more directly connected to applied work, while retaining a relatively simple mathematical level.
Even more general formulations, moving along two different paths, are those of Arellano-Valle et al. [17] and Jupp et al. [18]. However, these treatments encompass also many situations which for technical reasons cannot be pursued operationally. The discussion below covers the vast majority of the practically workable constructions, as known at the time of writing.
Proposition 3 below, obtained by Azzalini and Capitanio [19], represents an easy-to-use restricted version of Proposition 1, since under conditions (20) it is immediate that the condition of symmetry of w(Z_0) required in Proposition 1 is fulfilled. The result provides a simple method to build variant forms of a baseline density, f_0 say, which is required to be a centrally symmetric density; that is, such that f_0(−x) = f_0(x) for all x ∈ R^d. Multiplication of f_0 by a modulation factor produces perturbed or modulated versions of f_0.

Proposition 3. Denote by f_0 a probability density function on R^d, by G_0(·) a continuous distribution function on the real line, and by w(·) a real-valued function on R^d, such that

f_0(−x) = f_0(x), G_0(−y) = 1 − G_0(y), w(−x) = −w(x), (20)

for all x ∈ R^d, y ∈ R. Then, (18) is a density function on R^d.
In independent work, Wang et al. [20] have obtained an essentially equivalent formulation, which can be expressed as follows. On setting G(x) = G_0{w(x)}, we can rewrite conditions (20) as

G(x) ≥ 0, G(x) + G(−x) = 1, (21)

for all x ∈ R^d, and write the density as f(x) = 2 f_0(x) G(x). Not only does G(x) = G_0{w(x)} satisfy (21) whenever w and G_0 satisfy (20), but the essentially converse statement can also be shown to hold: for any function G(x) satisfying (21), there exist functions w and G_0 satisfying (20) leading to the same G, hence to the same f; in fact, a multitude of them.
The conclusion is that, for any centrally symmetric density f 0 (x), Proposition 3 and the variant form based on (21) generate the same set of distributions. In this sense, they are equivalent formulations. Which of the two forms to adopt is a matter of convenience and taste. Version (21) is more suitable for mathematical work, but the actual construction of functions G with the required properties is more naturally approached using form (20).
Clearly, the SN density (8) is of type (18). The vast majority of the distributions to be discussed later are of type (18) or some 'extended' version of them. The term skew-symmetric distributions is often used to identify these constructions, but we prefer the term symmetry-modulated for the following reason: skewness is typically the form of departure from symmetry associated with linear or mildly non-linear choices of w(x), as was the case in the earlier constructions of this kind, but more elaborate functions w(x) lead to distributions where skewness is not the most prominent feature. Figure 2 illustrates this point by using a strongly non-linear function w(x) to modulate, via Φ(w(x)), a standard bivariate normal density function with independent components. The extension of the stochastic representations in (13), after orthogonalization to independent variables, is as follows. Under the conditions of Proposition 3, denote by X_0 a random variable with density f_0 and by T an independent variable with distribution function G_0. Then, both variables

Z = (X_0 | T ≤ w(X_0)), Z = X_0 if T ≤ w(X_0), −X_0 otherwise, (22)

have density (18). The second form in (22) is more convenient for random number generation, since no rejection of a generated outcome X_0 can occur; the first form allows us to draw connections with a number of formulations in applied domains where a selection mechanism of sample values occurs, regulated by a condition of the type appearing in (22).
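To illustrate numerically that the modulation factor never alters the total mass (the universal constant 2 in (18)), the sketch below evaluates 2 f_0(x) G_0{w(x)} on a grid for a strongly non-linear odd w of our own choosing, with f_0 the bivariate standard normal density and G_0 = Φ:

```python
import numpy as np
from math import erf, sqrt, pi

x = np.linspace(-8, 8, 801)
X1, X2 = np.meshgrid(x, x)
f0 = np.exp(-(X1**2 + X2**2) / 2) / (2 * pi)  # N_2(0, I) baseline density
w = np.sin(2 * X1) + X2**3                    # odd: w(-x) = -w(x)
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
f = 2 * f0 * Phi(w)                           # modulated density (18)

h = x[1] - x[0]
total = float(f.sum()) * h * h                # Riemann approximation
print(round(total, 3))                        # should be close to 1
```

Replacing w by any other odd function leaves the total at 1, which is exactly the content of Proposition 3.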
Similarly to the SN distribution and Proposition 2, the second variant of representation (22) leads to the perturbation invariance property: under the above conditions, if X_0 and Z have density functions f_0 and f, respectively, and t is an even function from R^d to R^q, then t(X_0) and t(Z) have the same distribution. For instance, if Z = (Z_1, Z_2) is a random variable whose density is depicted in one of the panels of Figure 2, then Z_1² + Z_2² ∼ χ²_2, since this fact holds for the associated variable X_0.
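The same χ²₂ fact can be checked by simulation, sampling via the second form of (22) with an arbitrary odd modulation function (a sketch; the w below is our own choice, not the one used for Figure 2):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

X0 = rng.standard_normal((n, 2))          # f0: independent N(0,1) components
T = rng.standard_normal(n)                # T ~ G0 = Phi
w = np.sin(2 * X0[:, 0]) + X0[:, 1] ** 3  # an odd modulation function
Z = np.where((T <= w)[:, None], X0, -X0)  # second form of (22)

# Perturbation invariance: Z1^2 + Z2^2 ~ chi^2_2, whatever the modulation.
q = (Z ** 2).sum(axis=1)
print(q.mean())                           # chi^2_2 has mean 2
```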

SEC and Other Modulations of EC Distributions
Any application of Proposition 3 with a baseline density f_0 of EC type (2) produces a density commonly labeled 'skew-elliptical', or SEC in our terminology; the terms apply even after a possible location shift of the distribution. In this section, we review the main constructions of this sort and some closely related ones. The prefix 'skew' attached to many constructions of this sort arose from the earlier constructions, where the most noticeable form of departure from a symmetric or elliptical shape was asymmetry. However, there are cases, such as those depicted in Figure 2, where the asymmetry is not the most prominent feature. In these cases, a term like 'modulated elliptically contoured' (MEC) distributions may be more appropriate.

SEC Distributions with a Linear-Combination Argument
An early form of SEC distribution has been put forward by Azzalini and Capitanio [11] as a corollary of Proposition 1. If we take f_0 to be of EC type, the most immediate SEC construction sets w(x) to be a linear combination of the x components, so that

f(x) = 2 f_0(x) G_0(αᵀx), x ∈ R^d, (23)

where α is a vector of constants. The linear form w(x) = αᵀx has the appeal of conceptual simplicity; it also mimics the structure of the multivariate SN density (8). In fact, when f_0 ≡ ϕ_d, we go back to the SN density.
For any given f_0, (23) allows a wide range of choices for G_0, as long as they meet the condition G_0(t) + G_0(−t) = 1. This flexibility is appealing but also problematic, in a sense, since there is no naturally preferable choice of G_0.
By analogy with the SN case, we may think of taking G 0 equal to the distribution function of the univariate marginal of f 0 . Unfortunately, this analogy does not seem to convey any special virtue or simplification. This point is confirmed by consideration of univariate instances of (23), as examined by Gupta et al. [21] and Nadarajah and Kotz [22]. Even computation of basic quantities, like lower-order moments, becomes appreciably more complicated than in the SN case.
A prominent instance in this respect is provided in Section 4 of Nadarajah and Kotz [22], where f_0 is taken to be the univariate Student's t on ν degrees of freedom and G_0 the corresponding distribution function, leading to what we could denote a 'linear skew-t distribution'. The expression of odd moments in their Theorem 1 is undoubtedly intricate.
Besides the SN case, densities (23) do not lead, as far as we know, to results matching stochastic representations (6), (13) and their useful implications. What remains valid are representations (22) and the connected property of perturbation invariance. From the latter, a key implication is that even-order moments are equal to those of f 0 , when they exist.

SEC Distributions via Conditioning
Branco and Dey [23] have put forward an alternative SEC formulation which has proved to be very fruitful and has, since then, received much attention. Explicitly or implicitly, it appears to be regarded by many authors as the standard form of SEC distributions. The key idea in Branco and Dey [23] is to extend the first generating mechanism in (13) by applying it when the normality assumption for X in (12) is replaced by the one of an elliptical distribution.
More specifically, we start from X = (X_0, X_1) with density of EC type (2), with Σ equal to Ω* in (12) and, for simplicity, µ = 0. The distribution of Z = (X_0 | X_1 > 0) can then be written as

f(z) = 2 f_0(z) F_{q(z)}(αᵀz), (24)

where f_0 is the marginal density of X_0 induced by p̃(x; 0, Ω*), and F_{q(z)} is the associated conditional univariate distribution function of the remaining component. Except in the normal case, this conditional distribution depends on q(z) = zᵀ Ω̄⁻¹ z. An alternative integral form of density f(z) is given in expression (2.5) of Branco and Dey [23].

Within the class of distributions which can be built from this paradigm, special interest lies on the wide sub-class where the parent (d+1)-dimensional density p̃(x; 0, Ω*) corresponds to a scale mixture of normal variates as specified in (3). It can be shown that (24) corresponds then to a scale mixture of SN variates, retaining the same mixing distribution for the scale factor. With the inclusion of location and scale parameters, these scale mixtures can then be represented more conveniently as

Y = ξ + S ω Z_0, (25)

where Z_0 ∼ SN_d(0, Ω̄, α), and ξ, ω are as in (11). Besides its conceptual significance, (25) facilitates computation of moments of Y, hence of (24), when moments of S up to the same order are known, given that the SN moments are available. The paper of Azzalini and Capitanio [19] has explored further the distributions generated by Proposition 3 and, within this frame, the SEC class (24) in more detail. Among other results, their Proposition 9 states that the SEC distributions (24) can also be obtained via a different route, that is, an additive construction like in (5) and (6) but with (U_0, U_1) having now a joint (d + 1)-dimensional EC distribution with a scale matrix like the one of U in (6).
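Representation (25) translates directly into a sampler for scale mixtures of SN variates; a univariate sketch (the values of α and ν are our own), checked via E{Y} = E{S}·E{Z_0}, where E{S} = (ν/2)^{1/2} Γ((ν−1)/2)/Γ(ν/2) for the choice (4):

```python
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(5)
n, nu, alpha = 400_000, 5, 2.0

# Z0 ~ SN(0, 1, alpha) via the sign-flip representation.
X0 = rng.standard_normal(n)
T = rng.standard_normal(n)
Z0 = np.where(T <= alpha * X0, X0, -X0)

# Scale mixture (25) with S as in (4): Y = S*Z0 is a standardized skew-t.
V = rng.chisquare(nu, size=n)
Y = np.sqrt(nu / V) * Z0

delta = alpha / sqrt(1 + alpha**2)
ES = sqrt(nu / 2) * gamma((nu - 1) / 2) / gamma(nu / 2)
theo = ES * sqrt(2 / pi) * delta  # E{Y} = E{S} * E{Z0}, by independence
print(Y.mean(), theo)
```

Swapping the distribution of S (e.g., for a slash-type scale factor) produces the other scale mixtures of SN variates mentioned in the text, with the same two-line change.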
Another question examined in Azzalini and Capitanio [19] was: do densities of type (24) belong to the SEC or MEC family as described at the beginning of the present section? That is, is (24) an EC density modulated as in Proposition 3? Among other facts, this condition is important to ensure the property of perturbation invariance. It is not immediate that the final factor of (24) can be written in the form of the matching factor in (18). The answer turned out to be 'yes' for a range of parametric families examined in the paper. Confirmation that the answer is 'yes' in all cases was provided later on by Azzalini and Regoli [24]. In this paper, it is also proved that densities (24) are quasi-concave, which means that all their level curves delineate convex regions. This property rules out twisted shapes like the one on the right side of Figure 2, so that the term SEC appears appropriate in this case.
An important instance of (25) occurs when S is chosen as in (4). The random variable so obtained is said to have a skew-t (ST) distribution, and we write Y ∼ ST_d(ξ, Ω, α, ν). Two limit cases are worth noting: (i) if α = 0, the SN variable Z_0 in (25) reduces to a normal one and the ST distribution reduces to a regular Student's t; (ii) if ν → ∞, then S → 1 in probability, and the ST distribution reduces to the SN. The ST distribution has been the focus of much subsequent work, both on the theoretical and the applied side. In this section, we summarize the theoretical aspects; subsequent sections deal with the more applied side.
For the ST family, compliance with conditions (20) is visible from an alternative and simpler expression of the density function: in the normalized case ST_d(0, Ω̄, α, ν) with null location and unit scale parameters, this is

f(z) = 2 t_d(z; Ω̄, ν) T( αᵀz {(ν + d)/(ν + q(z))}^{1/2} ; ν + d ), (26)

where t_d(z; Ω̄, ν) is the regular multivariate Student's t density on ν degrees of freedom, T(·; ρ) is the univariate t distribution function on ρ degrees of freedom, and q(z) = zᵀ Ω̄⁻¹ z. Since the argument of T(·; ν + d) is an odd function of z, w(z) say, it is immediate that (26) falls within the conditions of Proposition 3. Expression (26) was also obtained independently by Gupta [25]. Besides a simple expression of the ST density function, the quoted papers include results for its distribution function, moments, the distribution of affine transformations and of quadratic forms. Additional results on the multivariate ST distribution, or more generally on scale mixtures of multivariate SN variables, are provided by Kim and Mallick [26], Kim [27], Capitanio [13], Giorgi [28], Arevalillo and Navarro [29], and Kim and Zhao [30], among others.
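In the univariate case d = 1, density (26) reads f(z) = 2 t(z; ν) T(αz {(ν + 1)/(ν + z²)}^{1/2}; ν + 1), and it is straightforward to check numerically that it integrates to 1 (a sketch using SciPy; the values of ν and α are arbitrary):

```python
import numpy as np
from scipy.stats import t as student_t

nu, alpha = 5, 3.0
z = np.linspace(-60, 60, 600_001)

# Univariate version of (26): f(z) = 2 * t(z; nu) * T(w(z); nu+1).
w = alpha * z * np.sqrt((nu + 1) / (nu + z**2))
f = 2 * student_t.pdf(z, nu) * student_t.cdf(w, nu + 1)

h = z[1] - z[0]
total = float(f.sum()) * h  # Riemann approximation of the integral
print(round(total, 3))      # should be close to 1
```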
Although the ST_d(ξ, Ω, α, ν) distribution features only a scalar parameter ν in addition to those of the SN_d(ξ, Ω, α) counterpart, this ingredient makes it possible to regulate the decay of the density in the tails of the distribution, in the univariate and the multivariate domain, where in the latter case the term 'tail' must be suitably interpreted. The substantial increase in flexibility so achieved in practical data fitting has been demonstrated by Azzalini and Genton [31], considering data from a variety of application areas and data types, and it has subsequently been confirmed by other authors in a range of applied problems.
Similarly to the SN distribution and with similar motivations, an 'extended' ST distribution has been studied, in independent work of Adcock [36] and of Arellano-Valle and Genton [37]. Both formulations start from a (d + 1)-dimensional Student's t variable as the building ingredient, but interestingly, they follow different paths. The development of Adcock [36] is based on the additive construction, qualitatively similar to (6), with the U_1 variable centered at τ; Arellano-Valle and Genton [37] use instead a conditioning mechanism, like for (24), but with condition X_1 > 0 replaced by X_1 + τ > 0. Up to minor differences in the adopted parameterization, the density of the variable, once equipped with location and scale parameters like in (25), turns out by either route to be the same; we refer to this density as (27). In its expression, Q(x) = xᵀ Ω⁻¹ x enters the argument of the t distribution function and α_0 is as in (16). Additional related work, directly motivated by applications, has been presented by Lee et al. [38] and Marchenko and Genton [39], to be discussed later on. Additional work of Adcock [40] concerns both the theoretical and the applied side, with special emphasis on quantitative finance.

It might be noticed that, in the developments presented above, there is a prevalence of a limited number of SEC families, as compared to the large number of conceptual possibilities. There is especially much emphasis on the ST distribution and its extended version (27), and a few more instances, such as the skew-slash distribution, while there is very little consideration of, say, the multivariate exponential power distribution and other families. One aspect which comes into play is the result of [9], recalled in Section 2.1, implying that many families, including the multivariate exponential power distribution, are not closed under the marginalization operation; that result was stated for the EC class, but it carries over to the SEC class.
Another advantageous property is closure under a linear transformation of a multivariate variable, or at least the existence of a tractable distribution for such transformation.
More complex formulations, such as some of those discussed in the review paper [5], allow for a separate tail parameter for each component variable. This offers an increased level of flexibility, but at some cost in terms of mathematical tractability. When one is setting-up the mathematical formulation for a given applied problem, a balance must be struck between various components, especially flexibility, mathematical tractability and subject-matter interpretability. The appropriate point of balance depends on the problem at hand.

Constructions Based on Several Latent Variables
In some form or another, all d-dimensional distributions discussed so far, except for the general result of Proposition 1, are obtained starting from a (d+1)-dimensional symmetrically distributed random variable, one component of which is not observed but whose role is to modify the symmetric distribution of the others. This fact is most easily seen in the SEC construction by conditioning leading to (24), where the component X 1 of the vector X constitutes a latent variable whose effect is to perturb the symmetric distribution of X 0 . In other instances, this component may be slightly less obvious to identify, but it is there. For example, in the additive construction (5)-(6), the univariate variable which generates asymmetry is |U 1 |.
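In the univariate case, this latent-variable mechanism can be made concrete in a few lines of code. The following Python sketch (based on scipy.stats.skewnorm; the slant value alpha = 3 is an arbitrary illustrative choice) simulates the additive representation and compares the simulated mean with the theoretical skew-normal one.

```python
import numpy as np
from scipy.stats import skewnorm

# Additive (stochastic) representation of a univariate skew-normal variable:
# with U0, U1 independent N(0, 1) and delta in (-1, 1),
#     Z = delta * |U0| + sqrt(1 - delta^2) * U1
# has the SN density with slant alpha = delta / sqrt(1 - delta^2).
rng = np.random.default_rng(0)
alpha = 3.0
delta = alpha / np.sqrt(1.0 + alpha**2)

n = 200_000
u0, u1 = rng.standard_normal(n), rng.standard_normal(n)
z = delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1

# The SN mean is delta * sqrt(2/pi); simulation and theory should agree.
print(z.mean(), skewnorm(alpha).mean())
```

Here |U0| is the unobserved component: discarding its sign is precisely the perturbation of symmetry described above.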
The constructions to be examined below follow a logic similar to that of the earlier ones, but they involve the use of multiple latent variables. This extension should make it possible to model a wider range of practical problems in a more realistic way. Moreover, in most cases, although not all of them, the distributions so obtained are regulated by a larger number of parameters, for a given dimensionality d and an equivalent structure of the other components. Therefore, a more adequate fit to observed data is expected. At the same time, when such an elaborate construction is not really necessary in a given applied problem, there is a risk of over-parameterization, not necessarily in the formal sense of a non-identifiable model, but of one with a nearly flat log-likelihood function and, consequently, poorly estimated parameters.
An early instance of this scheme is the one of Sahu et al. [41], which is closely linked to the earlier SEC formulation of Branco and Dey [23]. Instead of a 1-dimensional conditioning operated on a (d+1)-dimensional EC variable, now a d-dimensional conditioning is applied to a (2d)-dimensional EC variable. However, given the required constraints on the EC variance structure, the number of parameters regulating the distributions is the same as for the analogous construction using the earlier scheme. Hence, for instance, the new form of skew-t distribution so constructed is regulated by d(d+1)/2 + 2d + 1 parameters, the same number as for (26). Note that here, when d > 1, the terms skew-normal and skew-t distribution refer to distributions different from those presented in Sections 2.2 and 3.2, respectively.
Shortly afterwards, a number of generalizations of the SN distribution (8) have been put forward by González-Farías et al. [42] and González-Farías et al. [43] under the name 'closed skew-normal' (CSN), to underline the property of closure under convolution, as well as under conditioning. In this formulation, one starts from a (d+m)-dimensional normal variable (X 0 , X 1 ) and considers the distribution of X 0 conditionally on X 1 + τ > 0, where X 1 and τ are m-dimensional and the inequality must be understood to hold for each component. Similar constructions have been put forward independently by Liseo and Loperfido [44] under the name 'hierarchical skew-normal' and by Arellano-Valle and Genton [45] under the name 'fundamental skew-normal'. The not-so-obvious interconnections among these three proposals have been examined by Arellano-Valle and Azzalini [46], who have shown their essential equivalence up to a change of parameterization, once some redundancies are removed. This clarification has led to a 'unified skew-normal' (SUN) formulation.
An additional development of Arellano-Valle and Azzalini [46] has been the extension of this type of construction when the distribution of (X 0 , X 1 ) is of EC type instead of normal. This line of development has been carried on by Arellano-Valle and Genton [47] and the associated term is 'unified skew-elliptical' (SUE or SUEC) distributions. Further work in this direction has been presented by Arellano-Valle et al. [48], Abtahi and Towhidi [49], Aziz and Gupta [50], Gupta et al. [51], and Richter and Venz [52].
To facilitate the comprehension of the relative position of the main classes of distributions discussed so far, plus some others to be presented later, it is useful to examine the diagram in Figure 3.

Other Mixtures of SN Variables
Since scale mixtures of SN variables with representation (25) constitute the more interesting subclass of SEC distributions (24), it is not surprising that various authors have examined other forms of SN mixtures.
Arellano-Valle et al. [53] and Arellano-Valle et al. [54] have studied shape mixtures of multivariate SN variables, in a number of variant constructions. Proceeding further in this direction, Arellano-Valle et al. [55] have examined the more general construction with random mixtures of both the scale and the slant parameter of SN variates. The SSMSN class so generated encompasses a number of existing formulations, forming a wide subset of those generated by Proposition 3; see their Proposition 1, which must be interpreted in the sense stated in its preceding paragraph. Further work in the same direction, but only for the univariate case, has been presented by Tamandi and Jamalizadeh [56].
Arslan [57] has introduced the idea of a variance-mean mixture of multivariate SN variables, that is, with representation X = ξ + γ V + V^{1/2} ω Z 0 (28), where ξ, ω and Z 0 are as in (25) and γ is a d-dimensional parameter. The positive random variable V, independent of Z 0 , operates a mixing operation on the SN variable Z 0 in a way which replicates the one in the well-known construction of the variance-mean mixture of a normal variable; see Barndorff-Nielsen et al. [58]. Similarly to the mixing of normal variables, the more prominent special case of (28) occurs when V has a generalized inverse Gaussian distribution.

Miscellanea
In the above exposition, a recurring scheme to achieve perturbation of symmetry has been via conditioning events, such as (X 1 > 0) in (13) or (X 1 + τ > 0) for the extended SN and CSN/SUN distributions, or some similar sort of one-sided constraint in the more general case of EC and other centrally symmetric distributions. Some qualitatively similar event, such as T ≤ w(Z 0 ), is involved in (19). The general treatment of Arellano-Valle et al. [17] deals instead with an arbitrary conditioning event. Starting from a (d + m)-dimensional variable (U, V), the marginal density f V of the V component, as modulated by the conditioning event {U ∈ C}, is f(v) = f V (v) P(U ∈ C | V = v) / P(U ∈ C) (29). This expression is extremely general, since C is an arbitrary set. However, to translate (29) into operational constructions, one must obtain expressions for the two probabilities appearing there. These computations are tractable in some favorable cases only, many of which have already been examined in the formulations discussed earlier. Still, some additional tractable cases exist. One of them has been considered briefly by Genton [59] and more systematically by Kim [60], where the conditioning event involves a set of bilateral constraints of type (a < U < b), say, in a normal distribution context.
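As a numerical illustration of how this modulation scheme becomes operational in a favorable case, the following Python sketch takes (U, V) bivariate standard normal with correlation rho, modulated by a bilateral event {a < U < b} (rho, a and b are illustrative choices), and checks that the resulting expression is a genuine density.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Selection density in the spirit of (29): (U, V) bivariate standard
# normal with correlation rho, modulated by the event {a < U < b}.
rho, a, b = 0.7, -0.5, 1.0
s = np.sqrt(1.0 - rho**2)

def f_cond(v):
    # f_V(v) * P(a < U < b | V = v) / P(a < U < b)
    num = norm.cdf((b - rho * v) / s) - norm.cdf((a - rho * v) / s)
    den = norm.cdf(b) - norm.cdf(a)
    return norm.pdf(v) * num / den

total, _ = quad(f_cond, -np.inf, np.inf)
print(round(total, 6))  # 1.0: a genuine density
```

The tractability here rests entirely on the availability of the conditional probability in closed form; with an arbitrary set C, neither probability is generally computable.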
In another direction of work, Jamalizadeh and Kundu [61] apply a multivariate version of the Birnbaum-Saunders transformation of variables to the multivariate SN distribution.
To demonstrate the potential of Proposition 1 beyond the domain of validity of Proposition 3, Azzalini and Regoli [62] have constructed densities with baseline f 0 of EC type but without enforcing the condition w(−x) = −w(x) in (20). The resulting distributions enjoy some interesting formal properties, and they can have flexibility similar to that of the densities generated from Proposition 3. In the diagram of Figure 3, the set of these distributions is labeled 'MEC-2018'.
An account on matrix-variate SN distributions and extensions, with related formulations for applied work, is provided by Gupta et al. [63] (Chapter 11); this includes references to earlier work in the same direction.

Cave Nomina
An unpleasant aspect of the stream of literature under consideration is represented by certain aspects of its terminology, which are especially confusing to the newcomer hoping to grasp the key concepts. The problem comes in two variants. In the first form, we encounter different names attached to the same object, or very nearly the same object. A good case in point is represented by the names closed SN, hierarchical SN, fundamental SN, and unified SN. They all refer to the same class of distributions up to inessential details but, as explained in Section 3.3, the first three of them have been developed independently of each other, and the attempt to unify them under the fourth name has not succeeded. So now there exist four names, at least three of them in current use, all referring to the same entity.
The second form of the problem is the reverse of the first one: the same name is attached to different objects. Arguably, the most prominent instance of this sort is represented by the term 'skew-t' and its acronym ST. As far as we know, the first occurrence of the term is in Branco and Dey [23], later adopted by many authors with the same meaning; we regard this version, with density (26), as the more convincing ST construction. However, shortly afterwards, a different distribution, again called skew-t, has been presented in Sahu et al. [41]; this ST distribution coincides with the earlier one only when d = 1. But these are only two examples of skew-t distributions among the many, not all developed inside the present stream of literature. We do not list all such instances, but we must at least mention the paper by Giorgi and McNeil [64], where two types of ST distributions coexist in the same publication: one with density (26), and the second one coming from an entirely different construction, linked to the GH distribution.
Those mentioned here are only some examples of confusing terminology; unfortunately, others exist. Given the persistence of the current nomenclature, it seems difficult to imagine that the terminology will be rationalized, except perhaps after a long time. Therefore, readers of the literature must not rely simply on the names of the distributions: a direct inspection of the formal expressions of these densities, or some equivalent ingredient, is required.

Applications as Methodological Developments
There exist two logically distinct meanings of the term 'applications', at least in the present context. One such meaning refers to the use of the above-described distributions as a formal reference set for improving existing statistical methods by building them on more flexible and realistic distributional assumptions, as compared to traditional methods, which are typically built on the normality assumption. The other meaning of 'applications' refers to more specifically application-oriented work. Clearly, these two sorts of activity are not completely separate: methodological developments are usually illustrated by some numerical work, sometimes on non-trivial practical problems, and, conversely, application-motivated work may involve substantial theoretical development. However, in most cases there is one component which drives the authors' work, and we organize our exposition by separating the material accordingly. The rest of the present section deals with the more methodology-oriented developments, while the next section focuses on the more applied side.

Regression Models and Variants
In univariate regression models, a set of p, say, explanatory variables x i is observed alongside the response value for the ith individual. The x i vector, regarded as non-random, is used to model some quantity of interest in the distribution of the response variable, Y i . In the present context, the distribution of Y i is assumed to be one of those examined in Sections 2 and 3, most often the SN or the ST distribution, but sometimes some other one. The quantity of interest is the location parameter for the ith response, denoted ξ i , which is expressed as a function of x i and some vector parameter, β say, which then becomes the primary parameter of interest. In the basic setting, the connecting function is linear, so that we write ξ i = x i β, but non-linear functions have also been considered. In the case of multivariate response variables, this scheme is generalized in the same way as in the classical theory.
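A minimal numerical sketch of this regression scheme, for a univariate response with skew-normal errors fitted by maximum likelihood, can be written as follows. The data are simulated, and all settings (starting values, optimizer, sample size) are illustrative choices, not a reproduction of any specific implementation cited below.

```python
import numpy as np
from scipy.stats import skewnorm
from scipy.optimize import minimize

# Linear regression xi_i = x_i' beta with skew-normal errors, fitted by
# maximum likelihood in the direct parameterization (beta, omega, alpha).
rng = np.random.default_rng(1)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + skewnorm.rvs(4.0, scale=0.5, size=n, random_state=rng)

def negloglik(theta):
    beta, log_omega, alpha = theta[:p], theta[p], theta[p + 1]
    resid = y - X @ beta
    return -skewnorm.logpdf(resid, alpha, scale=np.exp(log_omega)).sum()

# Least-squares coefficients give a reasonable starting point for beta.
start = np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0], [0.0, 1.0]])
fit = minimize(negloglik, start, method="Nelder-Mead",
               options={"maxiter": 5000})
beta_hat = fit.x[:p]   # should lie close to beta_true = (1, 2)
```

In the direct parameterization used here, the intercept and the slant parameter are strongly correlated, which is one reason why centered parameterizations are often preferred in practice.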
Much work in this area has dealt with univariate response variables, and it can broadly be separated in two parts. A set of contributions have developed diagnostic methods and similar tools which extend popular methods from the standard methodology; these include Bolfarine et al. [65], Ferreira and Arellano-Valle [66], and Labra et al. [67]. Other contributions work with formulations similar to but different from classical regression, such as measurement in error models, change-point detection, and others; these include Arellano-Valle et al. [68], Figueiredo et al. [69], and Arellano-Valle et al. [70].
A more limited set of contributions have considered similar problems in the multivariate context. Expressions of the score functions of the (profile) log-likelihood and other results useful for maximum likelihood estimation (MLE) are provided by Azzalini and Capitanio [11] for the SN case, and by Azzalini and Capitanio [19] for the ST case. Multivariate measurement error models have been considered by Lachos et al. [71] and by Arellano-Valle et al. [72].
For the analysis of binary data in a context of generalized linear models, Chen et al. [73] have proposed to use the distribution function of the univariate SN distribution to extend the probit model. This direction has been developed further in a Bayesian framework by Bazán et al. [74]. In a similar logic, item response theory has been extended to a more general formulation by Bazán et al. [75]; see also [76,77]. For Bayesian inference of a probit model, Durante [78] shows that the exact posterior distribution of the generalized regression parameters under Gaussian prior is of SUN type.

Finite Mixtures and Model-Based Clustering
Finite mixtures of g, say, component distributions provide a way to introduce a further layer of flexibility in data fitting. In their essential formulation, a density of interest, f(x) say, is written as f(x) = π 1 f 1 (x) + · · · + π g f g (x) (30), where f j is a component density regulated by a set of parameters θ j , and π j is a positive probability such that π 1 + · · · + π g = 1. While not conceptually necessary, it is usually the case that the g component densities are assumed to belong to the same parametric family, each with a different set of parameters, θ 1 , . . . , θ g . Suitable conditions are necessary to avoid non-identifiability of the model. More detailed information is provided by classical general accounts, such as the monograph of McLachlan and Peel [79]. Construction (30) is particularly attractive in the context of model-based clustering, where it lends itself to a natural interpretation by regarding f j as the density of the jth subpopulation, having relative weight π j in the overall population which comprises g groups or subpopulations. Consequently, the literature on finite mixtures has a major overlap with the one on model-based clustering, especially in the multivariate context.
As with many other research areas, this one too has long focused on the assumption of normality, meaning that each component density of (30) was assumed to be Gaussian. Later on, more flexible parametric families have been adopted as component densities, both to provide a more realistic formulation and to reduce the number g of components, since in many instances the normal assumption requires more than one component to accommodate non-Gaussian data patterns, even for evidently clustered points. In this development, the proposals discussed in Sections 2 and 3 have been among the earlier alternatives to the Gaussian assumption to be considered, and their role is still relevant. An exhaustive list of all pertinent contributions would be very long, cumbersome to compile, and almost instantly obsolete. Therefore, we highlight a selection of titles which seem to us of special relevance for one reason or another.
Earlier work of this type has naturally started in the univariate context: Lin et al. [80] have developed a methodology for fitting finite mixtures of SN distributions; Lin et al. [81] carried this on to ST distributions. Otiniano et al. [82] have proved the identifiability of such mixtures. The next step was to extend these constructions to the multivariate case; see Lin [83], Lin [84], Pyne et al. [85], Frühwirth-Schnatter and Pyne [86], Vrbik and McNicholas [87], and Vrbik and McNicholas [88], among others. The monograph by Lachos Dávila et al. [89] collects a set of papers, each targeting one classical problem in this domain.
More recent contributions have often adopted additionally flexible parametric families, such as the CSN/SUN or SUEC families recalled in Section 3. A notable instance is Lee and McLachlan [90] and the connected software tools described in Lee and McLachlan [91]. Alternative component distributions have been considered by Tamandi and Jamalizadeh [56].

The idea of a finite mixture can be combined with that of a regression model, when the relationship connecting the variables varies across subgroups of the population. A construction of this type under the SN distributional assumption has been examined by Liu and Lin [92], while Zeller et al. [93] replace the SN assumption with scale mixtures of SN variates, and Dogru and Arslan [94] use ST distributions.

Spatial and Spatio-Temporal Models
Several contributions have proposed formulations in the context of spatial statistics. This involves consideration of the distribution of a variable of interest, Y, observed at a set of points s = (s 1 , . . . , s d ) belonging to a given space S; denote by Y(s) = (Y(s 1 ), . . . , Y(s d )) the set of observed variables. In the simplest setting, the elements of S represent geographical locations, but they can also represent coordinates in a 3-dimensional space or a combination of spatial and time coordinates. In the classical formulation, it is assumed that a stationary Gaussian spatial process is defined over S, with a dependence structure which depends on the relative position of the points. This implies that the joint distribution of Y(s) is multivariate normal, for any choice of the set of points s. Note that, for the relevant applications, it is important that, once the parameters of the spatial process have been estimated, we can compute the conditional distribution of Y(s′), where s′ represents some other set of points, conditionally on the values observed for Y(s). Think, for instance, of Y as the level of some pollutant over a certain region S and of s as the coordinates where d measuring stations are located.
Since in certain cases the Gaussian assumption appears unrealistic, one alternative is to replace the normal distribution by a more flexible parametric family. The SN assumption, possibly in its extended form (17), is a natural candidate here, given its simplicity and mathematical tractability. This represents the formulation proposed by Kim and Mallick [95] and adopted by some subsequent authors. Unfortunately, their formulation has been shown by Minozzo and Ferracuti [96] to be invalid, since the required spatial process cannot exist in a mathematically coherent sense with respect to marginalization operations. This problem prevents the conditioning operation indicated above. Furthermore, in Minozzo and Ferracuti [96] it is also shown that various other existing proposals have similar coherence problems.
A construction free from the above inconsistency problem has been proposed by Zhang and El-Shaarawi [97]. Their core ingredient is the process Z(s) = δ |U 0 (s)| + (1 − δ 2 ) 1/2 U 1 (s), where U 0 (s) and U 1 (s) are independent stationary Gaussian spatial processes with standard marginals. The process so constructed exists and, from (5), we see that each marginal of Z(s 0 ) for a single point s 0 is skew-normal. The drawback is that, when s 0 denotes a set of points, not just one, the distribution of Z(s 0 ) is not jointly skew-normal. The basic component Z(s) is then supplemented by additional terms to regulate location and scale. An extension of this model to accommodate observed temporal replicates at each geographical location, illustrated by applied work, has been presented by Schmidt et al. [98]; see also the subsequent comments by Genton and Hering [99]. Additional related work includes Hering and Genton [100], Hosseini et al. [101], Karimi and Mohammadzadeh [102], Zareifard and Jafari Khaledi [103], Ahmad and Pinoli [104], Rimstad and Omre [105], Rezaie et al. [106], and Boojari et al. [107], among others.
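A one-dimensional sketch of this type of construction can be coded directly; the grid, the correlation function and the value of δ below are illustrative choices, and this is not the authors' own implementation.

```python
import numpy as np

# Sketch of a spatial process with skew-normal marginals: U0(s), U1(s)
# are independent stationary Gaussian processes with N(0, 1) marginals
# and exponential correlation, combined as
#     Z(s) = delta * |U0(s)| + sqrt(1 - delta^2) * U1(s).
rng = np.random.default_rng(2)
s = np.linspace(0.0, 10.0, 50)                    # grid of locations
corr = np.exp(-np.abs(s[:, None] - s[None, :]))   # exponential correlation
delta = 0.9

u0 = rng.multivariate_normal(np.zeros(s.size), corr, size=2000)
u1 = rng.multivariate_normal(np.zeros(s.size), corr, size=2000)
z = delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1   # (2000, 50)

# Each column collects replicates of Z at one location; every marginal
# mean should be close to the skew-normal value delta * sqrt(2/pi).
print(z.shape, z.mean())
```

Each one-point marginal behaves as a skew-normal variable, while, as noted above, the joint distribution over several locations is not itself skew-normal.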

Methods for Biostatistics and Medical Statistics
There is a range of statistical methods targeted largely, if not entirely, to biological and medical applications which have been improved by consideration of the families of distributions described above. At the basic level, there is, as in other areas, an improvement in data fitting when flexible parametric families are adopted in place of the normal and other traditional distributions, especially so in the multivariate context. At another level, there is a role in enhancing existing methodologies and even in helping to develop new ones.
Longitudinal data exist in a range of application areas, but they are especially important in medical statistics, with a particularly distinctive imprint when in combination with time-to-event data. A substantial number of contributions in this domain has been summarized by Azzalini and Capitanio [7] (pp. 218-220), and (also for space reasons) we do not reproduce them here. More recent contributions, focusing especially on those connected with multivariate response, include the work of Baghfalaki et al. [108], Huang and Dagne [109], Chang and Zimmerman [110], and Jana et al. [111].
For the joint modeling of longitudinal data and time-to-event outcomes, the skew-normal and related distributions provide ingredients for the log-likelihood expressions of Barrett et al. [112] and Baghfalaki and Ganjali [113]; in the latter case, the term 'skew-normal distribution' refers to its variant form in [41], recalled in Section 3.3, not to distribution (8).
The current trend in design of clinical trials favors interim analysis and adaptive designs. The work of Shun et al. [114] shows the relevance of the skew-normal distribution in this context. The formulation of Azzalini and Bacchieri [115] for adaptive designs hinges firmly on the CSN/SUN.

Methods for Observational Studies and the Social Sciences
In observational studies, especially frequent in the social sciences, a recurrent issue is the presence of a selection mechanism in the sampling process, leading to the problem of selection bias. Under the assumption of normality, the proposal of Heckman [116] provides a method to correct for this bias, which has been widely applied in the social sciences and related fields. This selection mechanism has a direct link with the stochastic representation, via conditioning, of the extended skew-normal distribution and of the CSN/SUN construction. See Grilli and Rampichini [117] for a detailed discussion of the connection. The link between these two domains has indicated a way to reduce, if not overcome completely, a key limitation of Heckman's method, namely its strong dependence on the validity of the normality assumption. By replacing the Gaussian assumption with a multivariate t distribution, so that the post-selection distribution is of skew-t type, Marchenko and Genton [39] have extended Heckman's method to a more flexible form, more robust to failure of the distributional assumption. Proceeding further along these lines, Azzalini et al. [118] have considered response variables with discrete distributions, hence moving towards the edge of the scope of the present review.
Small area estimation is another theme often, although not exclusively, connected to the social sciences in its broad sense. Recently, a number of authors have employed flexible parametric distributions to replace the normality assumption of existing model-based formulations. An early exploration by Fabrizi and Trivisano [119] has been followed by formulations using either univariate SN (Ferraz and Moura [120]) or multivariate SN distributions (Ferrante and Pacei [121]), for so-called area-level models. The work of Diallo and Rao [122] is on unit-level modeling based on the CSN/SUN distribution.
Another methodology often linked to social sciences application is factor analysis. Montanari and Viroli [123] make use of the convenient mathematical properties of the multivariate SN distribution to develop a skew-normal factor model. Additional work in factor analysis has been presented by Liu and Lin [124], Lin et al. [125], Lin et al. [126], and Wang et al. [127].

A Popular Benchmark
There is a data set, known as the Australian Institute of Sport (AIS) data, which is employed in a vast number of publications to illustrate numerically some methodological proposal. The data comprise observations on 13 variables, mostly of a biomedical nature, on 202 Australian athletes. A full description of the AIS data, and the data themselves, are available within the R package sn, which provides computational tools for SN and ST model fitting; see [128]. Given the purely numerical nature of these exercises, they cannot be described as 'applied work', but they provide illustrative examples for a wide range of methods. Notable references which make use of the AIS data include [10,11,19,32,53,55,59,68], but this list could be far longer. Cross-paper comparisons of the various methods applied to the AIS data are, however, rare.
Having mentioned computation tools for data fitting in the R computing environment, it is appropriate to mention also similar tools for the Stata environment, presented in [129].

Applications to Real Problems
This section of the paper is concerned with applications, the majority of which employ the SN or the ST distributions, possibly in the multivariate form; however, other families have been used, especially but not only the CSN/SUN family. In most cases, authors do not set out to propose or develop new theories or methods pertaining to the application. Two exceptions are portfolio theory and risk, for which the parsimony of both the SN and ST distributions has led to new insights and methods. Papers are grouped under application theme headings.

Economics and Applied Financial Economics
There is a growing number of studies in applied financial economics that are mainly empirical in content. Harvey et al. [130] is concerned with portfolio selection using the version [41] of the SN distribution. Barbi and Romagnoli [131] employ the extended skew normal distribution for similar applications. Carmichael and Coën [132] use the SN distribution to derive explicit expressions for the skewness premium of a financial asset. Alodaat and Al-Rawwash [133] apply the extended SN distribution to the EURO/USD exchange rate. This is in contrast to results in Adcock [134] and Adcock [135] which indicate that estimated values of the extension parameter, denoted τ in Section 2.2, are often numerically different from zero.
Papers which use SN or ST distributions in conjunction with econometric methods per se are rare. An exception is due to Abanto-Valle et al. [136], who embed the ST distribution within the stochastic volatility framework of Heston [137]. They apply the resulting model to daily returns on the NASDAQ index. A second exception is in De Luca and Loperfido [138], who describe a multivariate SGARCH model, which is applied to MSCI equity indices.
There are other applications that are concerned with finance and economics in a broader sense. Chen et al. [139] present a regularization method for the multivariate ST distribution. This is applied to the hourly average electricity spot price data set for New South Wales (from Panagiotelis and Smith [140]). There are applications to credit card data, such as Brito and Duarte Silva [141] and Azzalini et al. [118]. Ozaki and Silva [142] use the SN distribution to rate crop insurance contracts in Brazil.
There exists a close connection between the theme of stochastic frontier analysis (SFA) for production efficiency and additive representations of type (5) for the SN distribution and related ones. A general account of SFA is [143]; the link between these two literatures is elucidated in its simpler SN setting in [7] (pp. 91-93), but it carries over to the more advanced constructions indicated next. A detailed discussion, with a sketch of an extension to SEC distributions, has been presented by Domínguez-Molina et al. [144]; see also further work in [145]. Tchumtchoua and Dey [146] apply the ST distribution of Sahu et al. [41] and models for SFA to the United States primary metal industry data set of Aigner and Schmidt [147]. For data presenting repeated observations on each production unit, Colombi [148] and Brorsen and Kim [149] develop methods for SFA based on the CSN distribution; see also [150]. Filippini and Greene [151] describe several developments of SFA methods based on the SN distribution; these are applied to a study of the cost efficiency of Swiss railways.
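The SN-SFA link can be illustrated numerically: for the classical composed error ε = v − u, with Gaussian noise v and half-normal inefficiency u, the density of ε is skew-normal with scale (σ v ² + σ u ²)^{1/2} and slant −σ u /σ v . The scipy-based check below (σ v , σ u are illustrative values) compares the skew-normal density with the corresponding convolution integral.

```python
import numpy as np
from scipy.stats import norm, skewnorm
from scipy.integrate import quad

# Composed error eps = v - u: noise v ~ N(0, sv^2), inefficiency
# u = |N(0, su^2)| (half-normal). Then eps is skew-normal with
# scale omega = sqrt(sv^2 + su^2) and slant alpha = -su/sv.
sv, su = 1.0, 1.5
omega, alpha = np.sqrt(sv**2 + su**2), -su / sv

def conv_pdf(e):
    # density of v - u, integrating out the half-normal u >= 0
    integrand = lambda u: norm.pdf(e + u, scale=sv) * 2 * norm.pdf(u, scale=su)
    val, _ = quad(integrand, 0.0, np.inf)
    return val

for e in (-2.0, -0.5, 0.0, 1.0):
    assert abs(conv_pdf(e) - skewnorm.pdf(e, alpha, scale=omega)) < 1e-7
print("composed error density matches the skew-normal density")
```

This is precisely the additive representation (5) in disguise: the half-normal inefficiency plays the role of the latent variable which perturbs the symmetric noise distribution.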

Quantitative Finance
The applications referred to in other sub-sections are concerned with the use of statistical methods based on skew-elliptical distributions. There may be some technical developments specific to an application, but in general, there is no attempt to contribute to theory as such. Quantitative finance is different in a number of respects. First, Adcock and Shutes [152] suggest that a motivation for the use of skew-elliptical distributions is that the non-negative variable in the stochastic representation represents a departure from financial market efficiency in the sense of Fama [153]. Secondly, Adcock [154] and Adcock [40] show that the multivariate extended SN and ST distributions play an important role in the theory of portfolio selection. Thirdly, as described in the next subsection, several authors have used the SN and ST distributions to develop measures of risk, typically to be applied to portfolios or to insurance data. The important early work of Simaan [155] employs a formulation where an elliptical random variable is summed to a positive variable multiplied by a vector of parameters, qualitatively quite close to the idea of skew-elliptical distributions discussed in the earlier sections, although technically not quite the same. However, when the positive component has a half-normal distribution, what we get is precisely a multivariate SN variable. The first formal treatment of the portfolio selection problem using skew-elliptical distributions is due to Adcock and Shutes [152], who employ the extended version of the multivariate SN. They argue that the extended version is required theoretically, because the distribution of the non-negative variable is not necessarily half-normal.
The multivariate extended SN and ST distributions both lead to extensions of Stein's lemma, Stein [156]. For normally distributed returns, the consequence of Stein's lemma is that, subject to some regularity conditions, the portfolios of all expected utility maximizers will lie on Markowitz' mean-variance efficient frontier. This important result was extended to cover elliptically symmetric distributions by Landsman and Neslehová [157]. It was further extended in Adcock [154] and Adcock [40] to the multivariate extended SN and ST distributions. For expected utility maximizers, there is a single mean-variance-skewness efficient surface. In the same papers, the results are extended to the closed versions of the distributions, for which there are mean-variance-skewness hyper-surfaces.
The papers above are implicitly concerned with portfolio selection for a single investment period. When single-period returns follow a multivariate SN or ST distribution, the corresponding distributions for multiple periods are sums of log SN or ST variables. Roch and Valdez [158] provide lower convex order bound approximations for such sums. Their work extends that of Dhaene et al. [159] and Dhaene et al. [160]. To the best of our knowledge, there are few other papers based on skew-elliptical distributions that are concerned with finance theory. One exception is due to Blasi and Scarlatti [161], who study stochastic dominance rules for the SN distribution. In a contribution to efficient set mathematics, Bodnar and Gupta [162] study inference procedures for testing the weights in the global minimum variance portfolio when asset returns follow the CSN distribution.
The market model, which is motivated at least in part by portfolio selection, is an important component of finance theory, and provides the foundations for later developments, most notably by Fama and French [163] and in subsequent papers by them and many others. Conventionally, the market model is a simple regression model in which the dependent and independent variables are, respectively, the returns on a risky asset and on the so-called market portfolio, usually represented by an appropriate stock market index. Assuming multivariate normality, or elliptical symmetry more generally, the market model is the conditional distribution of the dependent variable given a value of the independent one. These results are extended in Adcock [134] and Adcock [36], in which the market model is derived when asset returns follow the multivariate extended SN and ST distributions, respectively. The resulting models are non-linear in the return on the market portfolio. Beta, the traditional measure of systematic risk, is no longer defined in the conventional manner.

Risk
The word risk has many interpretations. The number of papers and other publications with the word in the title is large and their scope is broad. For the purpose of this review, there are three topics of interest: (i) the probability distribution of the losses arising from a portfolio of insurance policies, or their equivalent; (ii) the computation of measures of risk attributable to a portfolio of any type of financial asset or liability, arising from the tail of the probability distribution; (iii) the distribution theory of extreme returns. In this sub-section, there are two measures of risk that appear commonly in the relevant literature. The first is value at risk, abbreviated to VaR. The second is conditional value at risk, abbreviated to CVaR. The latter measure is also referred to as expected shortfall (ES) or tail conditional expectation (TCE). The definitions of both measures are well known.
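Since both measures are used repeatedly below, a minimal empirical version may help fix ideas. The sketch is ours: losses are recorded as positive numbers and the 95% confidence level is chosen purely for illustration.

```python
import numpy as np

def var_cvar(losses, level=0.95):
    """Empirical value at risk (VaR) and conditional value at risk
    (CVaR, also called expected shortfall): VaR is the level-quantile
    of the loss distribution and CVaR is the mean loss at or beyond
    the VaR."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, level)      # default linear interpolation
    cvar = losses[losses >= var].mean()   # average of tail losses
    return var, cvar

# Losses of 1..100 units: the 95% VaR is (by interpolation) about 95.05,
# and the CVaR averages the losses beyond it.
v, c = var_cvar(np.arange(1, 101))
print(v, c)
```

In parametric settings, such as the SN and ST results reviewed below, the empirical quantile and tail mean are replaced by closed-form expressions.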
Bernardi et al. [164] employ the well-used Danish fire data (first used in McNeil [165]) to exemplify the use of a finite mixture of SNs to model loss distributions. This paper builds on earlier work in Bernardi [166], in which various measures of risk are derived for the finite mixture of SNs. Pigeon et al. [167] use the simplified form of the CSN distribution due to Gupta and Chen [168] to study insurance losses. Eling [169] reports an investigation fitting insurance claims data using a number of benchmark probability distributions and the Danish fire and US indemnity data sets. He reports that "… the skew-normal and skew student are reasonably good models compared to 17 benchmark distributions". Using the data set of Antonio and Plat [170], Pigeon et al. [167] model the occurrence of insurance claims.
Moving to VaR and CVaR and the relevant distribution theory, Vernic [171] derives an expression for CVaR for the SN distribution. The analogous expression for the ST is given in Adcock et al. [172]. The paper by Tian et al. [173] extends results in Wang [174] and Wang [175] to develop tail-preserving and coherent distortion risk measures, which are intended to overcome shortcomings in both VaR and CVaR. The SN and several other skew-elliptical distributions are used to illustrate the general methods. Kollo et al. [176] examine the tail dependence of the bivariate skew t-copula. They show that, depending on the values of the skewing parameter, the tail dependence coefficient may differ considerably from that derived using the Student t-copula. In an earlier paper, Pettere and Kollo [177] use the ST copula to model future cash flows from insurance claims. Padoan [178] presents multivariate extreme models based on underlying multivariate SN and ST distributions. This work is similar to the paper by Lysenko et al. [179], which is concerned with generalized SN distributions.

Biology and Life Sciences
There are many studies in these categories, addressing a wide variety of applications. Not surprisingly, there are numerous studies concerned with AIDS or HIV. Lin and Wang [180] use multiple regression based on the multivariate SN distribution to study AIDS-related data. Huang and Dagne [181] use the [41] version of the multivariate SN distribution in a Bayesian setting to model AIDS data in the presence of errors in covariates. They report that "The proposed method may have a significant impact on AIDS research because, in the presence of skewness in the data and measurement errors in covariates, appropriate statistical inference for HIV dynamics is important for making reliable clinical decisions". The same data are studied in Huang and Dagne [182], which is concerned with semi-parametric mixed-effects models with SN errors. Both Baghfalaki and Ganjali [113] and Dogru and Arslan [94] employ the CSN distribution in a longitudinal study of the HIV data set from Goldman et al. [183]. Huang et al. [184] report a study using the AIDS clinical trial study data of Lederman et al. [185]. For the analysis of a longitudinal study of AIDS, Yu et al. [186] use a linear mixed model and introduce a special version of the multivariate ST distribution in which each component variable has its own degrees of freedom to regulate the tail behavior; see Ghosh and Hanson [187] for further details. In the same paper, they also report a study of behavioral factors related to sexually transmitted infections.
The papers listed in this paragraph provide a sense of the numerous issues in health where the data exhibit skewness and for which empirical studies employ one or more of the distributions covered in this review. Zadkarami and Rahimi [188] use SN regression to carry out a study of factors affecting birth weight, based on a British data set from 1958. Mahmud et al. [189] apply a probit log-SN mixture model to the Diary of Asthma and Viral Infections Study (known as DAVIS). Ho et al. [190] use factor mixture models based on the SN and ST distributions to study fatigue in breast cancer patients. Azzalini and Bacchieri [115] report the analysis of the results of a clinical trial on patients affected by intermittent claudication, which is a symptom describing muscle pain on mild exertion. Jamalizadeh et al. [191] use the visual acuity data from Fishman et al. [192] to exemplify the distributions of order statistics, and linear combinations thereof, from a bivariate selection normal distribution. Mansourian et al. [193] apply the SN distribution to a study of patients with back pain. Liu and Lin [92] employ a finite mixture regression model with SN errors to examine data from the National Survey of Child and Adolescent Well-Being (U.S. Department of Health and Human Services, Administration for Children, Youth and Families, 2003). The motivation for this paper is that assuming normally distributed errors in the presence of a substantial degree of skewness can lead to inaccuracy in estimating the number of classes in the mixture. Ismail et al. [194] use regression models with ST residuals in a spatial analysis of resting-state brain activity in healthy controls and individuals with Down's syndrome. See also Kline and Tobias [195] (relationship between body mass and earnings using the 1970 British Cohort Study), Fernandes et al. [196] (study in genetics using mice), Chai and Bailey [197] (coronary artery calcification), Bandyopadhyay et al. 
[198] (linear mixed model with Bayesian estimation applied to periodontal disease), Bandyopadhyay et al. [199] (also linear mixed models, for censored responses with applications to HIV viral loads), Dagne [200] (a tobit model with ST errors applied to measles vaccine and HIV/AIDS data). Pyne et al. [85] apply a finite mixture of multivariate ST distributions to high-dimensional flow cytometric data analysis. Also concerned with measurements by flow cytometry, Ho et al. [201] use a ST-normal distribution to model cellular state transitions.
Similar to health, there is a wide variety of papers devoted to applications in agriculture. In Baghfalaki et al. [108], the multivariate SN distribution is employed in regression models with SN errors and non-random dropouts. The methods are applied to a data set concerned with mastitis in cattle and to an experiment studying the effect of the inhibition of testosterone production in rats (see Diggle and Kenward [202] for details of the data set). Contreras-Reyes [203] analyzes a fish condition factor index using the SN distribution. This paper also presents information theory measures for the SN distribution. Contreras-Reyes et al. [204] use the ST distribution to study log-transformed length-at-age data of the southern blue whiting.

Environmental Issues
Lee and Skinner [205] use the SN distribution to test fossil calibrations for vertebrate molecular trees. Stoy et al. [206] also use the SN distribution, in an application to arctic ecosystem carbon exchange, as do Simolo et al. [207] in an application concerning the evolution of extreme temperatures in a warming climate. There is a related application to temperatures in China in Brito and Duarte Silva [141]. Rimstad and Omre [105] apply the CSN distribution to seismic data from the North Sea. Anagreh et al. [208] apply the SN distribution to model the potential of wind and solar energy resources in Aqaba, Jordan.
Bartoletti and Loperfido [209] use the SN distribution to model air pollution data recorded in 2006 from monitoring stations in Italy. Karimi and Mohammadzadeh [102] use the CSN distribution with a spatial regression model to analyze air pollution data collected in Tehran on 1 March 2007. The model described in the paper has the facility to deal with missing data. Morris et al. [210] consider daily observations of maximum 8-hour ozone measurements for the 31 days of July 2005 at 1089 Air Quality System (AQS) monitoring sites in the United States and introduce a spatio-temporal model for threshold exceedances based on the ST distribution. The short theoretical note by Guttorp and Loperfido [211] demonstrates that the extended SN distribution offers a feasible model to study network bias in air quality monitoring design. Hashemi et al. [212] extend the Birnbaum-Saunders distribution by replacing a standard normal variate with its SN equivalent, and they illustrate the use of the new distribution with a data set of daily ozone concentrations. Zakaria et al. [213] simulate monthly rainfall data and fit a bivariate ST copula. Nathoo [214] reports a study of juvenile white spruce growth using the ST distribution.

Industrial and Technological Applications
Li et al. [215] develop Shewhart-type control charts under the SN distribution. They provide charting tables, and an illustrative example from polarizer manufacturing is described. There is similar work in Figueiredo and Gomes [216], who also provide tables and describe an application to cork stopper manufacture. Rendao et al. [217] use multivariate SN regression to examine the effect of technological progress, industrial structure and energy consumption structure on energy intensity (defined as energy consumption per unit of GDP) in China from 2000 to 2011. Youssef [218] develops methods based on the SN distribution to improve the estimation of hand-set locations, a hand-set being, for example, a mobile telephone or a GPS receiver. Zadkarami and Rowhani [219] use the SN in a discriminant analysis to classify the pixels of a remotely sensed satellite image. Balef et al. [220] use the log SN distribution to model delay variation for supply voltages in an electricity network. Tsai and Lin [221] use a non-linear SN model and develop procedures to assess the reliability of products whose quality degrades over time. Data on traffic speed have been modeled by Zou and Zhang [222] using the SN and ST distributions. Ramprasath et al. [223] use the SN distribution for the analysis of the operation of integrated circuits. Müller et al. [224] use the ST distribution for statistical trilateration in communication networks.
Although not within the technological area, but rather the scientific one, we include here a mention of applications in astronomy by Eyer and Genton [225] and Simola et al. [226].

Two Illustrations in Applied Domains
To facilitate the perception of the potential usefulness of the distributions presented above, and how their formal properties can be linked to subject-matter considerations and theory, we present two illustrations in applied domains.

A Finance Application
The multivariate extended skew-normal and skew-t distributions both lead to extensions of Stein's lemma [156]. For normally distributed returns, the consequence of Stein's lemma is that, subject to some regularity conditions, the portfolios of all expected utility maximizers will lie on Markowitz' mean-variance efficient frontier. This important result was extended to cover elliptically symmetric distributions by [157]. It was further extended in [40,154] to the multivariate extended skew-normal and skew-t distributions. For expected utility maximizers, there is a single mean-variance-skewness efficient surface. In the same papers, the results are extended to the closed version of the distributions, for which there are mean-variance-skewness hyper-surfaces.
Consider an extended skew-normal distribution, Y ∼ SN_d(ξ, Ω, α, τ), say, with density function (16), and define the associated quantities δ, ω as in Section 2.2. The extension of Stein's lemma involves the variable W ∼ N_d(ξ − τωδ, Ω − ωδδ⊤ω) and ζ1(x) = φ(x)/Φ(x), the inverse Mills ratio, where φ(x) = Φ′(x). For portfolio selection, the elements of the d-vector Y now denote the returns on risky financial assets. The aim of an expected utility maximizer is to choose a vector of portfolio weights, commonly denoted w, that maximizes the expected value of the investor's utility function U(w⊤Y), where w⊤Y is the portfolio return. For conventional portfolio selection, the expected value of U(·) is maximized subject to the budget constraint that the elements of w sum to unity, and with the requirement that each element satisfies w_i ≥ 0; that is, no short positions are allowed. In keeping with standard economic theory, it is assumed that U′(·) ≥ 0 and that U″(·) < 0. Using the notation above, and omitting terms involving Lagrange multipliers, the central component of the first-order conditions (FoCs) for the constrained maximization of E{U(·)} is given by (32). It is common practice to replace the terms involving U′(·) and U″(·) with constants. On dividing through by E{U′(w⊤Y)}, the FoCs become θξ + ηωδ − Ωw = 0, with θ, η ≥ 0. For given values of θ and η, portfolio selection may be performed using quadratic programming. Noting that the FoCs do not depend explicitly on E{Y} or cov{Y}, it is also common practice to rearrange (32) and write it in the form (33). By varying the values of θ and η, different portfolios are generated, corresponding to different degrees of risk appetite and preference for skewness, respectively. As shown in [36], this results in a mean-variance-skewness efficient surface, which is a generalization of Markowitz's mean-variance efficient frontier. When η = 0, corresponding to no interest in skewness on the part of the investor, the solution to (33) is the mean-variance efficient frontier. 
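When the nonnegativity constraints are dropped and only the budget constraint is retained, the first-order conditions admit a simple closed-form solution. The sketch below, with arbitrary illustrative parameter values of ours (it is not the authors' procedure), solves θξ + ηωδ − Ωw + λ1 = 0 for the Lagrange multiplier λ that makes the weights sum to one.

```python
import numpy as np

def foc_weights(Omega, xi, omega_delta, theta, eta):
    """Solve theta*xi + eta*omega_delta - Omega @ w + lam*1 = 0 subject
    only to the budget constraint sum(w) = 1. Dropping the nonnegativity
    constraints gives a closed form; retaining them would require
    quadratic programming, as noted in the text."""
    ones = np.ones(len(xi))
    Oinv = np.linalg.inv(Omega)
    base = Oinv @ (theta * xi + eta * omega_delta)
    lam = (1.0 - ones @ base) / (ones @ Oinv @ ones)   # enforces sum(w) = 1
    return base + lam * (Oinv @ ones)

# Hypothetical three-asset example: Omega is the scale matrix, xi the
# location vector and od the vector omega*delta of skewness loadings.
Omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
xi = np.array([0.05, 0.08, 0.12])
od = np.array([0.01, 0.03, 0.06])
w = foc_weights(Omega, xi, od, theta=0.5, eta=0.2)
print(w, w.sum())
```

Setting eta = 0 removes the preference for skewness, and the same computation then produces a point on the usual mean-variance efficient frontier.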
The setup at Equation (33) is as described in [227], who summarize a number of related formulations by other authors. There is an important difference between the mean-variance efficient frontier and the mean-variance-skewness efficient surface. For the former, the expected return on a portfolio is an increasing function of portfolio variance, or of the risk appetite θ. When skewness is included, expected return depends inter alia on the scalar quantity Υ = ηµ⊤Dωδ, where µ = E{Y} and D is a symmetric positive semi-definite matrix determined by the optimization process. Depending on the values of µ and δ, Υ may assume both positive and negative values. Thus, in some cases an excessive preference for skewness may result in a lower portfolio expected return. An example of such a surface is shown in [40].
To illustrate the effect of varying risk aversion and preference for skewness, we describe a simulation loosely based on the returns on the shares of financial services companies traded on the Shanghai stock exchange. Stocks from one of the main Chinese stock markets are employed for this illustration, since it is a view reported in the finance literature that skewness is a feature of emerging markets; see, for example, the paper by [228]. The illustration is based on 30 stocks, all of which had daily price data from January 2009 to December 2018, giving 2606 observations. Portfolio selection was performed based on Equation (33), following standard practice and including the budget and non-negativity constraints.
The left panel of Figure 4 contains a sketch of the computed surface. The diagram shows that increasing values of θ and η lead to a flat surface; that is, the entire budget is invested in the single security with the maximum expected return. For a fixed value of η, the resulting curve resembles the efficient frontier, and for η = 0 it is the efficient frontier itself. If the investor has zero preference for expected return, θ = 0, expected return increases with η, but at the maximum value shown in Figure 4, the expected return is less than the maximum achievable. The right panel of Figure 4 shows the number of non-zero holdings. As the plot illustrates, as both θ and η increase, the number of non-zero holdings decreases to one. It should be noted that, while this example illustrates the effect of skewness on portfolio selection, a real-world application would include other constraints on the weights in order to maintain a degree of diversification. In the interests of brevity, details of such design considerations, the parameters used and the simulations are omitted, but they are available on request.

Stochastic Frontier Analysis: A Simple Case
Consider more closely the connection with the SFA theme mentioned in Section 5.1. As its general target, SFA involves the study of the relationship between the output of a group of production units and a number of input factors for these units, with the inclusion of an ingredient which represents the inefficiency of the units with respect to the optimal level allowed by the current technological development. In a simple and classical setting, this situation can be expressed as Y_i = β0 + β⊤x_i + V_i − U_i (34), where Y_i is a random variable representing the output of the ith production unit, typically expressed on the logarithmic scale, x_i is a p-dimensional vector of input levels for that unit (again, usually expressed on the log scale), and V_i, U_i are independent random variables: V_i represents a purely random error and U_i is a positive random variable representing the inefficiency level of the ith unit. Classical specifications for U, V, where we now drop the subscript i for simplicity, are V ∼ N(0, σ_v²) and U ∼ σ_u χ1 (35), where χ1 denotes the square root of a χ²1 random variable, whose distribution is also called half-normal. The distribution of R = V − U under specification (35) is of the SN type, taking into account its additive representation (5). Simple algebra shows that the SN parameters are related to those in (35) by ω² = σ_u² + σ_v², α = −σ_u/σ_v. The regression parameters β maintain their meaning across the two settings. The connection of this simple SFA formulation with the SN family makes available extensions of the SN distributions developed in the context of symmetry-modulated distributions.
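The stated correspondence between (σ_u, σ_v) and the SN parameters (ω, α) can be checked by simulation. The fragment below is a sketch of ours with arbitrary parameter values; it compares the sample moments of R = V − U with the known skew-normal moment formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_u, sigma_v, n = 0.8, 0.5, 200_000

# Composite SFA error: symmetric noise minus half-normal inefficiency.
V = sigma_v * rng.standard_normal(n)
U = sigma_u * np.abs(rng.standard_normal(n))
R = V - U

# Implied SN(0, omega^2, alpha) parameters: omega^2 = su^2 + sv^2,
# alpha = -su/sv, with delta = alpha / sqrt(1 + alpha^2).
omega = np.hypot(sigma_u, sigma_v)
alpha = -sigma_u / sigma_v
delta = alpha / np.hypot(1.0, alpha)

# Known SN moments: E{R} = omega*delta*sqrt(2/pi) (which here equals
# -sigma_u*sqrt(2/pi)) and var{R} = omega^2 * (1 - 2*delta^2/pi).
mean_th = omega * delta * np.sqrt(2 / np.pi)
var_th = omega ** 2 * (1 - 2 * delta ** 2 / np.pi)
print(R.mean(), mean_th)   # sample vs. theoretical mean
print(R.var(), var_th)     # sample vs. theoretical variance
```

The agreement of the sample and theoretical moments reflects exactly the reparameterization stated above, with ωδ = −σ_u.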
To illustrate this point in a simple case, we make use of some data, namely the data set milkProd available within the R package Benchmarking; see [229]. The data set refers to the dairy industry; the production units are Danish milk producers. For each of n = 108 units, four variables are available: milk (the amount produced, in kg), energy (energy expenses), vet (veterinary expenses), cows (number of cows).
A model of type (34) is fitted to the data, setting Y i equal to the log-amount of milk produced and the p = 3 input values x i equal to the log-transformed values of the other variables. As for the (U, V) variables, we start with the distributional assumptions (35). The data fitting outcome is the same using either the FRONTIER program, very popular in the SFA community, which is described in the standard reference text [143], or the R package sn [128], after suitable conversion of the parameters. The fitted model is then Y = 7.521 + 0.1214 log(energy) + 0.06276 log(vet) + 0.8791 log(cows) + R where R ∼ SN(0, 0.21376 2 , −3.5979).
The left panel of Figure 5 displays the scatter plot of the fitted values versus the observed milk log-production. It is visible that, for any given abscissa, the points are more elongated above the identity line than below it, hence indicating asymmetry of the distribution, which in turn indicates the presence of the inefficiency component U i in (34). This feature is confirmed in the right panel of Figure 5, where the continuous blue line represents the fitted SN density of the R variable, clearly asymmetric. Closer inspection of the residual values R̂ i indicates somewhat longer tails than those expected under assumption (35). Various proposals for handling this sort of situation have been put forward in the SFA literature, but their discussion would require considerable space, and it would lead us away from our main theme. We confine ourselves to a brief sketch of a route stemming from the SEC class, specifically the ST distribution (26).
Besides the genesis via a conditioning argument presented in Section 3.2, the ST distribution can also be constructed via an additive mechanism analogous to (5)-(6) for the SN distribution; see Proposition 9 of [19] for a full specification. Using this result in the bivariate case, we can replace assumption (35) by one where (U, V) is constructed from (U*, V) with a bivariate Student's t distribution and U = |U*|. In this case, the variables (U*, V) are uncorrelated, but not independent, since the multivariate Student's t does not allow independence of its components. The corresponding variable R = V − U then has a ST distribution. For more details about this formulation, see Tancredi [230]. For the data under consideration, we have fitted a ST distribution with ν = 5 degrees of freedom. This value of ν was selected by simple inspection of the histogram of the earlier residuals, without using a formal estimation procedure, but subsequent numerical inspection of the log-likelihood function has confirmed the appropriateness of the choice. After maximization with respect to the other parameters, we obtained a maximized log-likelihood value of 68.99 on 6 degrees of freedom, having regarded ν as fixed. The corresponding ST density is represented by the dashed magenta line in the right plot of Figure 5. The log-likelihood value is 1.165 higher than the corresponding value for the SN fit. While a formal test procedure is not possible because the models are not nested, there is a non-negligible improvement with the ST model, especially in light of the limited number of observations. A more formal procedure would be to consider an information criterion, such as Akaike's AIC or similar; since the two models have the same number of parameters, this amounts to comparing the maximized log-likelihoods, as already done.
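This additive construction of the ST error can also be mimicked by simulation. The sketch below is ours, with illustrative parameter values: it draws (U*, V) from an uncorrelated bivariate Student's t (via the usual common chi-square denominator, so the components are dependent) and forms R = V − |U*|.

```python
import numpy as np

def rst_additive(n, sigma_u, sigma_v, nu, rng):
    """Draw R = V - U, where (U*, V) is bivariate Student's t with nu
    degrees of freedom (shared chi-square mixing variable: uncorrelated
    but dependent components) and U = |U*| is the positive component."""
    z = rng.standard_normal((n, 2)) * np.array([sigma_u, sigma_v])
    w = rng.chisquare(nu, size=n) / nu          # common mixing variable
    t = z / np.sqrt(w)[:, None]                 # bivariate t variates
    return t[:, 1] - np.abs(t[:, 0])            # R = V - U

rng = np.random.default_rng(2)
r = rst_additive(100_000, 0.8, 0.5, nu=5, rng=rng)

# The composite error is left-skewed and heavier-tailed than its
# skew-normal analogue with the same (sigma_u, sigma_v).
print(r.mean(), np.mean(((r - r.mean()) / r.std()) ** 3))
```

Replacing the chi-square mixing variable by the constant one recovers the skew-normal construction of (35), which makes the two specifications directly comparable.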

Final Comments
Some final remarks seem appropriate. We refrain from undertaking a hazardous forecasting exercise on the future evolution of the area and, instead, confine ourselves to the less impressive, but still useful, attempt of identifying the key aspects of the past evolution and of the present situation.
It is reasonable to say that the take-off of the domain occurred at the end of the past century, between 1996 and 1999. While before that time only a few sporadic publications existed, after that time, papers started appearing in rapid succession, involving a gradually growing number of authors. For the subsequent 15 years or so, the focus of interest has been primarily on theoretical aspects, in two main directions: (i) formulation of more and more general families of distributions and (ii) development of statistical procedures for fitting these distributions, at least for the more realistic and tractable families.
In parallel, some applications of the newly proposed distributions have started appearing. In the initial stage, they naturally took the form of simple data fitting, often in a univariate context. However, in the last decade or so, these applications have evolved into more elaborate constructions, which often build on the formal properties of the distributions to link them with subject-matter considerations and theory. This is not to say that the theoretical developments have stopped. New contributions still appear, although not at the tumultuous rate of the earlier years, which, after all, may not be detrimental.
The overall perception emerging from this picture is that of a domain which has entered its maturity stage. The theoretical developments of the early years have led to applications, from the basic to the more elaborate ones, spanning a wide range of applied areas. This interaction of theoretical and applied work is culturally appealing and generally gratifying for all those who have contributed, and still contribute, to this project.