Properties and Limiting Forms of the Multivariate Extended Skew-Normal and Skew-Student Distributions

This paper is concerned with the multivariate extended skew-normal (MESN) and multivariate extended skew-Student (MEST) distributions, that is, distributions in which the location parameters of the underlying truncated distributions are not zero. The extra parameter leads to greater variability in the moments and critical values, thus providing greater flexibility for empirical work. This paper reports various theoretical properties of the extended distributions, notably the limiting forms as the magnitude of the extension parameter, denoted τ in this paper, increases without limit. In particular, it is shown that as τ → −∞, the limiting forms of the MESN and MEST distributions are different. The effect of the difference is exemplified by a study of stock market crashes. A second example is a short study of the extent to which the extended skew-normal distribution can be approximated by the skew-Student.


Introduction
The skew-normal distribution was introduced in [1] and the skew-Student in [2]. These two distributions share the property that they may be derived formally. There are several methods of derivation, of which probably the best known is to consider the bivariate normal distribution of X and Y, each with zero mean, unit variance, and correlation ρ. The skew-normal distribution then arises by considering the distribution of X conditional on Y < 0 [or Y > 0]. A second method of construction is to consider a random variable X + λU, where λ ∈ R and U has a standard normal distribution truncated from below at zero, written U ∼ TN(0, 1; 0)+. Here TN(µ, σ²; x)+ denotes a normally distributed variable with mean µ and standard deviation σ truncated from below at x. There are similar and equally well-known constructions for the skew-Student and for the multivariate versions of these distributions. That the conditioning variable Y is required to be less than (greater than) zero and that U follows a standard normal distribution truncated from below at zero are, however, limitations. This is for four principal reasons. First, using negative (positive) values of Y to determine whether or not X is observed is self-evidently a limitation. Depending on the application, the appropriate threshold or truncation point for Y might take any nonzero value, as might the value of the mean of its underlying normal distribution. For example, the recent paper [3] refers to early work by [4]. The latter was concerned with the scores from admission examinations: in such a case, the mean of Y would surely be greater than zero, as would the truncation point. Similarly, there is often no reason a priori for the underlying mean of U to be zero.
Stats 2022, 5
Second, empirical evidence reported in the financial economics literature suggests that in the absence of truncation from below at zero, the distribution of the unobserved variable U, denoted N(τ, 1) in this paper, exhibits nonzero values of τ (see, for example, [5] or [6]). In the first method of derivation above, the corresponding conditioning event would be that Y < τ (Y > τ). Such distributions are referred to in the literature (see [7,8]) as extended skew-normal or extended skew-Student. The importance of nonzero values of τ also arises in stochastic frontier analysis, commonly referred to as SFA. SFA models are used to measure the efficiency of manufacturing companies and organizations such as banks. There is a detailed review of SFA models and methods in [9]. In its basic form, SFA employs linear regression models in which the unobserved residual has two components, commonly written as ε − ν. The first term, ε, is a standard N(0, σ²) variate. The second term, ν, is a non-negative variate assumed to have an N(0, σ²_ν) distribution, which is truncated from below at zero; that is, a half-normal distribution. The expected value of ν, which is nonzero, measures inefficiency. With these assumptions, the residual ε − ν has a skew-normal distribution. A somewhat different model was introduced by [10], in which the half-normal variable is replaced by one which has an exponential distribution. The paper by [11] shows that in the limit as τ → −∞, and with suitable choice of other model parameters, the extended skew-normal distribution encompasses both the half-normal and exponential distributions for the inefficiency term ν. Use of the extended version of the distribution offers greater flexibility in modeling inefficiency: the distribution of the inefficiency variable may exhibit a nonzero mode or may decay steeply.
Nonzero and negative values of τ also arise in the study of stock market crashes. The standard model in financial economics is that returns on risky financial assets follow a multivariate normal distribution. Under this assumption, formally, the basic model of portfolio theory is to consider the conditional distribution of asset returns given a specified return on a market index. The resulting conditional distribution is multivariate normal, leading in essence to regression models for the return on individual assets. In the same manner, a market crash may be studied by considering the distribution of asset returns given that the return on the market index is less than a specified negative value. The resulting distribution is multivariate extended skew-normal. For market crashes, the value of the parameter denoted τ is both negative and of substantial magnitude. Analogous results arise if it is assumed that returns follow a multivariate Student distribution. In both the normal and Student cases, the distributions that arise as τ → −∞ are of interest, one reason being that the limiting properties are different.
Third, use of extended versions of the skew-normal or skew-Student gives greater variability in the moments and critical values of the distributions. For empirical applications, this offers the possibility of better model fit. For some applications, the implied flexibility in the formal foundations may offer insights into the underlying data generation process. Last, in the multivariate case, conditional distributions are in general of the extended type. Thus, for applications where conditional distributions play a role, extended versions are important if not unavoidable. The formal derivation of a skew-normal regression model as in [5] offers an example of this.
Extended versions of the skew-normal and skew-Student distributions have explicit advantages for some purposes. They offer the potential for greater flexibility in empirical work and, in addition, methodological advantages in some cases. The main aim of this paper is to present properties of the multivariate extended skew-normal (MESN) and multivariate extended skew-Student (MEST) distributions. The results demonstrate the differences from the standard versions. The paper also studies limiting cases of the distributions as the magnitude of the extension parameter τ increases without limit, extending a result reported in [11]. As the paper shows, these limiting cases are of interest from a theoretical point of view and offer insights for some applications.
The methodological results are illustrated by two applications. First, there is a study of the effect of a stock market crash. The results are different depending on whether the underlying distributions are multivariate normal or Student. The study presented here is theoretical, but its results can inform the development of econometric models of stock returns. Second, some researchers in this area of statistics have suggested informally that the skew-Student could be used as an alternative to the extended skew-normal. For a specified univariate application, it would be straightforward to estimate the parameters of both distributions and then make an informed choice using a test of fit or, for example, consideration of the tails of the distribution. Such an alternative may be attractive, but the suggestion could equally well be made in reverse: the extended skew-normal could be an alternative to the skew-Student. A general investigation of the similarity of the two distributions, particularly for multivariate cases, would be a major task and beyond the scope of this paper. To inform further research into this issue, this paper contains a short study designed to investigate this conjecture.
The structure of this paper is as follows. In Sections 2 and 3, results for the MESN and MEST distributions, respectively, are presented. The results in these two sections are based on the extended versions of the second method of construction referred to above. Section 4 is concerned with the first method of construction, sometimes referred to in the literature as a hidden truncation model. This section contains the illustrative example of the effect of a stock market crash. The example shows that different behavior arises depending on the choice of model. Section 5 describes a brief investigation into the use of the skew-Student as an alternative to the extended skew-normal. Section 6 offers some concluding remarks. The abbreviations (E)SN and (E)ST are used for the univariate (extended) skew-normal and (extended) skew-Student distributions, respectively, with MSN and MST for the multivariate versions. Examples and graphs are based on univariate distributions, with most numerical results rounded to four decimal places. Notation not defined explicitly in the text is that in common use.

Multivariate Extended Skew-Normal Distribution
The multivariate skew-normal distribution was introduced by [12]. The multivariate extended skew-normal distribution, MESN, with an additional parameter, was first described in [13] and independently by [8,14]. Following the notation in the third of these papers, the distribution of an n-vector X that follows this distribution is denoted MESN_n(µ, Σ, λ, τ). The authors of reference [13] derive the MESN distribution as a hidden truncation model. The authors of reference [8] present a direct derivation and link it to results in [7], who show that conditional distributions are in general of the extended type. The authors of reference [14] derive it as the convolution X = U + λV, where the random vector U has the multivariate normal distribution N_n(µ, Σ) and the scalar random variable V is independently normally distributed as N(τ, 1) truncated from below at 0, denoted V ∼ TN(τ, 1; 0)+. The basic properties of the MESN distribution are described in this section using the notation in [14]. The probability density function of the distribution of X is where φ_n(x, µ, Σ) is the probability density function of an n-vector X, which has a multivariate normal distribution with mean vector µ and covariance matrix Σ, evaluated at x. Φ(z) is the standard normal distribution function evaluated at z, with φ(z) denoting the corresponding density function. The distribution is denoted X ∼ MESN_n(µ, Σ, λ, τ). The moment generating function of X is The mean vector and covariance matrix of the MESN distribution are, respectively where the function ξ_k(z) is defined as Note that the covariance matrix may also be written a form that is referred to in Section 3.3. Coskewness and cokurtosis, defined here as the fourth cumulant, are given by respectively. For the skew-normal distribution itself, the mean of the underlying truncated normal variable denoted V equals √(2/π). Rounded to four decimal places, this value is shown in panel 2, column 1 of Table 1 in the row named "mean".
When |τ| ≤ 1, the minimum and maximum values of the mean are 0.5251 and 1.2876, respectively, as shown in panels 1 and 3. The corresponding results for the higher moments are shown in the other rows of column 1 of the table. Columns 2 and 3 show the analogous results when |τ| ≤ 5 and |τ| ≤ 30, respectively. Thus, as well as arising automatically under conditioning, the extended version of the skew-normal provides for more flexibility in the moments of the distribution.
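The values of the mean quoted above can be checked with a short calculation. For V ∼ TN(τ, 1; 0)+, the standard truncated normal result is E(V) = τ + φ(τ)/Φ(τ). The sketch below uses only the Python standard library. The function names are illustrative and are not taken from the paper.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def trunc_normal_mean(tau):
    """Mean of an N(tau, 1) variable truncated from below at zero."""
    return tau + norm_pdf(tau) / norm_cdf(tau)

# Mean of V at the two ends of |tau| <= 1 and at tau = 0 (the skew-normal case).
means = {tau: round(trunc_normal_mean(tau), 4) for tau in (-1.0, 0.0, 1.0)}
print(means)
```

At τ = 0 the function returns √(2/π), and at τ = −1 and τ = 1 it gives the minimum and maximum values quoted in the text, since the mean is increasing in τ.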
An implication of this result, described in more detail below, is that as τ < 0 increases in magnitude, V has an exponential distribution with parameter 1/|τ|, that is, with mean and standard deviation both equal to 1/|τ|. As τ → ∞, the distribution of X tends to a multivariate normal with an unbounded mean vector µ + λτ but a finite covariance matrix Σ + λλ T .
The remainder of this section of the paper presents a number of properties of the MESN distribution. Figure 1 shows two sets of examples of the density function of the (univariate) extended skew-normal distribution for τ = ±30, ±15, ±5, ±2.5, and 0. In the left-hand set, the nonzero values of the extension parameter τ are negative. In the right-hand set, the signs of the τ are reversed. In both sets, µ = 0, σ = 2, and λ = 5. Both sets demonstrate that asymmetry disappears progressively as |τ| increases and exhibit the properties reported in Lemma 1 and the text that follows it.
Figure 1. Two sets of extended skew-normal density functions. In both sets µ = 0, σ = 2 and λ = 5. In the left-hand set, values of the extension parameter τ are set to −30, −15, −5, −2.5 and 0. In the right-hand set, the signs of τ are reversed.
Papers by [7,15] show that a suitable linear transformation reduces the MSN distribution to a canonical form. Corresponding representations may be derived for the extended version of the distribution and, as shown below, for the extended skew-Student. These representations depend on the following standard result. Lemma 2. Let I_n denote an n × n identity matrix, ψ an n-vector and 0_n an n-vector of zeros. The eigenvalues of the matrix I_n + ψψ^T are (i) 1 + ψ^T ψ and (ii) 1 repeated n − 1 times. The corresponding eigenvectors are (i) ψ/√(ψ^T ψ) and (ii) the columns of an n × (n − 1) orthogonal matrix T_0 which satisfies ψ^T T_0 = 0^T_{n−1}.
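Lemma 2 can be illustrated numerically. The sketch below checks both eigenvalue claims for one hypothetical vector ψ = (1, 2, 2). The two orthogonal vectors are chosen by hand and are not normalized, so they span the same space as the columns of T_0 without being T_0 itself.

```python
def mat_vec(a, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def identity_plus_outer(psi):
    """Build the matrix I_n + psi psi^T."""
    n = len(psi)
    return [[(1.0 if i == j else 0.0) + psi[i] * psi[j] for j in range(n)]
            for i in range(n)]

psi = [1.0, 2.0, 2.0]                  # hypothetical example vector
a = identity_plus_outer(psi)
psi_sq = sum(p * p for p in psi)       # psi^T psi = 9

# (i) psi is an eigenvector with eigenvalue 1 + psi^T psi.
image = mat_vec(a, psi)
top_ok = all(abs(x - (1.0 + psi_sq) * p) < 1e-12 for x, p in zip(image, psi))

# (ii) any vector orthogonal to psi is an eigenvector with eigenvalue 1.
orthogonal = [[2.0, -1.0, 0.0], [2.0, 4.0, -5.0]]
unit_ok = all(
    abs(sum(v_i * p_i for v_i, p_i in zip(v, psi))) < 1e-12
    and all(abs(x - v_i) < 1e-12 for x, v_i in zip(mat_vec(a, v), v))
    for v in orthogonal
)
print(top_ok, unit_ok)
```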
Note that from Lemma 1, as τ → −∞, the limiting distribution of Z 0 is the standard normal.

The Truncated Normal Distribution and Its Approximations
The probability density function of the distribution of the truncated normal variable The moment-generating function (MGF), originally reported in [16], is with the MGF valid for all t ∈ R. Following on from [17], numerous authors present results for the moments of the truncated normal distribution and generalizations thereof. These include [18][19][20][21][22] and, recently, [23], among others. For values of τ that are less than zero, the asymptotic expansion of Φ(τ) from page 932 of [24] is Noting that with suitable choices of m and values of τ, the remainder term R_m(.) may be ignored. In this case, the moment-generating function of V is This leads to a distribution for which the corresponding density function is a weighted average of gamma densities where g(.) denotes the density function of the gamma distribution For sufficiently large values of |τ|, terms after the first may be ignored, giving an exponential distribution with density function When used to form the convolution X = U + λV, the distribution at (15) leads to the skew-normal exponential distribution described in [11] but originally due to [10]. Figure 2 shows sketches of the truncated normal density function for τ = −3, −10 and −30. The steepness of decay increases with |τ|. Figure 3 shows the truncated normal density function for τ = −3, together with the corresponding exponential density function and an approximation based on the density at Equation (13) with m = 2. As Figure 3 indicates, the three density functions are visually similar. In particular, there is little difference between the truncated normal density and the three-term mixture based on Equation (13).
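The role of the remainder term can be seen numerically. The sketch below assumes the standard form of the expansion for τ < 0, Φ(τ) ≈ (φ(τ)/|τ|){1 − 1/τ² + 3/τ⁴ − … + (−1)^m (2m − 1)!!/τ^{2m}}, and compares it with Φ(τ) for m = 2. The function names are illustrative, and the form of the series is stated here as an assumption rather than copied from [24].

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cdf_expansion(tau, m):
    """Truncated asymptotic series for Phi(tau), tau < 0, keeping terms 0..m."""
    total, term = 0.0, 1.0
    for k in range(m + 1):
        total += term
        term *= -(2 * k + 1) / (tau * tau)   # next term: (-1)^(k+1) (2k+1)!! / tau^(2k+2)
    return norm_pdf(tau) / abs(tau) * total

def expansion_ratio(tau, m):
    """Ratio of the truncated series to the exact value of Phi(tau)."""
    return cdf_expansion(tau, m) / norm_cdf(tau)

for tau in (-3.0, -10.0, -30.0):
    print(tau, expansion_ratio(tau, 2))
```

The ratio moves towards one as |τ| grows, which is the sense in which the remainder may be ignored for suitable m and τ.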

Moments of the Truncated Normal Distribution
Expressions for moments of the truncated normal distribution are reported in [21], as well as in references cited above in Section 2.1. In the notation of the present paper, from Equation (9), the mean and variance of the truncated normal distribution are, respectively, Skewness and kurtosis, defined here as the fourth cumulant, are respectively and Kurtosis, the fourth moment about the mean and denoted by κ̄_4, is Expressed in terms of ξ_1(τ), this is Note that, from [25], κ_3 ≥ 0 for all τ ∈ R. Using the first term of the asymptotic expansion for Φ(τ) for τ ≪ 0, under which V has the exponential distribution at Equation (15), leads to the following expressions for the first four derivatives of log Φ(τ).
where in this paper the notation is taken to mean that the ratio of the two functions tends to unity as, in this case, τ → −∞. These results give the same expressions for the first four moments as those computed from the exponential distribution at Equation (15). Table 2 shows the computed values of the first four moments of the truncated normal distribution, the limiting exponential distribution at Equation (15), and the mixture distribution based on Equation (13) with m = 2. Values are shown for τ = −3, −10, and −30. In the table, kurtosis is the fourth moment about the mean, that is, κ̄_4. As the table shows, the differences between the exact and approximate results are small and decline as |τ| increases. Whether a given approximation may be used as a practical alternative to the truncated normal will depend on the magnitude of τ and the application in question.
Note to Table 2: Kurtosis is the fourth moment about the mean. The abbreviations 'Exact', 'Exp-1', and 'Exp-3' are as described in Figure 3.
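The comparison of exact and approximate moments can be sketched for the first two moments. The code below assumes the standard truncated normal results E(V) = τ + ξ_1(τ) and var(V) = 1 − ξ_1(τ){τ + ξ_1(τ)}, with ξ_1(τ) = φ(τ)/Φ(τ), and compares them with the exponential values 1/|τ| and 1/τ². It does not reproduce Table 2, and the function names are illustrative.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def trunc_normal_mean_var(tau):
    """Exact mean and variance of N(tau, 1) truncated from below at zero."""
    xi1 = norm_pdf(tau) / norm_cdf(tau)
    mean = tau + xi1
    var = 1.0 - xi1 * mean
    return mean, var

def exponential_mean_var(tau):
    """Mean and variance of the limiting exponential distribution with rate |tau|."""
    rate = abs(tau)
    return 1.0 / rate, 1.0 / rate ** 2

def relative_errors(tau):
    """Relative error of the exponential mean and variance against the exact values."""
    exact_mean, exact_var = trunc_normal_mean_var(tau)
    approx_mean, approx_var = exponential_mean_var(tau)
    return (abs(approx_mean / exact_mean - 1.0), abs(approx_var / exact_var - 1.0))

for tau in (-3.0, -10.0, -30.0):
    print(tau, trunc_normal_mean_var(tau), exponential_mean_var(tau))
```

The relative errors shrink as |τ| increases, in line with the statement that the differences decline with |τ|.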

Standardized Form of the Extended Skew-Normal Distribution
Additional insights into the MESN distribution may be obtained by standardization. If Ω^{1/2} denotes a left square root matrix of Ω, the random n-vector Z now defined as satisfies E(Z) = 0_n and cov(Z) = I_n. The distribution of Z has the density function where φ_n(.) and ω are as defined for Equation (1) and For the standardized form of the extended skew-normal distribution, coskewness and cokurtosis (also defined here in terms of the fourth cumulant) are given by respectively, where λ̃_i is the standardized value of the skewness or shape parameter defined as Both coskewness and cokurtosis tend to zero as τ → −∞, in which case the limiting distribution of Z is the standard multivariate normal. A suitable transformation similar to that in Proposition 1 shows that the standardized MESN distribution may be expressed in canonical form similar once again to that described in [7] and [15].
where Ω^{1/2} is a left square-root matrix of Ω and let and β 1 as defined at Equation (16) and ψ in Proposition 1. Note that, as in Proposition 1, as τ → −∞ the limiting distribution of Z_0 is the standard normal. Figure 4 shows two sets of standardized extended skew-normal density functions. In both sets µ = 0, σ = 1 and λ = 5. In the left-hand set, values of the extension parameter τ are set to −30, −15, −5, −2.5 and 0. In the right-hand set, the signs of τ are reversed. Both sets of densities illustrate that for τ ≠ 0 little asymmetry is apparent even when the shape parameter λ is substantial; in this case five times greater than the scale parameter σ. Of the values of τ shown in the figure, only τ = 0 leads to a density function with a discernible amount of asymmetry. Figure 5 shows two more sets of skew-normal density functions. The panel on the left shows extended skew-normal density functions with µ = 0, σ = 1 and λ = 5. The values of τ are −5, −2.5, −1, 0, 1, 2.5 and 5. The panel on the right-hand side consists of the corresponding densities standardized to have mean equal to zero and variance equal to one. The X-scales are the same in each panel. As Figure 5 shows, the skewness apparent for the extended skew-normal distributions reduces and largely disappears under standardization. There are analogous results for negative values of λ.
Figure 4. Two sets of standardized extended skew-normal density functions. In both sets µ = 0, σ = 1 and λ = 5. In the left-hand set, values of the extension parameter τ are set to −30, −15, −5, −2.5 and 0. In the right-hand set, the signs of τ are reversed.
Figure 5. Two sets of skew-normal density functions. The panel on the right-hand side consists of densities that are standardized to have mean equal to zero and variance equal to one. In both sets µ = 0, σ = 1 and λ = 5. In each panel the values of τ are set to −5, −2.5, −1, 0, 1, 2.5 and 5. The X-scales are the same in each panel.
Table 3 shows a selection of moments for the extended skew-normal distribution and the corresponding standardized form for values of τ that are less than or equal to zero. Values of τ and λ are as shown in the table. The values of the location and scale parameters, µ = 0 and σ = 1, are used in all numerical results. As the table shows, when τ ≤ −10 the values of standardized skewness and kurtosis are numerically close to 0 and 3, respectively, thus supporting the result of Lemma 1. For λ = 0 and 1 there is evidence to support normality for τ < −1. Asymmetry is most evident when τ is zero or close to it. Table 4 shows the corresponding selection for positive values of τ. The panel corresponding to τ = 0 is repeated for ease of reading. The table indicates normality for τ ≥ 5. Asymmetry is evident when τ ≤ 1. The panel with τ = 2.5 has values of κ_4 that are negative.
Mean, variance, skewness, kurtosis (κ̄_4), and fourth cumulant (κ_4) are denoted mn, vr, sk, ku, and k4, respectively. ssk and sku denote skewness and kurtosis for the standardized distributions.

Multivariate Extended Skew-Student Distribution
The multivariate extended skew-Student distribution, MEST, is an extension of the multivariate skew-Student distribution originally introduced by [2]. The extended version is reported in [26] and later in both [27,28]. Following [14], the former derives it as the convolution X = U + λV, where the random vector (U^T, V)^T of length n + 1 has a multivariate Student distribution with location parameter vector (µ^T, τ)^T and scale matrix with V truncated from below at zero. Consistent with the notation in Section 1, this location parameter µ and scale σ truncated from below at x. The marginal distribution of U has the symmetric density function reported in Section 3.2 of [27] and independently in [28]. The probability density function of the distribution of X is where and where ω is as defined at Equation (2). t_{ν,n}(x, µ, Σ) is the probability density function of an n-vector X which has a multivariate Student distribution with location parameter vector µ and scale matrix Σ, evaluated at x. T_ν(z) is the distribution function of a Student's t variable with ν degrees of freedom evaluated at z, and t_ν(z) is the corresponding density function. This distribution is denoted X ∼ MEST_n(µ, Σ, λ, τ; ν). As in Section 2, this section of the paper presents basic properties of the MEST distribution. Similar to Table 1, unreported results show that nonzero values of τ make a substantial difference to the moments of the distribution. As τ → ∞, the limiting distribution of X is multivariate Student.
The proof of this result uses the scale mixture representation reported in Lemma 3 of [29]. This result is consistent with the analogous property of the MESN distribution reported in Section 2. As shown later in this section, however, the limiting distribution of X as τ → −∞ in the MEST case is different from that for the MESN. Figure 6 shows sketches of the extended skew-Student density function for λ = 0 and ν = 3. The left-hand panel shows density functions with negative values of τ ranging from −30 to −1. The right-hand side shows densities with positive values of τ ranging from 0 to 30. This symmetric density function is that reported in both [27,28]. Two notable features are, first, the similarity of the density function for increasing positive values of τ and, second, the increasing spread of the density function as |τ| increases for negative values of τ. For λ = 5, the left-hand panel of Figure 7 shows density functions with the same negative values of τ. The right-hand panel shows densities with τ ranging from 0 to 20. In both of these figures, µ = 0 and σ = 1. In the right-hand panel of Figure 7, the density function is qualitatively similar to the corresponding skew-normal distribution: asymmetry disappears with increasing values of τ, and the location parameter increases, but the spread does not. For negative values of τ, the spread increases and asymmetry decreases with increasing values of |τ|. To support the sketches in the figures, the moments of the extended skew-Student distribution are reported in Section 3.3 below.
A canonical form of the MEST distribution may be derived using an approach that is essentially the same as that in Proposition 1.
Standard manipulations show that Z 0 ∼ EST(0, 1, ψ 0 , τ; ν) and that the marginal distribution of Z has the symmetric Student-like density function reported in Section 3.2 of [27].
Figure 6. Sketches of the (symmetric) extended skew-Student density function for λ = 0 and ν = 3. The left-hand panel shows density functions with negative values of τ ranging from −30 to −1. The right-hand side shows densities with positive values of τ ranging from 0 to 20. In both sets µ = 0 and σ = 1.

The Truncated Student's t Distribution
Similar to the extended skew-normal, the properties of the extended skew-Student distribution are substantially affected by those of the truncated form of Student's t. The density function of the truncated Student's t variable v is Figure 8 shows sketches of the truncated Student t density function for τ = −35, together with two approximating beta type-2 density functions as described below in Lemma 5. The degrees of freedom ν are 5 and 20, respectively. For a fixed value of τ, the figure illustrates the increasing severity of decay as ν increases. It is notable that for ν ≥ 5, the truncated Student t is well approximated by the beta type-2 densities.
Figure 8. Sketches of the truncated Student t density function for τ = −35, together with two approximating beta type-2 density functions as described in the text. In the left-hand [right-hand] panel the degrees of freedom equal 5 [20]. Note that the X-scales are not the same in each panel.

Moments of the Truncated Student's t Distribution
Moments of the truncated distribution at Equation (30) may be evaluated directly. Note that expressions for the moments of a doubly truncated t distribution may be found in [30]. As reported in [27], for ν > 1 and ν > 2, respectively, the mean and variance of this distribution are where The following result, derived using integration by parts, leads to a more useful representation of η ν (τ).

Lemma 3.
For ν > 2, the following result holds Using this result, for ν > 2, the functions η_ν(τ) and ξ_ν(τ) are related by the identity Equation (33) allows the variance to be written as Note that lim ν→∞ ξ_ν(τ) = ξ_1(τ) is sufficient to show that the limiting values in Equation (31) equal those for the truncated normal at Equation (16). For ν > 3 and ν > 4, skewness and kurtosis (the fourth moment about the mean), respectively, are where and As already noted above, reference [25] showed that the skewness of the truncated normal distribution is non-negative for all values of τ. The following shows that the same result holds for the truncated Student distribution.
The proof is by contradiction. First, note that since ξ ν (τ) ≥ 0, the sign of κ 3 (V) is determined by the sign of the expression in {.} in Equation (35). This quadratic function of ξ ν (τ) has roots

.
Since the coefficient of ξ_ν(τ)² is positive, the function is negative between the roots, which is a contradiction.
Note that as ν → ∞, Proposition 5 also establishes Sampford's result, and note that the expressions for the first four moments tend to those for the truncated normal distribution at Equations (16), (17), and (19).
Computation of limiting expressions for the moments as τ → −∞ requires a result that is analogous to the well-known asymptotic expression for the normal distribution function reported in [24]. Such a result was first reported in [31]. As it does not appear to be well known, it is summarized below in the notation of this paper.

Lemma 4 ([31]).
For values of τ that are less than zero, the asymptotic expansion of T_ν(τ) is Noting that with suitable choices of m and values of τ, the remainder term R_m(.) may be ignored.
Using the first two terms in the expansion in Lemma 4 for τ < 0 and ν > 1 gives from which the asymptotic expected value is For ν > 2, the corresponding expression for the asymptotic variance is Thus, for fixed finite degrees of freedom ν > 2, the expected value and variance increase without limit as τ → −∞. As ν → ∞, the expected value and variance tend to 1/|τ| and 1/|τ| 2 , respectively, the results for the truncated normal distribution. The corresponding expressions for skewness and kurtosis are omitted in view of their complexity. However, if just the terms proportional to |τ| 3 are considered, then for ν > 3 as τ → −∞, asymptotic skewness is Similarly for ν > 4, asymptotic kurtosis is proportional to |τ| 4 . Table 5 shows a selection of moments from the truncated Student's t distribution. As τ increases above zero, the distribution increasingly resembles Student's t as demonstrated by the values in the bottom panel of the table. The top panel corresponding to τ = −35 shows the increasing values of the moments. The analog of the limiting exponential distribution that arises in the normal case described in Section 2.1 is as follows.
The proof of this lemma is in Appendix A. An asymptotically equivalent result is that the variable Ỹ = V/√(ν + |τ|²) is also distributed as β_II(1, ν).
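The growth of the mean of the truncated Student's t variable can be illustrated by numerical integration. The sketch below takes the density of V to be proportional to {1 + (v − τ)²/ν}^{−(ν+1)/2} for v ≥ 0, and reads the lemma as saying that V/|τ| is approximately β_II(1, ν) for large negative τ. That reading is an assumption, consistent with the asymptotically equivalent form stated above. Under it, the mean of V is roughly |τ|/(ν − 1), since the mean of β_II(1, ν) is 1/(ν − 1) for ν > 1. The function names and the midpoint rule are choices made for this sketch only.

```python
def trunc_t_mean(tau, nu, n=20000):
    """Mean of a Student t variable (location tau, unit scale, nu degrees of
    freedom) truncated from below at zero, by the midpoint rule after the
    substitution v = scale * s / (1 - s), which maps [0, inf) onto [0, 1)."""
    scale = max(abs(tau), 1.0)
    mass = 0.0
    first = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        v = scale * s / (1.0 - s)
        jacobian = scale / (1.0 - s) ** 2
        kernel = (1.0 + (v - tau) ** 2 / nu) ** (-(nu + 1) / 2.0)
        mass += kernel * jacobian            # normalizing constant
        first += v * kernel * jacobian       # first moment, unnormalized
    return first / mass

def beta2_mean_approx(tau, nu):
    """Approximate mean implied by V/|tau| ~ beta type-2 (1, nu); an assumption."""
    return abs(tau) / (nu - 1.0)

for nu in (5, 20):
    print(nu, trunc_t_mean(-35.0, nu), beta2_mean_approx(-35.0, nu))
```

For fixed ν the computed mean increases with |τ|, unlike the truncated normal case, where the mean tends to zero as τ → −∞.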
It is straightforward to show that the conditional distribution of X given V = v follows a multivariate Student distribution with ν + 1 degrees of freedom, location parameter vector µ + λv, and scale matrix Use of this distribution in conjunction with the asymptotic distribution of V in Equation (45), for τ < 0 does not lead to tractable results that are analogous to those in Section 2.

Moments of the MEST Distribution
For ν > 1 and ν > 2, respectively, the mean vector and covariance matrix of the MEST distribution are and Using the identity at Equation (33) allows the covariance matrix to be written as The similarity of the coefficient of λλ^T to the corresponding term in Equation (6) may be noted. The coefficient of Σ provides the inequality ν − τξ_ν(τ) ≥ 0.
The skewness of a single variable X_i in X with scale denoted by σ may be expressed in terms of the moments of V, the truncated Student's t variable, specifically Equations (34) and (35), and is given by Defining the constants The kurtosis of X_i is given by The corresponding expressions for coskewness and cokurtosis are omitted. A selection of moments of the extended skew-Student is shown in Tables 6-9. Table 6 (Table 7) shows results for τ ≤ 0 (τ ≥ 0) for λ = 0. The panel for τ = 0 is repeated for convenience and corresponds to Student's t distribution. The lower panels of Table 6 show the increasing magnitude of variance and kurtosis as |τ| increases, even for λ = 0. Tables 8 and 9 show the corresponding results for λ = 5. Note that in Table 8, some large results are shown to two decimal places only to preserve the formatting.

Standardized Forms of the MEST Distribution
As in Section 2.3, further insights into the extended skew-Student distribution may be obtained by standardization. If Ω_ν^{1/2} denotes a left square root matrix of Ω_ν, the random vector Z now defined as satisfies E(Z) = 0_n and cov(Z) = I_n. The distribution of Z has the density function where t_{ν,n}(.) is as defined for Equation (28), ω is as defined for Equation (1) and The distribution at Equation (54) has a canonical form. First, partition Z into a scalar Z_0 and an (n − 1)-vector Z_1 and let Q(Z) be the quadratic form where ψ is as defined in Proposition 1. Methods similar to those used in that proposition give the following result.
Proposition 6. Let X ∼ MEST n (µ, Σ, λ, τ; ν) and The density function of Z is where As Equations (58) and (59) show, under the canonical representation, the asymmetry in the density function is attributable solely to the scalar variableZ 0 . The marginal distribution of Z 1 is symmetric and of the same type reported Section 3.2 of [27]. Examples of the EST and standardized EST density functions are shown in Figure 9 for τ = −30 and −5 and ν = 10, 20, and 100. In the upper (lower) row, λ = 0 [5]. The X-scales are the same in each panel. The graphs confirm results from Tables 8 and 9, namely that the degree of asymmetry is reduced under standardization. Examples of contour plots for the bivariate EST and standardized EST distributions are shown in Figure 10.
To investigate the behavior of the distribution as τ → −∞ for fixed ν, consider the scalar variable Z 0 , which has the marginal distribution EST µ ν , As τ → −∞ for fixed ν, the asymptotic density function of Z 0 is where This leads to the following result.
Proposition 7. For ν > 2, as τ/ √ ν → −∞, the distribution of Z 0 has the asymptotic density function with the sign of ψ 0 determined by the sign of z 0 + A, and The result in this proposition requires the asymptotic expression for the distribution function of Student's t. As noted above, such a result was first provided by [31] and is summarized in Lemma 4. Comparative examples of the exact and asymptotic EST density functions are shown in Figure 11. The implication of Proposition 7 is that as τ → −∞, the standardized distribution is qualitatively similar to the corresponding form for the extended skew-normal in that dependence on τ disappears. For nonzero values of λ or ψ 0 , however, the distribution remains asymmetric. It is important to note, though, that unlike the MESN, dependence on τ as it tends to −∞ does not disappear in the nonstandardized MEST case. In addition to Proposition 7, recall from results in Sections 3.

Hidden Truncation Models
In their simple form, hidden truncation models are concerned with the bivariate normal distribution of (X, Y) in situations in which X is observed if Y is greater than (less than) a given threshold, here denoted τ̃. The procedure is commonly referred to as selective sampling. The resulting conditional distribution is that of X|Y ≥ (≤) τ̃. Such a construction is reported in a more general form in [12] for the case in which the scalar X is replaced by a random vector X. The phrase hidden truncation models is more often associated with [13], in which the authors refer to an earlier work [32]. In selective sampling situations, it seems self-evident that the threshold τ̃ will depend on the application in question. This is clearly implied in Section 2 of [13], in which the threshold is denoted by α and the resulting distribution of X conditional on Y ≥ α, which is the extended skew-normal, is reported. The extended version of the skew-normal is also described in [33]. In the introduction to a later sole-authored paper, [34], Y is assumed to exceed its expected value. This case is more in keeping with the skew-normal literature, which does not generally employ the extended version of the distribution. Subsequent sections of [34], however, are inter alia concerned with extended versions of the skew-normal and other distributions.
The aim of this section is to present limiting forms of the extended skew-normal and skew-Student distributions when they are derived as hidden truncation models. Consistent with the results in Sections 2 and 3, the limiting distributions exhibit different properties. The distributions of the hidden truncated variable Y and the observed vector X differ markedly depending on whether the underlying form is normal or Student's t. In selective sampling, limiting forms of the distributions arise when the notional observation on the conditioning variable Y is required to be in one of the tails of its distribution. To illustrate the differences between the hidden truncation skew-normal and skew-Student distributions, either extended or not, this section contains a table of critical values corresponding to a probability of 0.025. Critical values corresponding to other probabilities are available on request. In addition to these general results, Section 4.4 describes an application to stock market crashes, in which the truncated variable is not only material to the resulting distribution but is also observed.

Hidden Truncation Under The Normal Distribution
It is assumed that the n-vector X and a scalar variable denoted Y have a multivariate normal distribution The conditional distribution of X, given that Y ≤ τ̃, has the probability density function where The moment-generating function of the conditional distribution of X given Y ≤ τ̃ is and that of Y given Y ≤ τ̃ is Noting the similarity to the MGF of the truncated variable denoted V in Section 2.1, it follows that As τ̃ → −∞, the variable Y, given that it is less than or equal to τ̃, becomes deterministic in the sense that its expected value is asymptotically equal to τ̃, but its variance and all higher moments are asymptotically equal to zero. The conditional expected return and covariance matrix of X are, respectively, As τ̃ → −∞, the vector of expected values and the covariance matrix become It is interesting to note that element i of the vector of expected values decreases or increases depending upon whether δ i is positive or negative. The joint moment-generating function of X and Y conditional on Y ≤ τ̃ is from which cov(X, Y|Y ≤ τ̃) = δ{1 + ξ 2 (τ)}.
Using similar arguments to those for Lemma 1, as τ̃ → −∞, the covariances all tend to zero, as expected.
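The degenerate limit of the truncated variable can be checked numerically. The following is a minimal sketch, not taken from the paper, that uses the standard closed-form moments of a normal variable truncated from above. The function names and the default values µ Y = 0 and unit scale are illustrative assumptions, and `xi` denotes φ(τ)/Φ(τ) evaluated at the standardized threshold.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    # erfc keeps relative accuracy in the far lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_normal_moments(threshold, mu_y=0.0, sd_y=1.0):
    """Mean and variance of Y | Y <= threshold for Y ~ N(mu_y, sd_y**2).

    Hypothetical helper: mu_y and sd_y are illustrative defaults.
    """
    tau = (threshold - mu_y) / sd_y              # standardized threshold
    xi = std_normal_pdf(tau) / std_normal_cdf(tau)
    mean = mu_y - sd_y * xi
    var = sd_y ** 2 * (1.0 - tau * xi - xi * xi)
    return mean, var


if __name__ == "__main__":
    for threshold in (-1.0, -3.0, -10.0, -30.0):
        mean, var = truncated_normal_moments(threshold)
        print(f"threshold={threshold:6.1f}  mean={mean:9.4f}  variance={var:.6f}")
```

As the threshold moves further into the lower tail, the printed mean approaches the threshold itself and the variance shrinks toward zero, in line with the statement above. For thresholds much below about −35 standard deviations, the tail probability underflows in double precision, and a different numerical approach would be needed.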

Hidden Truncation Under Student's t Distribution
It is now assumed that the n-vector X and a scalar variable Y have a multivariate Student distribution with ν degrees of freedom. The conditional distribution of X, given that Y ≤ τ̃, has the probability density function where ω Y and τ are as defined above and The conditional mean and variance of Y are where ξ ν (τ) and η ν (τ) are defined at Equation (32). As τ̃ → −∞, the asymptotic expected value and variance are For finite and fixed degrees of freedom, and ignoring µ Y for ease of exposition, the conditional expected value is uplifted through multiplication by ν/(ν − 1), that is, the effect is most pronounced when the degrees of freedom are small. The asymptotic variance increases with |τ| 2 , that is, potentially without limit. The conditional expected return and covariance matrix of X are and where As τ̃ → −∞, the vector of expected values and the covariance matrix become and That is, for finite degrees of freedom, both expected values and the covariance matrix increase in magnitude without limit as τ̃ → −∞. Similar to Equation (71), the conditional expected value of element i of X will increase without limit if the corresponding value of δ i is negative and is unaffected if it equals zero.
Comparing the normal and Student hidden truncation models, the vectors of expected values are mainly determined by τ. Differences will be marked only if the degrees of freedom are small. The covariance matrices differ substantially: in the Student case for fixed ν, the covariance matrix increases without limit as τ̃ → −∞. For a given finite value of τ ≪ 0, the increase in the elements of the covariance matrix decreases with increasing ν. The conditional covariance between X and Y is Standard manipulations using Equation (83) show that the conditional correlation between a typical element i of X and Y is asymptotically equal to , which tends to zero as ν → ∞. Table 10 shows critical values corresponding to a probability of 0.025 for the univariate versions of the distributions at Equations (66) and (75) for a range of values of τ, ρ, and ν. Table entries are computed numerically and displayed to two decimal places. In Panel 4, corresponding to the standard case τ = 0, the first row, ρ = 0, yields the critical values for Student's t distribution with 5, 10, 20, 50, and 100 degrees of freedom and the standard normal distribution. The other rows in the same panel correspond to ρ = 0.2, 0.4, 0.6, and 0.8. As the panel shows, the critical values range from −1.96 to −3.15. In Panels 1 to 3, for which τ takes negative values, the range is greater and increases with the magnitude of τ. In Panels 5 to 7, with positive values of τ, the critical values closely approximate those of Student's t and the normal distribution, as expected. In each panel, the rows corresponding to ρ = 0 are the critical values of the nonstandard symmetric Student-like distribution reported in both [27,28]. The distribution of X and Y and the threshold τ̃ have a non-negligible effect on critical values; that is, for many applications, extended versions of the distributions may be preferred.
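The contrast with the normal case can be illustrated numerically for a standardized Student variable (µ Y = 0 and unit scale). The sketch below is an illustration under stated assumptions rather than the paper's own computation: it integrates the Student's t kernel over the lower tail with a midpoint rule, after the substitution t = τ/s, so that no distribution function is required. The function name and the step count are hypothetical choices.

```python
def truncated_t_moments(tau, nu, n_steps=20000):
    """Mean and variance of T | T <= tau for a standard Student's t variable.

    Assumes tau < 0 and nu > 2. The substitution t = tau / s maps the tail
    (-inf, tau] onto s in (0, 1]; the normalizing constant of the density
    cancels in the ratios, so only the kernel is evaluated.
    """
    if tau >= 0.0 or nu <= 2:
        raise ValueError("this sketch assumes tau < 0 and nu > 2")
    s0 = s1 = s2 = 0.0
    for i in range(n_steps):
        s = (i + 0.5) / n_steps                      # midpoint rule on (0, 1)
        t = tau / s
        kernel = (1.0 + t * t / nu) ** (-(nu + 1) / 2.0)
        weight = kernel * abs(tau) / (s * s)         # Jacobian |dt/ds|
        s0 += weight
        s1 += t * weight
        s2 += t * t * weight
    mean = s1 / s0
    return mean, s2 / s0 - mean * mean


if __name__ == "__main__":
    nu = 5
    for tau in (-5.0, -20.0, -50.0):
        mean, var = truncated_t_moments(tau, nu)
        print(f"tau={tau:6.1f}  mean/tau={mean / tau:.4f}  var/tau^2={var / tau ** 2:.4f}")
    # Tail-based limits for comparison: nu/(nu-1) and nu/((nu-2)(nu-1)^2)
    print(nu / (nu - 1), nu / ((nu - 2) * (nu - 1) ** 2))
```

For ν = 5, the ratio of the conditional mean to τ settles near ν/(ν − 1) = 1.25, and the conditional variance grows in proportion to τ 2 . The comparison constants printed in the last line follow from the power-law tail of the Student's t density.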
The table values correspond to a probability of 0.025 for the hidden truncation models at Equations (66) and (75). Table entries are computed numerically and displayed to two decimal places.
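Critical values of the kind described above can be approximated with elementary numerical tools. The sketch below covers only the normal case with standardized margins and is not the code used to produce Table 10: it evaluates P(X ≤ c | Y ≤ τ) by midpoint integration of φ(x)Φ((τ − ρx)/√(1 − ρ 2 )) and inverts it by bisection. The function names, integration limits, and step counts are illustrative assumptions.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def conditional_cdf(c, tau, rho, lower=-40.0, n_steps=2000):
    """P(X <= c | Y <= tau) for a standard bivariate normal with |rho| < 1."""
    if c <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (c - lower) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = lower + (i + 0.5) * h                    # midpoint rule
        total += phi(x) * Phi((tau - rho * x) / scale)
    return total * h / Phi(tau)


def critical_value(prob, tau, rho, lo=-40.0, hi=40.0, n_iter=50):
    """Solve conditional_cdf(c) = prob for c by bisection."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, tau, rho) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (0.0, 0.4, 0.8):
        print(rho, round(critical_value(0.025, tau=0.0, rho=rho), 2))
```

With ρ = 0, the conditioning has no effect and the routine returns the familiar standard normal value of approximately −1.96. Positive values of ρ move the critical value further into the lower tail.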

Stock Market Crashes
The basic empirical model for the returns on stocks is a regression in which the single explanatory variable is the contemporaneous return on a suitable market index, such as the UK's FTSE100 or the USA's S&P 500. The model is generally referred to as the market model. It is the operational version of the capital asset pricing model, universally referred to as the CAPM, of [35][36][37]. Numerous other regression setups are in widespread use, but all maintain a close connection to the market model. More formally, it is assumed that the n-vector of asset returns R and the contemporaneous return on the market index R m have a multivariate normal distribution where δ = βσ 2 m . An element R i of R may denote the return on an individual stock or a portfolio of stocks. The market model is then the conditional distribution of R given that R m = r m , that is or, if the market model is written in familiar regression style notation The results with an underlying Student distribution are similar. For ν > 1, the conditional mean is the same, but for ν > 2, the conditional covariance matrix now depends on r m as follows That is, the conditional variance is inflated by a factor that is proportional to the squared deviation of r m from its expected value.
In this subsection, the effect of a market crash is considered. A detailed coverage of the statistical and empirical properties of crashes is beyond the scope of this paper, but some theoretical insights into crashes may be derived using the skew-normal and skew-Student distributions. Specifically, the standard conditioning event R m = r m is changed to R m ≤ τ̃. This characterizes a crash when τ̃ is both negative and of large magnitude. Comparison of Equation (85) with (65) and (66) shows that the resulting conditional distribution of R is extended skew-normal or extended skew-Student. For underlying normal returns, the conditional mean and variance of market returns are, respectively, where τ = (τ̃ − µ m )/σ m . Similar to the results in Section 4.1, in the limit, as τ̃ → −∞, market return becomes nonstochastic with (expected) value equal to τ̃. The corresponding results for the conditional mean vector and covariance matrix of asset returns R are and cov(R|R m ≤ τ̃) In a crash, the conditional expected return on asset i decreases or increases without limit depending on the sign of β i , but there is no effect if β i = 0. The conditional covariance matrix is asymptotically equal to cov(R|R m = τ̃), the conventional case defined at Equation (86). With underlying Student returns, for ν > 2, the conditional mean and variance of market returns are, respectively, Using the results at Equation (78), it follows that the expected value of market return in a crash is negative and increases pro rata to the standardized crash size. Unlike the results based on an underlying normal distribution, the conditional variance is proportional to the square of the standardized crash size; for given ν, the variance increases without limit. A sketch of the conditional distribution of index returns under normal and Student's t distributions with five degrees of freedom and corresponding to a five-standard-deviation crash is shown in Figure 12.
As the sketch shows, the Student's t tail is longer and fatter than that of the normal.
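For underlying normal returns, the crash moments of individual assets can be illustrated with a small numerical sketch. It relies on the market-model decomposition R = µ + β(R m − µ m ) + e, with e independent of R m under joint normality, so that the conditional mean and covariance follow from the laws of total expectation and total covariance. All numerical inputs below (means, betas, residual covariance matrix, and crash threshold) are hypothetical values chosen only for illustration.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def crash_moments(mu, beta, resid_cov, mu_m, sd_m, threshold):
    """Moments of (R_m, R) given R_m <= threshold under joint normality.

    resid_cov is the covariance matrix of the market-model residuals,
    that is, cov(R | R_m = r_m).
    """
    tau = (threshold - mu_m) / sd_m                  # standardized crash size
    xi = phi(tau) / Phi(tau)
    mean_m = mu_m - sd_m * xi
    var_m = sd_m ** 2 * (1.0 - tau * xi - xi * xi)
    n = len(mu)
    mean_r = [mu[i] + beta[i] * (mean_m - mu_m) for i in range(n)]
    cov_r = [[resid_cov[i][j] + beta[i] * beta[j] * var_m for j in range(n)]
             for i in range(n)]
    return mean_m, var_m, mean_r, cov_r


if __name__ == "__main__":
    # Hypothetical inputs: two assets, one with a negative beta
    mu, beta = [0.010, 0.005], [1.2, -0.3]
    resid_cov = [[0.0004, 0.0], [0.0, 0.0002]]
    mu_m, sd_m = 0.008, 0.04
    for k in (2, 5, 10):                             # crash size in standard deviations
        out = crash_moments(mu, beta, resid_cov, mu_m, sd_m, mu_m - k * sd_m)
        print(k, [round(v, 4) for v in out[2]], round(out[1], 6))
```

As the crash size grows, the expected return of the positive-beta asset falls, that of the negative-beta asset rises, and the conditional covariance matrix approaches the residual covariance matrix because the conditional variance of the market return tends to zero.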
The corresponding result for the conditional expected return vector is As above, the conditional expected return for asset i will increase or decrease without limit depending on the sign of β i but is unchanged if it equals zero. Using Equation (80), the conditional covariance matrix is which, in keeping with Equation (83), may also increase without limit. Noting that η ν (τ) = E[(R m − µ m ) 2 /σ 2 m | R m ≤ τ̃], the similarities between Equation (94)

Extended Skew-Normal versus Skew-Student
The literature concerning the skew-normal and skew-Student distributions is more abundant than that for the corresponding extended versions. It has been conjectured by some researchers in the area, albeit informally, that the skew-Student could be used as an alternative to the extended skew-normal distribution. To some extent, such a suggestion is motivated naturally by the similarities in the shapes of some of the respective density functions. Somewhat more formally, use of the skew-Student could be regarded as being closer in spirit to the original skew-normal literature. For univariate distributions, and from the perspective of empirical work, this is an issue that is more concerned with parameter estimation and tests of fit. That is, for a given data set, does the extended skew-normal or the skew-Student offer better fit? For multivariate distributions, the issue is the same in principle, although the details are more complex. It is of course also the case that the extended skew-normal might be preferred to the skew-Student. For example, for the former, all moments exist, which may be a consideration for some applications. Conditional distributions are in general of the extended type. For multivariate applications in which conditioning is a requirement, methodological issues could imply that extended versions of the distribution are more appropriate. That is, an MESN or even MEST distribution may be preferable to the MST.
To construct an approximation, at least two types of method suggest themselves. Given a specified extended skew-normal distribution, one method would be to minimize a suitable measure of the distance between the two density functions. Several measures of distance could be considered. Denoting the two density functions by f ESN (x) and f ST (x) and assuming that the parameters of the former (latter) are given, the parameters of the latter (former) could be chosen by minimizing Numerous variations on this theme could be constructed, for example, using a different norm or minimizing the divergence between the ESN and ST density functions using the Kullback-Leibler divergence measure [38] or the Hellinger distance [39]. A second approach could be to seek to match the first four moments of the two distributions. It is clear that a comprehensive study of this conjecture, particularly bearing in mind multivariate distributions, would be a substantial undertaking. This section of the paper describes an initial investigation into the approximation of the univariate extended skew-normal distribution by the skew-Student, which may inform more comprehensive studies to be carried out in the future. The section is in two parts. In the first part, a theoretical investigation based on population moments is reported. In the second part, a study is reported in which simulated data from a number of specified extended skew-normal distributions are used to estimate the parameters of both models.
There are three technical points to note. First, the choice of an approximating skew-Student distribution is informed by the limiting forms of the extended skew-normal. From Lemma 1, as τ → −∞, the limiting form of the ESN distribution is N n (µ, Σ), which is the limiting form of a skew-Student distribution with λ = 0, that is, Student's t, as ν → ∞. As also reported in Section 2, a similar result holds as τ → ∞. The implication is that using ST distributions to approximate the ESN is appropriate for values of |τ| that are not too large. Second, motivated again by similarities in the shape of the density function, an ESN distribution may be approximated by the SN itself. Third, there are combinations of the parameters λ and τ for which approximation by moment matching is infeasible. To illustrate this, consider an approximation of a univariate ESN with parameters µ, σ 2 , λ and τ by an SN with parameters µ 0 , σ 2 0 and λ 0 . Equating skewness shows that a real value of the ratio σ 2 0 /λ 2 0 requires that Simple computations show that the inequality does not always hold.
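The divergence measures mentioned above can be computed by simple quadrature. The sketch below is illustrative only. It assumes the univariate construction X = µ + σε + λU, with ε standard normal and U ∼ N(τ, 1) truncated from below at zero, from which the ESN density used in `esn_pdf` follows; this parameterization is an assumption and should be checked against Equation (1). For simplicity, the comparison density is the N(0, 1) limit rather than a skew-Student.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def esn_pdf(x, mu, sigma, lam, tau):
    """ESN density under the assumed construction X = mu + sigma*eps + lam*U."""
    omega = math.sqrt(sigma ** 2 + lam ** 2)
    z = (x - mu - lam * tau) / omega
    arg = tau * omega / sigma + (lam / sigma) * z
    return phi(z) * Phi(arg) / (omega * Phi(tau))


def divergences(f, g, lo, hi, n_steps=4000):
    """Kullback-Leibler divergence KL(f || g) and Hellinger distance on [lo, hi]."""
    h = (hi - lo) / n_steps
    kl = bc = 0.0
    for i in range(n_steps):
        x = lo + (i + 0.5) * h                       # midpoint rule
        fx, gx = f(x), g(x)
        if fx > 0.0 and gx > 0.0:
            kl += fx * math.log(fx / gx) * h
        bc += math.sqrt(fx * gx) * h                 # Bhattacharyya coefficient
    return kl, math.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    for tau in (-1.0, -5.0, -10.0):
        f = lambda x, t=tau: esn_pdf(x, 0.0, 1.0, 2.0, t)
        print(tau, divergences(f, phi, -12.0, 12.0))
```

Both measures shrink as τ becomes more negative, which is consistent with the normal limiting form of the ESN. Replacing `phi` with a skew-Student density and minimizing over its parameters would give an approximation of the first type described above.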
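The third point can be made concrete by comparing the skewness of an ESN with the largest skewness attainable by a skew-normal, which is approximately 0.9953 in absolute value. The sketch below assumes the construction X = µ + σε + λU with U ∼ N(τ, 1) truncated from below at zero, and takes ξ 1 (τ) = φ(τ)/Φ(τ), with ξ 2 and ξ 3 its successive derivatives. The function names are hypothetical, and the check is a feasibility illustration rather than the inequality referred to above.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def xi_functions(tau):
    """xi1 = phi/Phi and its first two derivatives with respect to tau."""
    xi1 = phi(tau) / Phi(tau)
    xi2 = -xi1 * (tau + xi1)
    xi3 = -xi2 * (tau + xi1) - xi1 * (1.0 + xi2)
    return xi1, xi2, xi3


def esn_skewness(lam, tau, sigma=1.0):
    """Standardized skewness of X = mu + sigma*eps + lam*U (assumed construction)."""
    _, xi2, xi3 = xi_functions(tau)
    variance = sigma ** 2 + lam ** 2 * (1.0 + xi2)
    return lam ** 3 * xi3 / variance ** 1.5


# Supremum of |skewness| over the skew-normal family
SN_MAX_SKEW = math.sqrt(2.0) * (4.0 - math.pi) / (math.pi - 2.0) ** 1.5

if __name__ == "__main__":
    for lam, tau in ((10.0, 0.0), (10.0, -2.0), (5.0, -2.0)):
        skew = esn_skewness(lam, tau)
        print(lam, tau, round(skew, 4), abs(skew) < SN_MAX_SKEW)
```

For λ = 10 and τ = −2, the ESN skewness exceeds the skew-normal bound, so no skew-normal can match it, whereas for τ = 0 the ESN reduces to the skew-normal and matching is trivially feasible.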

Moment Matching Study
The study in this paper considers the approximation of an ESN distribution by an ST. As above, for the ESN, µ = 0 and σ 2 = 1. The extension parameter τ takes 11 values in the range [−20, 20]. As skewness is antisymmetric in the shape parameter, λ takes 9 values in the range [−10, 0]. For practical reasons, the derived value of ν is restricted to be an integer. For a given pair (λ, τ), the approximating values of ν and λ 0 are derived by minimizing the absolute difference in standardized skewness. This is done by grid search. The other parameters are computed by equating the expected value and variance of the two distributions. For (λ, τ) pairs for which a moment matching approximation exists, the divergence between the ESN and ST density functions is computed using the Kullback-Leibler divergence measure [38]. The values of this divergence measure are ranked from best to worst, with the parameters corresponding to the best ten and worst ten shown in Table 11. The first two columns of each panel show the values of λ and τ. The next three columns show the computed values of µ, σ 2 , and λ for the approximating ST distribution, with values rounded to four decimal places. Computed values of ν that were equal to 1000 or greater were replaced by ∞, that is, the approximating distribution is effectively skew-normal. Table 12 shows the corresponding values of the moments. As the Best 10 panel shows, the differences in the first four moments are negligible. For the Worst 10 panel, differences in mean, variance, and skewness are also negligible because of the method of construction. Unlike the results in the upper panel, there are differences in kurtosis. Table 13 shows the corresponding critical values, displayed in eight columns. These show critical values at probabilities of 0.5%, 2.5%, 95.5%, and 99.5% in ESN/ST pairs. Values are shown rounded to two decimal places and were computed numerically.
As the table shows, for the Best 10 approximations, the differences are negligible. For the Worst 10, the differences are more pronounced. To illustrate the effect of the moment matching procedure, Figure 13 shows ESN and ST density functions for which the ST approximation is the worst according to the Kullback-Leibler divergence measure.

Figure 13. Example of an extended skew-normal (ESN) and approximating skew-Student (ST) density functions; axes: x and f(x).

The results in Tables 11-13 provide support to the implications of Equation (95), namely that the method of approximation works well for values of |τ| that are not too large. An interesting result is that for numerous parameter combinations, the extended skew-normal distribution may be well approximated by a skew-normal. The usefulness of the results in the Worst 10 panels will depend on the application. In some applications, accurate critical values are not necessary, but in others, they are. There are other methods of measuring the divergence between two density functions. Two well-known ones are the Hellinger distance [39] and the Jensen-Shannon divergence [40], both of which constitute topics for future investigation.
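The grid search over the skew-Student parameters can be sketched as follows. This is not the code used for Tables 11-13. It assumes a skew-Student with the scale-mixture representation Z = S·Z 0 , where Z 0 is skew-normal with parameter δ ∈ (−1, 1) and S 2 is an independent inverse chi-squared scale, and it searches over δ rather than the paper's λ 0 ; the mapping between the two is an assumption left to the reader. The grid ranges are hypothetical.

```python
import math


def st_skewness(delta, nu):
    """Standardized skewness of the assumed skew-Student, valid for nu > 3."""
    b = math.sqrt(nu / math.pi) * math.exp(
        math.lgamma((nu - 1) / 2.0) - math.lgamma(nu / 2.0))
    m = b * delta                                    # mean of the standardized form
    var = nu / (nu - 2.0) - m * m
    third = m * (nu * (3.0 - delta ** 2) / (nu - 3.0)
                 - 3.0 * nu / (nu - 2.0) + 2.0 * m * m)
    return third / var ** 1.5


def match_skewness(target, nu_grid=range(4, 201), n_delta=2001):
    """Grid search for integer nu and delta minimizing |skewness - target|."""
    best = None
    for nu in nu_grid:
        for k in range(n_delta):
            delta = -1.0 + 2.0 * k / (n_delta - 1)
            err = abs(st_skewness(delta, nu) - target)
            if best is None or err < best[0]:
                best = (err, nu, delta)
    return best


if __name__ == "__main__":
    err, nu, delta = match_skewness(-0.5)
    print(f"nu={nu}  delta={delta:.3f}  abs error={err:.2e}")
```

Because a single target skewness does not pin down both ν and δ, many grid points achieve nearly the same error, and the routine simply returns the first minimizer it encounters. A fuller treatment would add a second criterion, such as kurtosis, to select among them.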

Simulation Study
The simulation study uses the same sets of values of µ, σ 2 , λ, and τ. For each combination of the parameters, 100 samples of size 100 from an extended skew-normal distribution were drawn. The parameters were estimated by maximum likelihood for the ESN and ST distributions. In addition, motivated by the results in Table 11, the parameters of the skew-normal distribution were also estimated. Summaries of the results are shown in Tables 14-16. Table 14 shows the value of the log-likelihood function for each parameter combination computed at its estimated maximum, averaged over the 100 samples and over values of τ. The table has For each parameter combination shown in Tables 15 and 16, the entries are averages of the 100 samples. Table 15 shows the root mean-square error in the moments for the three distributions and for 35 selected combinations of (λ, τ). As the table shows, the lowest root mean-square error occurs under the ESN for 30 of the (λ, τ) combinations. Root mean-square error is computed as the square root of the average squared difference between the population moments and the average of the estimated moments based on MLE parameter estimates for each distribution. The population moments included in the calculations are mean, variance, skewness, and kurtosis.
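The sampling step of such a study can be sketched with the Python standard library. The example assumes the construction X = µ + σε + λU with U ∼ N(τ, 1) truncated from below at zero, draws U by inversion, and compares sample moments with the corresponding population values. The maximum likelihood step is not reproduced, and the seed, sample size, and parameter values are hypothetical. The inversion is intended for moderate values of τ only.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

_STD = NormalDist()


def sample_esn(n, mu, sigma, lam, tau, rng):
    """Draw n values of X = mu + sigma*eps + lam*U (assumed construction)."""
    p_tau = _STD.cdf(tau)
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()                       # uniform on (0, 1]
        w = _STD.inv_cdf(u * p_tau)                  # N(0, 1) truncated above at tau
        truncated = tau - w                          # N(tau, 1) truncated below at 0
        draws.append(mu + sigma * rng.gauss(0.0, 1.0) + lam * truncated)
    return draws


def esn_mean_variance(mu, sigma, lam, tau):
    """Population mean and variance under the same construction."""
    xi1 = _STD.pdf(tau) / _STD.cdf(tau)
    xi2 = -xi1 * (tau + xi1)
    return mu + lam * (tau + xi1), sigma ** 2 + lam ** 2 * (1.0 + xi2)


if __name__ == "__main__":
    rng = random.Random(12345)                       # hypothetical seed
    x = sample_esn(20000, 0.0, 1.0, -2.0, -1.0, rng)
    pop_mean, pop_var = esn_mean_variance(0.0, 1.0, -2.0, -1.0)
    rmse = math.sqrt(((fmean(x) - pop_mean) ** 2 + (pvariance(x) - pop_var) ** 2) / 2.0)
    print(round(pop_mean, 4), round(pop_var, 4), round(rmse, 4))
```

The root mean-square error printed here covers only the mean and variance of a single large sample, whereas the study above averages over 100 samples and includes skewness and kurtosis.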

Concluding Remarks
In this paper, results that demonstrate the properties of both the multivariate extended skew-normal and extended skew-Student distributions as the value of the extension parameter τ changes are presented. In general, for given values of location, scale, and shape or skewness, nonzero values of τ lead to greater variability in both the moments and critical values. In turn, this offers greater flexibility in empirical applications of these distributions. From a theoretical perspective, increasing values of |τ| lead to more fundamental changes in both distributions. As τ increases without limit, the asymptotic distributions are multivariate normal and multivariate Student, respectively. The respective vectors of expected values of both distributions are dependent on τ and are unbounded. The covariance matrices, however, remain finite. Skewness disappears for both distributions. By contrast, as τ → −∞, more substantial changes take place in the distributions. Most notable is that for the MESN distribution dependence on τ vanishes, but for the MEST in general, it does not. In the case of the MESN, the limiting distribution is multivariate normal. For the MEST distribution with finite degrees of freedom, asymmetry remains. For fixed τ, the extent of asymmetry decreases as the degrees of freedom increase. For fixed degrees of freedom, as τ → −∞, the vector of expected values and the covariance matrix are both unbounded.
To illustrate the potential of the MESN and MEST distributions, two applications are described. First, the effect of a stock market crash is studied assuming underlying multivariate normal and multivariate Student distributions. A crash, in which the return on a market index is less than a given negative threshold, results in multivariate extended skew-normal and multivariate extended skew-Student distributions. Under an underlying multivariate normal distribution, as the crash size increases without limit, the return on a stock market index becomes nonstochastic. In short, the market plummets: actual return equals expected return. Under an underlying multivariate Student distribution, expected return is broadly the same, but variability increases without limit. The market decline is noisy. There are analogous results for the returns on individual stocks. In particular, with underlying normality, the conditional covariance matrix remains finite, whereas under an underlying Student distribution, it does not. A detailed investigation of the implications and suitability of these models is beyond the scope of this particular paper, but it is reasonable to posit that the results offer support to the view that an underlying Student distribution is a more realistic model than the normal. Given that stock market collapses have in the past been of relatively short duration, the results also imply that for financial applications the models change. The methods described may be applied in principle to stock market booms. It may also be noted that if an inefficiency variable were to be constructed, SFA analysis could be treated in the same way.
Second, the conjecture that the skew-Student could be used instead of the extended skew-normal is an interesting one. Given the similarity in the shapes of the density functions for many combinations of parameters, this conjecture suggests that there is the possibility of flexible model choice. A general investigation of this conjecture would be a substantial task. The exercise reported in this paper is intended to offer evidence to motivate further research. The short study reported in this paper, part theoretical and part based on simulation, suggests that a given ESN distribution should be treated as such. However, the results also suggest that the ESN could be well-approximated by the skew-normal in some circumstances, but in general not by the skew-Student.

Acknowledgments: Thanks are due to the reviewers of the paper for comments which have led to both improved presentation and content.

Conflicts of Interest:
The author declares no conflict of interest.