Copulaesque Versions of the Skew-Normal and Skew-Student Distributions

A recent paper presents an extension of the skew-normal distribution which is a copula. Under this model, the standardized marginal distributions are standard normal. The copula itself depends on the familiar skewing construction based on the normal distribution function. This paper is concerned with two topics. First, the paper presents a number of extensions of the skew-normal copula. Notably, these include a case in which the standardized marginal distributions are Student's t, with different degrees of freedom allowed for each margin. In this case the skewing function need not be the distribution function for Student's t, but can depend on certain special functions. Secondly, several multivariate versions of the skew-normal copula model are presented. The paper contains several illustrative examples.


Introduction
The recent paper by [1] presents an extension of the skew-normal distribution which has subsequently been referred to as a copula. Under this model, the standardized marginal distributions are standard normal and some of the conditional distributions are skew-normal. The skew-normal distribution itself was introduced in two landmark papers, [2] and [3]. These papers have led to a very substantial research effort by numerous authors over the last thirty-five years. The results of these efforts include, but are certainly not limited to, numerous probability distributions, both univariate and multivariate, which are loosely referred to in the literature as skew-elliptical distributions. This term does not completely describe the rich features of these distributions, but for convenience it will be used in this paper. Notable contributions to these developments include papers by [4], [5], [6], [7], [8], [9], among numerous others.
The many multivariate distributions that may be referred to as skew-elliptical offer coherent probability models that are used in a wide variety of applications. However, they all share the feature that the factor which perturbs symmetry is applied as a multiplier to a distribution that is elliptically symmetric. Consequently, the marginal distributions are all prescribed. For example, the multivariate skew-t distribution described in [5] leads to marginal distributions that are all univariate skew-t distributions with the same degrees of freedom. While such a restriction may be acceptable or even irrelevant for some purposes, it is nonetheless the case that some multivariate applications have marginal distributions that are different. As is very well known, there is a large literature based on copulas which derives from the original paper of [10]. The purpose of a copula is to separate the modeling of the dependence structure of a set of variables from the analysis of the marginal distributions. By implication, the marginal distributions need not be the same; that is, they may differ by more than just scale and location.
In a recent paper concerned primarily with projection pursuit, ref [1] presents a trivariate distribution based on the skew-normal which is also in a general sense a copula.
The scalar parameter λ ∈ R determines the extent of the dependence between X 1 and X 2 . In Section 5, the bivariate distribution is extended to n variables. To accommodate this extension it is convenient to employ the notation X ∼ SNC n (λ), where X = (X 1 , X 2 , . . . , X n ) T . As [1] shows, in the bivariate case the marginal distribution of each X i is N(0, 1) and the conditional distribution of X 1 given that X 2 = x 2 is skew-normal, with density function

f (x 1 |X 2 = x 2 ) = 2φ(x 1 )Φ(λx 1 x 2 ). (2)

There is an analogous expression for the density function of the conditional distribution of X 2 given that X 1 = x 1 . It may also be noted that X 2 2 is distributed independently of X 1 as a χ 2 (1) variable, with the analogous result for the distributions of X 2 and X 2 1 . Hence the X 2 i ; i = 1, 2 are independently distributed as χ 2 (1) variables.
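As a quick numerical check, the sketch below uses the joint density implied by the N(0, 1) margins and the conditional density at Equation (2), namely f (x 1 , x 2 ) = 2φ(x 1 )φ(x 2 )Φ(λx 1 x 2 ), with an illustrative value of λ, and verifies the standard normal marginal and the skew-normal conditional by numerical integration:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

LAM = 1.5  # illustrative value of the shape parameter

def snc2_pdf(x1, x2, lam=LAM):
    """Joint density of the bivariate skew-normal copula: 2*phi(x1)*phi(x2)*Phi(lam*x1*x2)."""
    return 2.0 * norm.pdf(x1) * norm.pdf(x2) * norm.cdf(lam * x1 * x2)

# Integrating out x2 should recover the standard normal marginal density of X1.
x1 = 0.7
marginal, _ = quad(lambda x2: snc2_pdf(x1, x2), -np.inf, np.inf)

# The conditional density of X1 given X2 = x2 is the skew-normal of Equation (2).
x2 = -1.0
conditional = snc2_pdf(x1, x2) / norm.pdf(x2)
```

The marginal integral equals φ(x 1 ) because, for fixed x 1 , the factor 2φ(x 2 )Φ(λx 1 x 2 ) is itself a skew-normal density in x 2 and so integrates to one.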
To illustrate the difference from the formal definition of a copula, consider the well-known Gaussian copula. In the bivariate case, dependence between X 1 and X 2 is described by the function

C(u 1 , u 2 ) = Φ 2,C {Φ −1 (u 1 ), Φ −1 (u 2 )},

where Φ 2,C {., .} denotes the distribution function of a bivariate normal distribution with zero means and correlation matrix C, and the U 1,2 are each uniformly distributed on [0, 1]. The formulation at Equation (2) uses only the standard univariate normal distribution function. Sketches of the skew-normal density function for λ ∈ {−10.0, −2.6, 0.0, 10.0} are shown in Figure 1. Sketches of the bivariate density function for λ = 1.0 and −5.0 are shown in Figure 2. The bi-modal nature of the density function when λ = −5.0 is noteworthy. Indeed, it is straightforward to show that the density function is bi-modal if |λ| > √(π/2), with the modal values being at points ±(x, x), depending on λ. Examples of modal values are shown in Table 1.
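The bimodality condition can be illustrated numerically. The sketch below profiles the (assumed) joint density 2φ(x 1 )φ(x 2 )Φ(λx 1 x 2 ) along the diagonal x 1 = x 2 = x, where the modes lie for λ > 0, and counts local maxima on either side of the threshold √(π/2) ≈ 1.2533:

```python
import numpy as np
from scipy.stats import norm

def diag_profile(lam, x):
    """Joint density 2*phi(x)^2*Phi(lam*x^2) along the diagonal x1 = x2 = x,
    where the modes lie for lam > 0 (assumed form of the copula density)."""
    return 2.0 * norm.pdf(x)**2 * norm.cdf(lam * x**2)

xs = np.linspace(-3.0, 3.0, 2001)  # grid includes x = 0 exactly

def n_local_maxima(lam):
    g = diag_profile(lam, xs)
    return int(np.sum((g[1:-1] > g[:-2]) & (g[1:-1] > g[2:])))
```

A second-order expansion of log 2φ(x)²Φ(λx²) at x = 0 shows that the origin switches from a maximum to a local minimum exactly when λ exceeds √(π/2), which is what the mode count detects.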

Cross-Moments
The covariance of X 1 and X 2 , which also equals their correlation, is given at Equation (4), and may be expressed as an expectation taken over Y ∼ χ 2 (1) , as at Equation (5). There is no reported analytic expression for this integral in general. However, it equals zero when λ = 0 and tends to ±2/π as λ → ±∞. Note that the integral in Equation (5) may also be expressed in terms of a Chi-squared distribution with 3 degrees of freedom. Table 2 shows values of the correlation for a range of positive values of λ.
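A hedged sketch of the correlation computation: the integrand below again assumes the joint density 2φ(x 1 )φ(x 2 )Φ(λx 1 x 2 ), evaluates cov(X 1 , X 2 ) by two-dimensional quadrature, and checks the behaviour at λ = 0 and the approach to the limit 2/π:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import dblquad

def snc2_cov(lam):
    """cov(X1, X2) = E[X1 X2] for the bivariate skew-normal copula (both means are zero).

    The integrand assumes the joint density 2*phi(x1)*phi(x2)*Phi(lam*x1*x2)."""
    integrand = lambda x2, x1: x1 * x2 * 2.0 * norm.pdf(x1) * norm.pdf(x2) * norm.cdf(lam * x1 * x2)
    val, _ = dblquad(integrand, -8, 8, -8, 8)  # tails beyond |x| = 8 are negligible
    return val

c0, c1, c5 = snc2_cov(0.0), snc2_cov(1.0), snc2_cov(5.0)
```

The monotone increase of the computed values toward, but never beyond, 2/π mirrors the limiting behaviour described in the text.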
Note that there is no need to derive this result using integration by parts. Since p or q must be even, it is a consequence of one of the properties above, as is the result for the case where p and q are both even. For p and q both odd, the expectation satisfies a recursion. In a recent paper, ref [13] shows that cross-moments of this distribution may also be computed using a new extension of Stein's lemma, ref [14]. There is no particular advantage in using the new lemma for the bivariate distribution at Equation (1); the result is, however, employed in Section 5, which is concerned with the more general case of n variables. If p = 2k + 1 and q = 2l + 1, it is straightforward to show that the cross-moment has a finite limiting value as λ → ±∞. The limiting values of a selection of odd order cross-moments are shown in Table 3, and a selection of moments corresponding to λ = 1.0 in Table 4.

A skewed distribution may be obtained by conditioning on X 2 ≤ y; its density function is given at Equation (11) and its distribution function at Equations (13) and (14), the latter involving Z ∼ N(0, 1). The expression at Equation (13) may be computed using the methods reported in [17]. This distribution possesses two interesting properties as |y| → ∞.

Proposition 2. Let X have the distribution with density function given by Equation (11). The following results hold:

1.
For λ > 0, as y → −∞ the limiting form of the distribution of X is skew-normal with shape parameter λy; that is, the density function tends to 2φ(x)Φ(λyx).

2. For λ > 0, as y → ∞ the limiting form of the distribution of X is the standard normal distribution.

Part 1 of the proposition may be established using the well-known asymptotic formula for the standard normal integral ([16], page 932, Equation 26.2.12) together with integration by parts; in this case, for λ > 0 and y << 0, the second term is negligible. There are analogous results for λ < 0. Equation (18) and the same assumption are used again in some of the results below. It is interesting to note that Part 1 of the proposition results in the same density function as that at Equation (2). Numerical computations or asymptotic arguments are necessary in order to employ the distribution at Equation (11), for example to compute moments or critical values. Nonetheless, it is arguably a more flexible form than the familiar skew-normal shown at Equation (2). Examples of the density functions from Equations (2) and (11) are shown in Figure 3. In Table 5, m1 and m3 are the first and third moments about the origin, sk and ku are skewness and kurtosis, and ssk and sku are the corresponding standardized values. The table shows a selection of moments for λ = 1 and a range of values of y from −10.0 to 10.0. For positive values of y, skewness and kurtosis rapidly tend to 0 and 3, respectively. For y < 0, skewness is negative and there is excess kurtosis. As above, there are analogous results for λ < 0.
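The asymptotic formula 26.2.12 used in the proof can be checked directly. The sketch below implements a three-term version of the expansion, 1 − Φ(x) ≈ (φ(x)/x)(1 − 1/x² + 3/x⁴), and compares it with the exact tail probability:

```python
import numpy as np
from scipy.stats import norm

def tail_approx(x):
    """Three-term version of A&S 26.2.12: 1 - Phi(x) ~ (phi(x)/x)*(1 - 1/x^2 + 3/x^4)."""
    return norm.pdf(x) / x * (1.0 - 1.0 / x**2 + 3.0 / x**4)

# Relative errors at a few points in the tail; the series is asymptotic in x.
rel_errs = [abs(tail_approx(x) - norm.sf(x)) / norm.sf(x) for x in (4.0, 6.0, 8.0)]
```

Because the series is asymptotic rather than convergent, accuracy improves as x grows but adding further terms at fixed x eventually makes it worse.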

An Extended Skew-Normal Copula
The skew-normal and skew-Student distributions have extended forms. These arise naturally when conditional distributions are considered; they also arise as hidden truncation models. In standardized form, the univariate extended skew-normal distribution has the density function shown at Equation (19). The extended skew-normal distribution may be derived by considering a standardized bivariate normal distribution of X and Y with correlation ρ: the distribution at Equation (19) is that of X given that Y ≥ −τ, with λ = ρ/√(1 − ρ 2 ). The distribution may be denoted ESN(λ, τ). A bivariate extended skew-normal copula distribution that is analogous to that in Section 2 has the density function at Equation (20), where Ω(τ, λ) > 0 is the normalizing constant. Integration with respect to X 1 shows that Ω(τ, λ) may be expressed as a one-dimensional integral, which may be evaluated numerically. In addition, the marginal distribution of X is symmetric, with a density function that may likewise be evaluated numerically.
Values of Ω(τ, λ), computed to four decimal places for a selection of values of τ and λ, are shown in Table 6. Note that for values of τ less than, say, −5 the values of Ω are small, implying the need for care in its computation. The shorthand notation X ∼ ESNC n (τ, λ), which extends that defined in Section 2, will be used. The basic properties of the bivariate extended skew-normal copula distribution are as follows.
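A sketch of the computation of Ω(τ, λ), assuming (as the surrounding discussion suggests) that the density at Equation (20) is proportional to φ(x 1 )φ(x 2 )Φ(τ + λx 1 x 2 ): the constant is then the expectation of Φ(τ + λX 1 X 2 ) over independent standard normals, and two limiting cases provide checks.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import dblquad

def omega(tau, lam):
    """Normalizing constant Omega(tau, lam) = E[Phi(tau + lam*X1*X2)] over iid N(0,1).

    Assumed form of the extended-copula density at Equation (20):
    phi(x1)*phi(x2)*Phi(tau + lam*x1*x2) / Omega(tau, lam)."""
    f = lambda x2, x1: norm.pdf(x1) * norm.pdf(x2) * norm.cdf(tau + lam * x1 * x2)
    val, _ = dblquad(f, -8, 8, -8, 8)
    return val
```

At τ = 0 symmetry gives Ω = 1/2, recovering the basic copula, while at λ = 0 the skewing factor is the constant Φ(τ).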
Proposition 3. Let X 1 and X 2 have the distribution with density function given by Equation (20), let X denote either variable, and define τ̃ y = τ/√(1 + λ 2 y 2 ). The following properties hold:

1. E(X) = 0.
2. var(X) may be computed by numerical integration.
3. cov(X 1 , X 2 ) may be computed in the same way.
4. The conditional distribution of X 1 given that X 2 = y is extended skew-normal, ESN(τ̃ y , λy).

Proof of this proposition is in Appendix A. Table 7 shows standard errors for a range of values of τ and λ; these were computed using numerical integration. Table 8 shows the correlation coefficients for a range of parameter values. For λ = 1.0, Figure 4 shows (i) an example of the bivariate skew-normal density function for τ = −1.0 and (ii) the marginal density for τ = −2.5 and 0.0. Higher order cross moments may be computed recursively, but for non-zero values of τ the resulting integrals are two-dimensional. Depending on the values of λ and τ, the extended version of the distribution is also bi-modal.

The extended version of the distribution has a conditional distribution that is similar to that described in Proposition 2. As above, proof is in Appendix A.

Proposition 4. Let X 1 and X 2 have the distribution with density function given by Equation (20). The following results hold:

1. The distribution function of X 2 may be expressed as a one-dimensional integral.
2. There is an analogous expression for the distribution function of X 1 .
3. The density function of X 1 given that X 2 ≤ y follows from these expressions.
4. For λ > 0, as y → −∞ the limiting form of the distribution of X 1 is extended skew-normal with parameters τ̃ y , as defined in Proposition 3, and λ̃ y = λy; that is, X 1 |X 2 ≤ y ∼ ESN(τ̃ y , λy).

Note that, as in Section 3, the distributions in Part 4 of both Propositions 3 and 4 are the same. There are analogous results for λ < 0 as y → ∞.

Extension to n Variables
There is a self-evident extension for the distribution of an n-vector X, with i-th element X i , which has density function

f (x) = 2 φ(x 1 ) φ(x 2 ) · · · φ(x n ) Φ(λ x 1 x 2 · · · x n ), (28)

and shorthand notation X ∼ SNC n (λ). Recall that [1] describes the case n = 3. This distribution has the following properties:

1. The marginal distribution of each X i is N(0, 1).
2. The members of any subset of X of size 2, . . . , n − 1 are independently distributed as N(0, 1) variables.
3. If X is partitioned into two non-overlapping sets, {X i }; i = 1, . . . , p and {X j }; j = p + 1, . . . , p + q with p + q ≤ n, then (i) the X 2 i are independently distributed as χ 2 (1) variables, independently of (ii) the X j , which are themselves independently distributed as N(0, 1).
4. The X 2 i ; i = 1, . . . , n are independently distributed as χ 2 (1) variables.

5. The distribution of X i given X j = x j , j ≠ i, is skew-normal, with density function 2φ(x i )Φ(λ x i Π j≠i x j ); that is, with shape parameter λ Π j≠i x j .

6. The distribution of {X j }, j ≠ i, given X i = x i is a skew-normal copula, with shape parameter λx i .
Other properties are briefly described in the rest of this section.

Cross-Moments
The first order cross-moment E(X 1 X 2 · · · X n ) is given at Equation (30). As |λ| increases, the limiting value of Equation (30) is ±(2/π) n/2 . There are expressions for higher order multivariate moments which are similar to those at Equations (7) and (8). If the {p i }; i = 1, . . . , n are all even, then property (4) implies that the expectation at Equation (31) factorizes into a product of standard normal moments. If any p i is even and any other is odd, then the expectation on the left hand side of Equation (31) is zero. For the general multivariate case with all {p i } odd, higher order cross-moments may be computed in principle using a version of the extension to Stein's lemma reported in [13], as follows:

Proposition 5. Extension to Stein's lemma ([13])
Let X T = (X 1 , X 2 , . . . , X n ) and let g(X) : R n → R be a function that is differentiable almost everywhere. Noting that φ n (x) is the density function of the standard multivariate normal distribution evaluated at X = x, the extension to Stein's lemma provides an expression for cov{X, g(X)}, where E denotes expectation taken over the skew-normal copula density function at Equation (28). For a suitable choice of ∇g, the first element of the vector cov{X, g(X)} yields the first order cross-moment. Note that the corresponding second term is equal to zero if p 1 is even; otherwise it may be reduced to an integral in n − 1 dimensions, with variables Y i each independently distributed as χ 2 (1) . The second and subsequent elements of cov{X, g(X)} recover other cross-moments.

Skew Distributions Generated by Conditioning
Similar to the results in Section 3, a skewed distribution may be obtained by conditioning on X 1 ≤ y; its density function is analogous to that at Equation (11). Using arguments similar to those in Section 3, it then follows that for λ > 0 and as y → −∞ the distribution of the (n − 1)-vector {X i }; i = 2, . . . , n is also a skew-normal copula with shape parameter λy. Similarly, the distribution of X 1 given X i ≤ x i ; i = 2, . . . , n, with x i → −∞ for at least one value of i, is skew-normal with shape parameter λ Π n i=2 x i . There are analogous results for λ < 0 and y → ∞.

Distribution Function and Related Computations
Details are omitted, but for all X i substantially less than zero there is an approximation to the distribution function, from which VaR, denoted X * as before, may be computed. Similar to the bivariate case defined at Equation (78), CVaR is defined in general using an n-vector of ones, denoted 1. For a single variable, CVaR so defined is approximately equal to VaR. As before, tail dependence is zero. Similar to the bivariate case, the distribution of Y = Π n i=1 X i , when the X i are independently distributed as N(0, σ 2 i ), has a density function expressible in terms of Meijer's G-function, G(.). As above, see [15] for further details.

Extended Distributions
Similar to the results in Section 4, the skew-normal copula for n variables has an extended form. The multivariate extended skew-normal copula distribution has a density function of the same structure, with normalizing constant Ω n (τ, λ). Integrating with respect to, say, X 1 shows that Ω n may, in principle, be reduced to a one-dimensional integral in a scalar variable S that is distributed as the product of n − 1 independent variables, each distributed as χ 2 (1) . As already noted, the density function of S is given by Fox's H-function, ref ([12]). The effect of non-zero τ is to induce dependence in the marginal distributions. The marginal distribution of the subvector (X 1 , . . . , X p ); p < n has a symmetric density function. Consequently, the conditional distribution of (X p+1 , . . . , X n ) given (X 1 , . . . , X p ) = (x 1 , . . . , x p ) is ESNC n−p (τ * p , λ * p ). It is conjectured, similar to the results in Propositions 2 and 4, that the conditional distribution of (X p+1 , . . . , X n ) given X i << 0; i = 1, . . . , p is of the same type.

Skew-Student Copulas
Student's t distribution and its multivariate counterpart both arise as scale mixtures of the normal and multivariate normal distributions, respectively, as well as being sampling distributions in their own right. Similarly, the skew-Student distribution and its extended counterpart may be derived as scale mixtures. It is therefore natural to inquire whether there are parallel developments for the skew-normal copula distribution of Section 2 and subsequent sections. The potential attraction of such a development is the opportunity to have marginal Student's t distributions with differing degrees of freedom. In addition to the skew-Student distribution derived formally in [5], the earlier work in [18] suggests that more flexible constructions may also be contemplated. The first two sub-sections below therefore present two such approaches to skew-Student copulas. The third then describes a distribution that is derived as a scale mixture. In the interests of paper length, results are presented briefly, with further details available on request.

Skew-Student Copula-Case I
The first case has a density function given by

f (x) = 2 t ν 1 (x 1 ) t ν 2 (x 2 ) · · · t ν n (x n ) T ω (λ x 1 x 2 · · · x n ), (45)

where t ν (.) and T ν (.), respectively, denote the density and distribution functions of a Student's t variable with ν degrees of freedom. The univariate version of this distribution is referred to here as the linear skew-t. Allowing ω to increase without limit gives a distribution in which T ω (.) is replaced by Φ(.). The properties of the distribution, which is denoted SSC n,I (λ; ν, ω), with ν T = (ν 1 , . . . , ν n ), are similar to those described in Sections 2 and 5; namely, the marginal distribution of each X i is t(ν i ), and the members of any subset of X of size 2, . . . , n − 1 are independently distributed as t(ν i ) variables.

In addition, if X is partitioned into two non-overlapping sets, {X i }; i = 1, . . . , p and {X j }; j = p + 1, . . . , p + q with p + q ≤ n, then (i) the X 2 i are independently distributed as F(1, ν i ) variables, independently of (ii) the X j , which are themselves independently distributed as t(ν j ). The distribution of X i given X j = x j , j ≠ i, is a linear skew-t; there is a similar result for the conditional distribution of subsets of the variables.

A conditional distribution is obtained in the same manner as that at Equation (11), namely by conditioning on X 1 ≤ y. Cross-moments may be computed using recursions similar in principle to those in Section 2.1. Similar to the results in Section 3, for λ > 0 it may be shown that, conditional on X 1 ≤ y << 0, the remaining variables have an asymptotic skew-Student copula distribution of the same type as that at Equation (45), with shape parameter λy. Similarly, the distribution of X 1 given X i ≤ x i ; i = 2, . . . , n, with x i << 0 for at least one value of i, is linear skew-Student with shape parameter λ Π n i=2 x i . There are analogous results for λ < 0 and y >> 0.
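Assuming the product form of Equation (45) set out above, the marginal property can be verified numerically: integrating out x 2 in the bivariate case should recover a Student's t density with that margin's own degrees of freedom. The parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import t as student_t
from scipy.integrate import quad

NU1, NU2, OMEGA, LAM = 5.0, 8.0, 4.0, 1.0  # illustrative (hypothetical) parameter values

def ssc2_pdf(x1, x2):
    """Assumed bivariate form of Eq. (45): 2 * t_nu1(x1) * t_nu2(x2) * T_omega(lam*x1*x2)."""
    return 2.0 * student_t.pdf(x1, NU1) * student_t.pdf(x2, NU2) \
               * student_t.cdf(LAM * x1 * x2, OMEGA)

# Integrating out x2 should recover the t(nu1) marginal: each margin keeps its own df.
x1 = 0.9
marg, _ = quad(lambda x2: ssc2_pdf(x1, x2), -np.inf, np.inf)
```

The result follows because T ω (u) + T ω (−u) = 1 and t ν 2 is symmetric, so the skewing factor integrates out for any fixed x 1 ; this is what permits different degrees of freedom for each margin.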
The distribution defined at Equation (45) does not lead easily to an extended version. Although there are analytic expressions for the distribution of the difference of two independent Student's t variates, the expressions are complicated; see, for example, [19,20]. This remains a topic for future research.

Skew-Student Copula-Case II
A second skew-Student copula distribution, denoted SSC n,II (λ; ν, ω), has the density function shown at Equation (48). When n = 1 and ω = ν 1 this is the skew-t distribution due originally to [5]. To distinguish it from the linear form above, this is referred to as the Azzalini skew-t. The properties of the distribution at Equation (48) are essentially the same as those listed in Section 6.1. The asymptotic conditional distribution of X j ; j = 2, . . . , n, given that X 1 ≤ y << 0, is of the same type. There is a similar result for the conditional distribution of X 1 given that X j ≤ x j << 0; j = 2, . . . , n.

Skew-Student Copula-Case III
In the third case, which is arguably a more realistic representation, conditional on n mixing variables S i = s i , collectively S = s, the joint density function of the variables X i is of skew-normal copula form, with each S i independently distributed as χ 2 (ν i ) /ν i . For this case the distribution of X has a density function involving M n,N (x), which denotes the distribution function, evaluated at x, of a mixing variable V. The density function corresponding to M(.) is given by Fox's H-function, see [12]. It is interesting to note that the form of Equation (50) implies that the variable V is symmetrically distributed. The scale mixture also does not add great additional complications to the expressions for cross-moments, although, as above, numerical computation is required. The distribution shares the marginal, independence and partition properties of the Case I distribution reported in Section 6.1. This distribution does not appear in the literature, and derivation of the distribution and density functions are future research tasks.

Multivariate Distributions
When X and Y are both n-vectors, a basic multivariate version of the SN-copula distribution has the density function

f (x, y) = 2 φ n (x) φ n (y) Φ(λ x T y), (51)

where φ n (.) denotes the density function of the N n (0, I) distribution. The marginal distributions of X and Y are each standard multivariate normal, N n (0, I). The conditional distribution of X given that Y = y is multivariate skew-normal, with density function 2 φ n (x) Φ(λ x T y). This section of the paper briefly describes the basic properties of the distribution at Equation (51) and extensions thereof. Details of more advanced developments, such as conditional distributions similar in concept to those described in earlier sections, are left as topics for future development. In private correspondence, Loperfido, ref [21], has proposed an extension of the distribution at Equation (51). In this extension, the scale matrices of X and Y are not restricted to be unit matrices and the location parameters are not restricted to be zero vectors. Using their notation, the joint density function of the random vectors X and Y replaces the standard normal densities by φ̃ n (x; µ, Σ) and φ̃ n (y; υ, Ω), the density functions of vectors X and Y distributed, respectively, as N n (µ, Σ) and N n (υ, Ω), with Φ(·) as defined above. In related correspondence, ref [22] notes that an alternative argument for Φ in Equation (51) is x T Ay, where A is an n × m matrix and y is an m-vector. Setting µ and υ to 0, a vector of zeros of appropriate length, this leads to the density function

f (x, y) = 2 φ̃ n (x; 0, Σ) φ̃ m (y; 0, Ω) Φ(x T Ay). (54)

In the same correspondence, [21] considers a generalization of this multivariate distribution with density function

f (x, y) = 2 h n (x) k m (y) π(x T Ay), (55)

where h n (·) and k m (·) are the density functions of n and m dimensional elliptically symmetric distributions. The function π(·) satisfies 0 ≤ π(−a) = 1 − π(a) ≤ 1 for all a ∈ R. The distribution at Equation (55) is an extension of the generalized skew-normal described in [7,23].
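Assuming the basic form f (x, y) = 2φ n (x)φ n (y)Φ(λ x T y) discussed above, a Monte Carlo sketch confirms that the skewing factor integrates to one, which is the symmetry argument underlying the marginal results:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, lam, draws = 3, 1.0, 500_000

# phi_n(x)*phi_n(y) is the density of 2n iid standard normals, so the total mass of
# f(x, y) = 2*phi_n(x)*phi_n(y)*Phi(lam*x'y) is E[2*Phi(lam*X'Y)] under independence.
X = rng.standard_normal((draws, n))
Y = rng.standard_normal((draws, n))
mass = np.mean(2.0 * norm.cdf(lam * np.sum(X * Y, axis=1)))
```

Since X T Y is symmetric about zero, E[Φ(λX T Y)] = 1/2 and the estimated mass should be close to one.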

Marginal and Conditional Distributions-A Basic Example
To illustrate the properties of the marginal and conditional distributions, consider the basic bivariate case, in which the density function of the four random variables X 1 , X 2 , Y 1 and Y 2 is given at Equation (56). Integration with respect to X 2 gives Equation (57). Integration of Equation (57) with respect to X 1 gives the joint density of Y i ; i = 1, 2, as expected. Similarly, integration of Equation (57) with respect to Y 1 gives the joint density of X 1 and Y 2 . However, the joint density of X 1 and Y 1 is given by Equation (60); to the best of my knowledge there is no closed form expression for the integral at Equation (60). The conditional distribution of X 1 given that Y 1 = y 1 has the PDF obtained by dividing Equation (60) by the marginal density of Y 1 . Example densities for λ 1 = λ 2 = 1 and y 1 = −10, −5 and 0 are shown in Figure 5. There is a corresponding expression for the PDF of Y 1 given that X 1 = x 1 .
The figure shows conditional density functions that resemble the skew-normal for conditioning values equal to −10 and −5 when λ 1 = λ 2 = 1. Conditioning on 0 gives the standard normal density.

Marginal and Conditional Distributions-General Results
For the distribution at Equation (54), suppose that X is partitioned into two components of length n 1 and n 2 = n − n 1 , and that A and Σ are partitioned similarly, with B = Σ 21 Σ −1 11 and Σ 2|1 = Σ 22 − Σ 21 Σ −1 11 Σ 12 . The argument of Φ(.) may then be decomposed accordingly. Integration over x 2 yields the density of the joint distribution of X 1 and Y; integration over X 1 [Y] recovers the normal distribution of Y [X 1 ], as expected. The conditional distribution of X 1 given Y = y is skew-normal, with a shape parameter that depends on y. There is no closed form expression for the joint (marginal) density of X 1 and Y i ; i = 1 or 2, except in the very obscure special case for which A 2 Σ 2|1 A T 2 = 0.

Extended Version
Following the approach used in Sections 4 and 5.4, an extended version of the distribution at Equation (54) has a density function with normalizing constant Ψ(τ, λ). Integration with respect to x or y shows that Ψ is given by an n-dimensional integral or, alternatively, by an m-dimensional integral.

Student Version
In the usual way, consider the distribution of X and Y conditional on S i = s i ; i = 1, 2, where the S i are independently distributed as χ 2 (ν i ) /ν i . Standard manipulations give the following expression for the density function of X and Y:

f (x, y) = 2 t̃ ν 1 ,n (x, Σ) t̃ ν 2 ,m (y, Ω) E S* [Φ{λ x T Ay √(s 1 s 2 /((ν 1 + 1)(ν 2 + 1)))}], (68)

where E S* denotes expectation over the distribution of the variables S i ; i = 1, 2, which are independently distributed as χ 2 (ν i +1) /(ν i + 1) variables, and t̃ ν,n (x, Σ) denotes the density function of an n-variate multivariate Student distribution with ν degrees of freedom, location parameter vector 0 and scale matrix Σ. Note that integration over, say, S 1 reduces the right hand side of Equation (68) to

2 t̃ ν 1 ,n (x, Σ) t̃ ν 2 ,m (y, Ω) E S 2 [T ν 1 +1 (R)], R = λ x T Ay √{s 2 /(ν 2 + 1)}, (69)

which may be computed numerically for given values of x and y. An extended version of the distribution may be developed in the same way as in Section 7.3. Given the need for numerical computation indicated at Equations (68) and (69), an alternative skew-Student copula may be obtained by extending the distribution described in Section 6.1. The density function is

f (x, y) = 2 t̃ ν 1 ,n (x, Σ) t̃ ν 2 ,m (y, Ω) T ω (λ x T Ay).

Stein's Lemma
This lemma is useful in portfolio theory and for the computation of moments and cross-moments. The treatment in this sub-section follows that in [13].
Let g(X, Y) be a scalar valued function of X and Y, subject to the usual regularity conditions, and consider the covariance cov{X, g(X, Y)} given by the lemma at Equation (71). The right hand side of Equation (71) comprises two terms, the second of which involves an expectation over Y and the matrix A; there is a similar expression for E{Y g(.)}.

Example 1. Bivariate Case. On using the lemma with ∇ x g(x) = 0, the second term above reduces to an expression which agrees with Equation (4).

Example 2. General Case
For the general case, g(x, y) = y i for i = 1, . . . , m. As above, ∇ x g(x) = 0, and the second term in the lemma yields the cross covariance matrix cov(X, Y T ). This expression must be computed numerically. Note also that higher order cross moments may also be computed using Stein's lemma, albeit with numerical integration.

Example 3. Portfolio Selection
For portfolio selection, assume that X denotes asset returns and that Y denotes sources of skewness in the conditional distribution of X given that Y = y. This model reflects an empirical feature of some markets, namely that skewness may be time varying. The return on a portfolio with weights w is w T X. If the utility function is U(w T X), the first order conditions for portfolio selection conditional on Y = y contain the term E{X U′(w T X)}. Hence g(X) = U(w T X) and ∇ X g(X) = wU′(w T X). Assuming that the order of integration may be changed, Stein's lemma shows that the second term is proportional to

A ∫ y φ m (y; Ω) φ(x T Ay) dy,
which equals zero. Thus, portfolio selection results in a portfolio on the efficient frontier, as expected. Note that if expectations are taken over the conditional distribution of X given Y = y the result is the same as in [24] with appropriate changes of notation.

Three Examples
This section of the paper contains three numerical examples, the purpose of which is to illustrate some aspects of the skew-normal copula and related distributions in action. The first presents results for the distribution function of the bivariate skew-normal copula of Section 2, focusing on asymptotic results for tail probability computations. Example two presents specimen estimation results for the bivariate skew-Student copula of Section 6.1. The final illustration has results for the multivariate skew-normal copula that is described in Section 7, specifically the distribution at Equation (53).

Bivariate Skew-Normal Copula Distribution Function and Related Computations
The distribution function corresponding to the density function at Equation (1) is given at Equation (75). There is no analytic expression for this integral; for specified values of x 1,2 it may be computed numerically. For some applications, for example in finance, there is a requirement to compute the distribution function when both X 1,2 are substantially less than zero. For λ > 0, x 1,2 << 0 and ignoring powers of 1/|x 1,2 | greater than 2, an asymptotic expression is available at Equation (76). Figure 6 shows an example of the exact and approximate distribution function corresponding to λ = 1. The quantity plotted in blue is the square root of the probability that X 1 and X 2 are both less than or equal to X * , for values of X * between −5.0 and −0.1, computed numerically; for comparison, the curve in orange shows the approximation given at Equation (76). Similar results may be computed for λ < 0. It may be noted that there are combinations of values of λ, x 1 and x 2 under which the second term is negligible and Φ(λx 1 x 2 ) ≈ 1. The bivariate skew-normal copula distribution is not specifically suited for financial applications, but it may be used to compute Value at Risk, VaR henceforth. As the variables are standardized, VaR is a critical value X * in the left hand tail of the distribution such that the tail probability equals a specified value of α that is small. Conditional Value at Risk, CVaR henceforth, is a related measure: for a single variable X, CVaR is defined as the expected value of X given that it is less than the VaR. For this distribution, the properties listed above mean that CVaR is the same as that based on the standard normal distribution.
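The tail probability computations can be cross-checked by simulation: since X 2 ∼ N(0, 1) and, by Equation (2), X 1 | X 2 = x 2 is skew-normal with shape λx 2 , draws from the copula are straightforward, and the simulated joint tail probability should agree with the numerically integrated one.

```python
import numpy as np
from scipy.stats import norm, skewnorm
from scipy.integrate import dblquad

LAM = 1.0
rng = np.random.default_rng(0)

# Simulate from the copula: X2 ~ N(0,1) and, by Equation (2), X1 | X2 = x2 ~ SN(lam*x2).
x2 = rng.standard_normal(200_000)
x1 = skewnorm.rvs(a=LAM * x2, random_state=rng)

xstar = -1.5
p_mc = np.mean((x1 <= xstar) & (x2 <= xstar))

# Numerical joint tail probability from the density 2*phi*phi*Phi(lam*x1*x2).
pdf = lambda b, a: 2.0 * norm.pdf(a) * norm.pdf(b) * norm.cdf(LAM * a * b)
p_num, _ = dblquad(pdf, -8, xstar, -8, xstar)
```

For λ > 0 and both arguments negative, the product x 1 x 2 is positive and the skewing factor exceeds one half, so the joint tail probability is larger than in the independent case.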
A bivariate version of CVaR is defined analogously. For λ > 0 and x * << 0, similar arguments to those above lead to an asymptotic expression involving P(X 2 ≤ x * |x * ), the distribution function of a standardized skew-normal distribution with shape parameter equal to λx * . Using Equation (76) shows that tail dependence equals zero for the skew-normal copula distribution. A selection of values of CVaR when λ = 1 is shown in Table 9 for critical values ranging from −9.5 to −0.5. The values shown in the second column were computed using numerical integration. Those in the third column were computed using the asymptotic formula shown at Equation (80). It is suggested that the asymptotic formula leads to values that would be sufficiently accurate for practical purposes.

Bivariate Student t Copula
This example is based on the bivariate version of the skew-Student copula of Section 6.1 with T ω (.) replaced by Φ(.), which gives the density function at Equation (81). First, note that for specified ranges of the shape parameter λ this distribution is bimodal. As in Section 2, bimodality requires that |λ| > √(π/2). For the Student case the modal value of x (for λ > 0) depends on the degrees of freedom and satisfies an equation involving the function ξ 1 defined in Proposition 1. Sets of computed values are shown in Table 10. The first column of the table shows values of λ. Panel 1 of the table shows the modal value for a set of cases in which the degrees of freedom are equal. Panel 2 shows corresponding values when ν 1 = 3. For comparison purposes, column 2 of panel 2 shows the corresponding modal values for the bivariate skew-normal case.
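The bimodality of the Student case can be checked numerically. The sketch below profiles the (assumed) density 2 t ν 1 (x) t ν 2 (x) Φ(λx 2 ) along the diagonal x 1 = x 2 = x, where the modes lie for λ > 0, and counts local maxima for shape parameters on either side of the degrees-of-freedom dependent threshold:

```python
import numpy as np
from scipy.stats import t as student_t, norm

NU1, NU2 = 3.0, 3.0  # illustrative equal degrees of freedom

def diag_profile(lam, x):
    """Assumed bivariate Student copula density along the diagonal x1 = x2 = x:
    2 * t_nu1(x) * t_nu2(x) * Phi(lam*x^2)."""
    return 2.0 * student_t.pdf(x, NU1) * student_t.pdf(x, NU2) * norm.cdf(lam * x**2)

xs = np.linspace(-4.0, 4.0, 4001)  # grid includes x = 0 exactly

def n_modes(lam):
    g = diag_profile(lam, xs)
    return int(np.sum((g[1:-1] > g[:-2]) & (g[1:-1] > g[2:])))
```

An expansion of the log profile at x = 0 suggests that, for equal degrees of freedom ν, the threshold is ((ν + 1)/ν)√(π/2), which exceeds the skew-normal threshold √(π/2); heavier tails therefore delay the onset of bimodality.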
As in Section 2, the correlation between the two variables is computed numerically. A selection of values is shown in Table 11. The first column of the table shows a range of values of λ. Panel 1 shows the computed correlation for a selection of cases for which ν 1 = ν 2 ; the last column shows the corresponding values for the skew-normal copula for comparison. The second panel shows a range of values when ν 1 = 3. The results in the table indicate that the relationship between the shape parameter λ and the degrees of freedom is non-linear. Generally, however, correlation is reduced when the degrees of freedom are finite: a noteworthy difference from the bivariate Student distribution.

To illustrate parameter estimation, the inclusion of scale and location leads to the density function at Equation (82). Note that this parameterization means that (i) estimators of scale and degrees of freedom are given by the analogous results for a univariate Student's t distribution, (ii) the estimator of shape depends only on the skewing function Φ and its argument, and (iii) only the estimators of location are complicated by the skewing function. Consequently, in the Fisher information [FI] matrix the non-zero off-diagonal elements are in the cells corresponding to µ 1,2 and λ, and to σ 2 i and ν i ; i = 1, 2. The elements of the FI matrix are of the form l θθ , the expected value of the second derivative of the log-likelihood function with respect to a parameter θ. The non-zero elements of FI to be computed numerically using the distribution at Equation (82) involve ξ 1 (x), as defined in Proposition 1, and ξ 2 (x) = −ξ 1 (x){x + ξ 1 (x)}. The remaining elements, corresponding to scale and degrees of freedom, are the same as those for the FI matrix for Student's t and involve Ψ′(ν) = d 2 log Γ(ν)/dν 2 . These are standard results, but may be found in [25].
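A minimal sketch of the ML computation for the shape parameter, deliberately simplified to the standardized skew-normal copula rather than the full Student parameterization of Equation (82): with known standard normal margins the log-likelihood separates, so only the skewing term Σ log 2Φ(λx 1 x 2 ) involves λ. Data are simulated via the conditional skew-normal representation.

```python
import numpy as np
from scipy.stats import norm, skewnorm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
TRUE_LAM = 1.0

# Simulate from the standardized bivariate skew-normal copula:
# X2 ~ N(0,1) and, by Equation (2), X1 | X2 = x2 ~ SN(lam*x2).
x2 = rng.standard_normal(20_000)
x1 = skewnorm.rvs(a=TRUE_LAM * x2, random_state=rng)

# Only the skewing factor involves lam, so the ML estimator maximizes
# sum(log 2*Phi(lam*x1*x2)); the additive constant log 2 is dropped.
negll = lambda lam: -np.sum(norm.logcdf(lam * x1 * x2))
lam_hat = minimize_scalar(negll, bounds=(-10.0, 10.0), method="bounded").x
```

This mirrors point (ii) above: the estimator of shape depends only on the skewing function and its argument, which is why it can be profiled separately from location, scale and degrees of freedom.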
The distributions described in this paper all possess the property that the shape parameter may take a value on the boundary of the parameter space. Nonetheless, the FI matrix is inverted and used in the usual way to provide an estimate of the variability of the estimated model parameters. ML estimators of the parameters may be computed using a Newton-Raphson type scheme.

The data set used consists of the weekly returns on 30 stocks that are constituents of the United States S&P500 index. As the examples in this subsection and in the following one are solely for purposes of illustration, the stocks are numbered. The example in this subsection presents results for 14 pairs of securities, namely stock 1 successively paired with stocks 2 through 15. The computations shown in Tables 12 through 14 are based on 100 observations. A standard set of descriptive statistics for the 15 stocks is shown in Table 12.

Table 13 shows estimated parameters for the 14 specified pairs of stocks, computed using the method of maximum likelihood [ML]. Panel 1 shows the estimated parameters; note that the estimates of the degrees of freedom are shown truncated. Panel 2 shows estimates of parameter precision computed by inverting the FI matrix. The estimated values of the shape parameter λ are all positive, consistent with the stylized fact that stock returns are generally positively correlated. There are four other points to note. First, the estimate of the location parameter for stock 1 depends on the choice of the second stock; that is, it is affected by the presence of the skewing factor and the non-zero values of λ. Secondly, the magnitude of each estimated λ in panel 1 is less than the corresponding estimate of parameter precision in panel 2. This suggests that a test of the null hypothesis H₀: λ = 0 would not be rejected against either a one- or two-sided alternative.
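A hedged sketch of the kind of computation involved follows. It uses the simpler skew-normal copula density 2φ(x)φ(y)Φ(λxy) rather than the Student density at Equation (82), simulates data by rejection sampling, fits the shape parameter by ML and, as in the paper, estimates precision from the inverse of the (here numerically approximated) information matrix. All function names are ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

def rvs_sn_copula(lam, n):
    """Rejection sampler: propose independent N(0,1) pairs and accept with
    probability Phi(lam*x*y); the target density is 2*phi*phi*Phi(lam*x*y)."""
    out = []
    while len(out) < n:
        x, y = rng.standard_normal(2)
        if rng.random() < norm.cdf(lam * x * y):
            out.append((x, y))
    return np.array(out)

def negloglik(lam, data):
    """Negative log-likelihood of the shape parameter, margins held standard."""
    x, y = data[:, 0], data[:, 1]
    return -np.sum(np.log(2.0) + norm.logpdf(x) + norm.logpdf(y)
                   + norm.logcdf(lam * x * y))

data = rvs_sn_copula(2.0, 1000)
fit = minimize(negloglik, x0=np.array([0.5]), args=(data,), method="BFGS")
lam_hat = fit.x[0]
se_hat = np.sqrt(fit.hess_inv[0, 0])  # precision estimate from inverse Hessian
print(lam_hat, se_hat)
```

In the paper's setting the location, scale and degrees-of-freedom parameters are estimated as well, and the expectations in the FI matrix are computed by quadrature rather than from the observed Hessian.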
This suggestion, however, is not supported by the corresponding likelihood ratio tests reported in Table 15, all of which would lead to rejection of H₀. Thirdly, the estimated values of λ are all smaller in magnitude than that required for the distribution to be bimodal. Finally, it is of interest to inquire whether the small estimated values of λ have much effect on critical values [CVs]. For each of the fourteen pairs of stocks, columns 2, 3 and 4 of Table 14 show computed critical values at probabilities of 0.002, 0.01 and 0.05. Columns 5 through 7 show a measure of the effect of the non-zero value of λ, computed as follows. When the degrees of freedom are equal, the CV is computed for Student's t distribution with the same degrees of freedom; the column entries show the percentage difference. For example, for stock pair 1-3 the Student's t CV corresponding to a probability of 0.001 is −7.1732, while the CV under the model is −7.5351, a difference of about 5%. When the degrees of freedom for a stock pair are different, the average is taken to compute the Student quantiles. For some applications the differences in the CVs might be regarded as negligible; for others they would not.
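The two diagnostics used here, the likelihood ratio test and the percentage change in a critical value, are simple to reproduce. The figures below are the ones quoted in the text for stock pair 1-3; the helper names are ours.

```python
from scipy.stats import chi2

def lr_test(loglik_full, loglik_restricted, df=1):
    """Likelihood ratio statistic and its chi-squared p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2.sf(stat, df)

def pct_cv_difference(cv_model, cv_student):
    """Percentage difference of a model critical value from the Student CV."""
    return 100.0 * (cv_model - cv_student) / cv_student

# Stock pair 1-3: Student's t CV -7.1732 versus model CV -7.5351
print(round(pct_cv_difference(-7.5351, -7.1732), 2))  # about 5%
```

Since λ = 0 is an interior point of the parameter space here, the usual chi-squared reference distribution applies to the likelihood ratio statistic.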

Multivariate Skew-Normal Copula
The final illustration uses the multivariate skew-normal copula. The parameterization of the distribution at Equation (53) facilitates estimation: the ML estimators of Σ and Ω depend only on the sets of observations {x} and {y}, respectively, and the ML estimator of λ depends on ξ₁(.). As above, ML parameter estimation requires only a simple Newton-Raphson scheme. The FI matrix has a block structure; the 3n × 3n block corresponding to µ, ν and λ is composed of sub-matrices whose elements are expectations that are computed numerically. For this illustration, a set of 30 US S&P500 stocks is divided into 2 groups each of 15 stocks. Returns are weekly as before and the data set has 500 observations. The standard set of descriptive statistics is shown in Table 16.
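Because the FI matrix is block structured, its inverse, and hence the precision estimates, can be obtained block by block rather than by inverting the full matrix at once. A minimal numpy sketch, with arbitrary symmetric positive definite blocks standing in for the actual FI sub-matrices:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)

def random_spd(k):
    """A random symmetric positive definite block, standing in for an FI block."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)

# Block-diagonal FI: invert each block separately instead of the full matrix.
blocks = [random_spd(3), random_spd(3), random_spd(2)]
fi = block_diag(*blocks)
fi_inv_blockwise = block_diag(*[np.linalg.inv(b) for b in blocks])

print(np.allclose(fi_inv_blockwise, np.linalg.inv(fi)))  # True
```

For large n this blockwise computation is both faster and numerically better conditioned than inverting the full matrix.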
The ML estimators of location and shape are computed using a Newton-Raphson scheme, with the resulting estimates shown in Table 17. As in the previous section, the FI matrix is inverted to provide estimates of precision. Unlike the bivariate example, several of the shape parameters have estimates that are more than twice the precision in absolute value, the estimate for the stock pair 1 and 16 being an example. A standard likelihood ratio test has a value of 259.3 and thus leads to rejection of the null hypothesis that all shape parameters equal zero.

Concluding Remarks
This paper reports the results of an investigation into the properties of copula-like versions of the skew-normal and skew-Student distributions. The distributions studied in the paper allow the marginal distributions to be either normal or Student's t with differing degrees of freedom. There are several conditional distributions that resemble the skew-normal distribution or are closely related to it. Many of the required computations involve numerical integration. The properties of some of the distributions studied depend on certain of the special functions, in particular the G and H functions. There are no explicit expressions available for the moment generating or characteristic functions, although moments and cross moments may be computed when they exist. The examples contained in the paper suggest that parameter estimation is straightforward.
The results show that the study of marginal distributions alone may conceal the nature of a dependence structure, and furthermore that there may be several different such structures. For future research, there are a number of technical issues concerned with integration. There is also scope for more general results based on unified or generalized skew-elliptical distributions.

Appendix

Integrating by parts and using the definition at Equation (23) gives the numerator in (26). Integrating by parts again with τ̃ = τ√(1 + λ²) gives the second term in T_N, in which s ≤ y ≪ 0, so that the integrand, Φ(s)φ(τ̃ + λx₁s) ≤ φ(s)φ(τ̃ + λx₁s)/|s|, may be neglected. Noting that τ̃ = τ y√(1 + λ²y²) completes the proof.