1. Introduction
The skew-normal distribution was introduced in [1] and the skew-t distribution in [2]. These two distributions share the property that they may be derived formally. There are several methods of derivation, of which probably the best known is to consider the bivariate normal distribution of two random variables X and Y, each with zero mean and unit variance, and with a given correlation. The skew-normal distribution arises by then considering the distribution of X conditional on Y ≥ 0 (Y ≤ 0). There is a similar and equally well-known construction for the skew-t. As well as providing these formal foundations, the construction corresponds to situations in which the X variable is sampled only if the second variable Y takes non-negative (nonpositive) values. Applications where such situations arise are well known. As Y is not explicitly observed, these models have been referred to as hidden truncation models; see, for example, [3]. As well as the skew-t distribution derived formally in [2], the earlier paper [4] shows that other more flexible constructions may be employed. Lemma 1 of [4] shows that a skew-elliptical distribution may be constructed using an elliptically symmetric distribution and the distribution function corresponding to a density function that is also symmetric. In its simple form, the resulting skew-elliptical density function takes the form
where
denotes a density function of a random variable that is symmetrically distributed about zero. The skewing term
denotes the distribution function of a random variable that is also symmetrically distributed about zero. The distribution at Equation (
1) is often referred to as a symmetry-modulated distribution, with λ being called the shape or skewness parameter. There is a substantial development of this work in [5], with further results in [6]. Ref. [7] presents a somewhat different construction that may nonetheless be regarded as a symmetry-modulated distribution.
The skew-elliptical distribution considered in this paper is a specific form of Equation (
1) and is based on the Student’s t distribution. In the usual notation, the density function of a random variable
X is
In this paper, this distribution is referred to as the linear skew-t, abbreviated to LST. With this construction, a minor extension that may be useful in some cases is
That is, for this distribution, there is no essential reason for there to be a correspondence between the two degrees of freedom parameters
and
. However, for reasons that follow from Proposition 1 below, the majority of the technical results in this paper impose the restriction
=
. This distribution may be attractive for applied work because it is simpler in structure than the usual Azzalini skew-t, henceforth ST. In addition, as the paper shows, there are combinations of parameter values under which the shape of the two distributions differs. The use of the LST does not seem to be widespread, with the paper by [
8] being an exception. It is straightforward to construct a multivariate version for an
n-vector
. Once again, in the usual notation, the density function corresponding to Equation (
2) is
The aim of this expository paper is as follows. First, it is to investigate whether or not the linear skew-t distribution at Equation (
2) and the multivariate version at Equation (
4) may be derived as a hidden truncation model or scale mixture and, if so, whether the implied mechanisms are realistic. Secondly, as there are differences between the LST and the ST, the paper presents some properties of the distribution, namely the first four moments and critical values. These are compared with the corresponding values for the ST. Thirdly, the paper investigates and compares extended versions of the two distributions, LEST and EST, respectively. Extended versions are important for some methodological developments and empirical applications because they offer greater flexibility in the shape of the distributions, and hence in moments and critical values. The paper also presents results for multivariate distributions. Finally, reflecting the results reported in
Section 3.1 of [
5], the paper reports results for stochastic ordering with respect to shape for the LEST and EST distributions.
The paper has the following sections.
Section 2 contains a review of the literature that summarizes developments in skew-elliptical and related distribution theory.
Section 3 contains results and resulting properties for the LST distribution.
Section 4 describes the corresponding results for the extended version. This distribution is similar in construction to the extended skew-t first referred to in [
2] and described in more detail in [
9,
10]. Results for multivariate versions of the distribution are in
Section 5.
Section 6 is concerned with stochastic ordering with respect to the shape parameter λ of both the LEST and EST distributions.
Section 7 contains concluding remarks and a short discussion. There is an appendix containing technical details of some of the results.
This paper contributes to the literature on symmetry-modulated distributions by providing specific results based on Student’s t distribution. The paper describes an extended version of the distribution that is analogous to the extended skew-t. For certain parameter values, the properties of the distribution are different from those of the skew-normal or skew-t. This feature creates the possibility of new application areas in empirical work. It is shown that for certain parameter values, the distribution is bimodal, a feature not exhibited by the skew-normal or skew-t.
Student’s t density functions are indexed by their degrees of freedom ν and, where required, by location and scale parameters. When not defined explicitly, other notation is that in common use. As their use is clear from the context, the notations for generic density and distribution functions are used without subscripts. Standardized distributions, with zero location and unit scale, are used throughout. Tables and figures that illustrate algebraic results are based on selected values of the degrees of freedom ν, including 10, and on a range of values for the shape parameter λ and the extension parameter τ.
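As a numerical companion to the definitions in this section, the following is a minimal sketch of the two univariate densities being compared. It assumes that the LST density has the form 2 t_ν(x) T_ν(λx), where t_ν and T_ν are the Student's t density and distribution functions with ν degrees of freedom, and it uses the standard form of the Azzalini skew-t from the literature, with skewing argument λx√((ν+1)/(ν+x²)) and ν+1 degrees of freedom. The function names `lst_pdf`, `st_pdf` and `total_mass` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import integrate, special, stats


def lst_pdf(x, nu, lam):
    """Assumed LST density: 2 * t_nu(x) * T_nu(lam * x)."""
    x = np.asarray(x, dtype=float)
    return 2.0 * stats.t.pdf(x, nu) * special.stdtr(nu, lam * x)


def st_pdf(x, nu, lam):
    """Standardized skew-t density with the usual nonlinear skewing argument."""
    x = np.asarray(x, dtype=float)
    arg = lam * x * np.sqrt((nu + 1.0) / (nu + x**2))
    return 2.0 * stats.t.pdf(x, nu) * special.stdtr(nu + 1.0, arg)


def total_mass(pdf, nu, lam):
    """Integrate a density over the real line; the result should be close to 1."""
    value, _ = integrate.quad(pdf, -np.inf, np.inf, args=(nu, lam))
    return value


if __name__ == "__main__":
    for name, pdf in (("LST", lst_pdf), ("ST", st_pdf)):
        print(name, total_mass(pdf, nu=5.0, lam=2.0))
```

Both functions reduce to the Student's t density when λ = 0, which gives a simple check on any implementation.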
2. Literature Review
The seminal work by [
1] has led to a large literature whose growth continues and whose scope expands. In toto, there are far too many articles to cite in a single paper, but there is a selective overview in [
11]. From the perspective of this paper, which is concerned with the specific form of distribution at Equation (
1), ref. [
5], A&R2012 henceforth, serves as a benchmark for recent research. Perhaps surprisingly, it has been cited by numerous authors who employ it to support the use of the skew-normal distribution in a range of applications. Ref. [
12] used the skew-normal distribution in statistical process control. More recent examples include applications in astronomy and astrophysics [
13,
14,
15,
16], seemingly unrelated regression [
17] and the weather [
18,
19].
The results in A&R2012, in particular those in
Section 3.1, are developed in [
20,
21] in two papers about the shape parameter. These results are extended in [
22], which introduces mode invariance in the family of distributions introduced in [
21]. Mode invariance facilitates the study of various properties of the distribution. Ref. [
23] is concerned with stochastic dominance. Ref. [
24] develops a procedure to allow stochastic ordering for the multivariate skew-normal distribution. Stochastic ordering methods for the multivariate normal mean–variance and skew-normal scale–shape mixture models are described in [
25]. There are related results in [
26]. Ref. [
27] presents results for the expected value and characteristic function of the matrix variate skew-normal distribution and applies the results to stochastic ordering. Ref. [
28] employs the results in A&R2012 to develop classification procedures across groups for situations in which marginal distributions are skew-symmetric.
As stated in the introduction, this paper is concerned with the properties of a specific form of Equation (
1) based on Student’s t. There are many other papers that are concerned with the development of specific probability distributions that exhibit asymmetry. Some show a close connection to the original skew-normal or skew-t ones, although others do not. Ref. [
29] presents a multivariate skew-Cauchy distribution. This differs from the conventional multivariate skew-t in that the unobserved variables are independently distributed. The same concept is extended in [
30], in which the unobserved variables are constrained. It is notable that some of the exemplar distributions in that paper are bimodal. Ref. [
31] presents a number of different univariate distributions based on specific forms of Equation (
1). These include, for example, the skew-Laplace distribution. Ref. [
32] is a similar paper which presents detailed properties based on several underlying forms of
and
. These include the exponential power distribution and a distribution based on the modified Bessel function of the second kind (see Chapter 9 of [
33] for further details). The former is due originally to [
34], and the latter to [
35]. Ref. [
36] presents a distribution based on a generalization of Student’s t in which the familiar
term is replaced by
, with corresponding changes to the normalizing constant.
Ref. [
37] describes a general approach to the construction of distributions that are bimodal and which are based to some extent on Equation (
1). In these constructions, the density function is
where
is now an absolutely continuous distribution function and
is the corresponding density. Ref. [
38] introduces distributions that may be referred to collectively as skew-flexible. In their Theorem 2.1, and using the notation of Equation (
1), the density function is
where
is the distribution function corresponding to
. This distribution is bimodal for certain values of
. In a very recent paper, ref. [
39] extends earlier results due to [
40,
41]. Specifically, ref. [
39] studies the extended half-skew normal distribution with density function
, given by
where
is the normalizing constant and
is defined at Equation (
1) with
. Ref. [
42] develops a weighted skew-normal model in which the density is
where
and
are as defined at Equation (
1),
is a weight function and
is a parameter which may be vector valued. According to the authors, weighted distributions were introduced by [
43] and offer flexible models for analyzing data sets. Ref. [
44] describes the properties of a specific form of Equation (
8) in which the weight function satisfies
and the underlying density function is skew-normal. Depending on their parameterization, skewed distributions can be bimodal. Ref. [
45] describes such a distribution and provides a useful list of relevant references. Recently, [
46] proposed a multivariate distribution, which they term robust. In their paper, an elliptically symmetric density of a random vector
is multiplied by a skewing function of the form
In a different type of development, ref. [
47] presents a number of bivariate and multivariate log-normal distributions. These distributions constitute a different family from those summarized above. However, they do exhibit asymmetry and, as the authors argue, may be used for many financial data sets.
To summarize, development of probability distributions based on work by Azzalini and many of his co-authors remains an active area of research, as does the appearance in the literature of asymmetric distributions that have a different genesis.
4. Extended Version of the Linear Skew-t
In standardized form, the extended skew-t distribution, SEST henceforth, has the density function
The hidden truncation model corresponding to that in
Section 3 has a normal distribution with conditional mean equal to
and other assumptions unchanged. The linear extended skew-t distribution, LEST henceforth, corresponding to Equation (
11), has the density function
with the normalizing constant
K given by
Equation (
20) may be evaluated numerically. Note that
and also that
For fixed values of
and
, as
, the limiting forms of the distributions at Equations (
18) and (
19) are truncated t.
Proposition 2. For fixed values of ν and τ, in Equations (18) and (19), the limiting form of the argument of the skewing function are, respectively, These take values and for greater than, equal to or less than 0, respectively. For both distributions and , the limiting distribution is the truncated t with density function There is a similar result for the case .
For
, the extended skew-t has the symmetric distribution reported in [
9,
10]. For the linear extended skew-t it is Student’s t.
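The numerical evaluation of the normalizing constant can be sketched as follows. This is an illustration under a stated assumption: the argument of the skewing function is taken to be τ√(1+λ²) + λx, by analogy with the usual extended skew-normal parameterization, so that the LEST density is t_ν(x) T_ν(τ√(1+λ²) + λx)/K. The helper names `skew_arg`, `lest_kernel`, `lest_norm_const` and `lest_pdf` are hypothetical.

```python
import numpy as np
from scipy import integrate, special, stats


def skew_arg(x, lam, tau):
    """Assumed linear argument of the skewing function."""
    return tau * np.sqrt(1.0 + lam**2) + lam * x


def lest_kernel(x, nu, lam, tau):
    """Unnormalized LEST density: t_nu(x) * T_nu(skew_arg)."""
    return stats.t.pdf(x, nu) * special.stdtr(nu, skew_arg(x, lam, tau))


def lest_norm_const(nu, lam, tau):
    """Normalizing constant K obtained by one-dimensional quadrature."""
    value, _ = integrate.quad(lest_kernel, -np.inf, np.inf, args=(nu, lam, tau))
    return value


def lest_pdf(x, nu, lam, tau, k=None):
    """LEST density; pass a precomputed K to avoid repeated quadrature."""
    k = lest_norm_const(nu, lam, tau) if k is None else k
    return lest_kernel(x, nu, lam, tau) / k


if __name__ == "__main__":
    for tau in (-2.0, 0.0, 2.0):
        print(tau, lest_norm_const(nu=5.0, lam=2.0, tau=tau))
```

Under this assumed form, two special cases are available in closed form and are useful checks: K = 1/2 when τ = 0, by symmetry, and K = T_ν(τ) when λ = 0, in which case the density is Student's t. Plain quadrature may lose accuracy for negative τ of large magnitude, where K is very small.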
Examples of the normalizing constant
are shown in
Table 5 for
and 10 and a range of values of
and
. Note that for negative values of
with large magnitude, the computations reported in the table use Lemma 4 reported in [
48], a result originally due to [
49]. Examples of the SEST and LEST density functions for
and
are shown in
Figure 3. In the left-hand panel,
takes values equal to
and
. For
, both density functions are unimodal and visually similar. For
, the two density functions differ markedly in appearance, and the LEST density function is bimodal. The right-hand panel shows LEST density functions for
and
. As
increases, the bimodality becomes more pronounced.
Figure 4 shows the corresponding density functions for
. There is no sign of bimodality. The density functions are not truncated as such, but the figure illustrates the sharp fall in density values, even for
.
Figure 5 and
Figure 6 show examples of the density functions for positive values of
. All the ones shown are unimodal, with little or no visual differences in the density functions apparent with increasing
. The bimodality that occurs for some values of the LEST model parameters suggests that this family of distributions may be of use for applications in which the hazard rate is not monotonic. Health and employment turnover are but two examples of the presence of bimodality; see for example [
50,
51]. Examples of the hazard rate are shown in
Figure 7.
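A simple way to explore modality and the hazard rate numerically is sketched below. As before, the LEST kernel t_ν(x) T_ν(τ√(1+λ²) + λx) is an assumed form, and the names `count_modes` and `hazard_rate` are hypothetical. Modes are counted as strict local maxima on a finite grid, so the count depends on the grid range and spacing. The hazard rate is computed as the density divided by the upper tail probability, for which the normalizing constant cancels.

```python
import numpy as np
from scipy import integrate, special, stats


def lest_kernel(x, nu, lam, tau):
    """Unnormalized LEST density under the assumed linear skewing argument."""
    arg = tau * np.sqrt(1.0 + lam**2) + lam * x
    return stats.t.pdf(x, nu) * special.stdtr(nu, arg)


def count_modes(nu, lam, tau, lo=-15.0, hi=15.0, num=3001):
    """Count strict local maxima of the kernel on an equally spaced grid."""
    x = np.linspace(lo, hi, num)
    y = lest_kernel(x, nu, lam, tau)
    peaks = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    return int(np.count_nonzero(peaks))


def hazard_rate(x, nu, lam, tau):
    """Hazard rate f(x) / (1 - F(x)); the normalizing constant cancels."""
    upper_tail, _ = integrate.quad(lest_kernel, x, np.inf, args=(nu, lam, tau))
    return float(lest_kernel(x, nu, lam, tau) / upper_tail)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the figures.
    for tau in (-3.0, -1.0, 1.0):
        print(tau, count_modes(nu=3.0, lam=2.0, tau=tau))
    print([hazard_rate(x, nu=3.0, lam=2.0, tau=-1.0) for x in (-1.0, 0.0, 1.0)])
```

Because the grid search only detects maxima that are resolved by the grid, a finer grid or a wider range may be needed when a second mode is small or lies far from the origin.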
Moments and Critical Values of the Extended Skew-t and Linear Skew-t Distributions
For the extended version of the LST distribution at Equation (
19), and for
, the expression corresponding to that at Equation (
14) is
with
which applies for both odd and even values of
n.
Moments of the SEST and LEST distributions are shown in
Table 6 for
,
and a range of values of
.
Table 7 shows the corresponding results for
and 10. Given the distributions shown in
Figure 3 and
Figure 4, the differences between the moments of the SEST and LEST distributions for negative values of
are not surprising.
Table 8 and
Table 9 show the corresponding critical values. In
Table 8, there are differences in the critical values of the two distributions at all given levels of probability. As implied by the result of Proposition 2, differences in the critical values decrease with increasing
.
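Moments and critical values of this kind can be obtained by quadrature and root finding. The sketch below is an illustration only: it assumes the LEST kernel t_ν(x) T_ν(τ√(1+λ²) + λx), and the names `lest_moment`, `lest_cdf` and `lest_critical_value` are hypothetical. The n-th moment is computed only for n < ν, where it exists.

```python
import numpy as np
from scipy import integrate, optimize, special, stats


def lest_kernel(x, nu, lam, tau):
    """Unnormalized LEST density under the assumed linear skewing argument."""
    arg = tau * np.sqrt(1.0 + lam**2) + lam * x
    return stats.t.pdf(x, nu) * special.stdtr(nu, arg)


def lest_norm_const(nu, lam, tau):
    value, _ = integrate.quad(lest_kernel, -np.inf, np.inf, args=(nu, lam, tau))
    return value


def lest_moment(n, nu, lam, tau):
    """n-th moment about the origin by numerical integration (requires n < nu)."""
    if n >= nu:
        raise ValueError("the n-th moment requires n < nu")
    k = lest_norm_const(nu, lam, tau)
    value, _ = integrate.quad(
        lambda x: x**n * lest_kernel(x, nu, lam, tau), -np.inf, np.inf
    )
    return value / k


def lest_cdf(x, nu, lam, tau, k=None):
    """Distribution function; the integral is split at zero for stability."""
    k = lest_norm_const(nu, lam, tau) if k is None else k
    args = (nu, lam, tau)
    if x <= 0.0:
        value, _ = integrate.quad(lest_kernel, -np.inf, x, args=args)
    else:
        left, _ = integrate.quad(lest_kernel, -np.inf, 0.0, args=args)
        right, _ = integrate.quad(lest_kernel, 0.0, x, args=args)
        value = left + right
    return value / k


def lest_critical_value(p, nu, lam, tau, bracket=(-50.0, 50.0)):
    """Solve F(x) = p for x with Brent's method on a fixed bracket."""
    k = lest_norm_const(nu, lam, tau)
    return optimize.brentq(lambda x: lest_cdf(x, nu, lam, tau, k) - p, *bracket)


if __name__ == "__main__":
    print([lest_moment(n, nu=5.0, lam=2.0, tau=-1.0) for n in (1, 2, 3, 4)])
    print(lest_critical_value(0.95, nu=5.0, lam=2.0, tau=-1.0))
```

With λ = 0 and τ = 0 the functions should reproduce the Student's t values, for example a variance of ν/(ν − 2). The fixed bracket is a design shortcut and may need widening for extreme probabilities or very small ν.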
5. Multivariate Distributions
The references listed in
Section 2 suggest that the emphasis of research has been on univariate developments, even though Lemma 1 of [
4] proposes a multivariate construction. Like its skew-t counterpart, the construction of the linear skew-t leads to a multivariate form. The multivariate (and bivariate) version of the LST distribution in this paper follows the original development of the multivariate skew-normal at Equation (23) of [
52], with
set to a unit matrix.
There is a similar result to Proposition 1 for the multivariate case that corresponds to Equation (
2). As shown below, the integration involved may be simplified to some extent.
Proposition 3. Conditional on , let ∼, and let Y have a normal distribution with expected value and variance , S∼. The result corresponding to Equation (2) is that the density function of given that is The proof also follows by direct verification. The details are omitted.
An extended version of the distribution at Equation (
23) corresponding to Equation (
19) has the density function
with normalizing constant
given by
This may be reduced to the one-dimensional integral
with
, which may be evaluated numerically. As
, however, the limiting distribution is the standardized extended skew-normal with density function
For certain negative values of
and for small degrees of freedom, the bivariate form of the LEST distribution can be bimodal.
Figure 8 shows two examples when
,
and
and
, respectively. These figures offer further evidence of the more complex shapes that can arise with symmetry-modulated distributions, there being other examples in
Figure 1 of [
2] and
Figure 2 of [
11].
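The reduction of the normalizing constant to a one-dimensional integral can be checked numerically in the bivariate case. The sketch below assumes a multivariate LEST kernel of the form t_{ν,n}(x) T_ν(τ√(1+λᵀλ) + λᵀx), with an identity scale matrix. The reduction used here is one possible route and is not necessarily the one in the paper: it relies on the standard property that, for a multivariate Student's t vector with identity scale matrix, λᵀx divided by the norm of λ has a univariate Student's t distribution with the same degrees of freedom. All function names are hypothetical.

```python
import numpy as np
from scipy import integrate, special


def mvt_pdf(x, nu):
    """Multivariate Student t density, zero location and identity scale."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    log_c = (special.gammaln(0.5 * (nu + n)) - special.gammaln(0.5 * nu)
             - 0.5 * n * np.log(nu * np.pi))
    return np.exp(log_c - 0.5 * (nu + n) * np.log1p(np.sum(x**2, axis=-1) / nu))


def mlest_kernel(x, nu, lam, tau):
    """Unnormalized multivariate LEST density (assumed form)."""
    lam = np.asarray(lam, dtype=float)
    arg = tau * np.sqrt(1.0 + lam @ lam) + np.dot(x, lam)
    return mvt_pdf(x, nu) * special.stdtr(nu, arg)


def norm_const_2d(nu, lam, tau):
    """Bivariate normalizing constant by direct two-dimensional quadrature."""
    value, _ = integrate.dblquad(
        lambda x2, x1: mlest_kernel(np.array([x1, x2]), nu, lam, tau),
        -np.inf, np.inf, lambda x1: -np.inf, lambda x1: np.inf,
    )
    return value


def norm_const_1d(nu, lam, tau):
    """The same constant via a one-dimensional integral along the direction lam."""
    lam = np.asarray(lam, dtype=float)
    norm = np.sqrt(lam @ lam)
    shift = tau * np.sqrt(1.0 + norm**2)
    value, _ = integrate.quad(
        lambda u: mvt_pdf(np.array([u]), nu) * special.stdtr(nu, shift + norm * u),
        -np.inf, np.inf,
    )
    return value


if __name__ == "__main__":
    args = (5.0, [1.0, 2.0], -0.5)
    print(norm_const_2d(*args), norm_const_1d(*args))
```

The one-dimensional version is much cheaper and does not depend on the dimension n, which is the practical reason for preferring it when the constant must be evaluated repeatedly.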
5.1. Marginal Distributions
Let
be partitioned as
, where
X is a scalar and
an
-vector. Similarly, partition the shape parameter vector as
. The density function of the marginal distribution of
X is given by
where
This may be reduced to the one-dimensional integral
where
,
is the normalizing constant for the multivariate Student distribution with
degrees of freedom and
n variables and
Using integration by parts, for
, the expected value of
X may be written as the two-dimensional integral
where
Similar expressions for
and second-order cross-moments, which require a trivariate integral, are omitted. The integrals at Equations (
29) and (
31) may be computed numerically. In principle, the method of Equation (
29) may be used to compute the density function of the marginal distribution of a vector-valued subset of
. The expression at Equation (
29) suggests that for finite values of the degrees of freedom parameter
, the marginal distribution of the scalar variable
X may not be linear skew-t. More generally, the LST distribution may not be closed under marginalization. At the time of writing, this remains a conjecture because of the need to use numerical methods to compute the integrals involved. The limiting extended skew-normal distribution is, of course, closed under marginalization.
Figure 9 shows another example of a bivariate LEST distribution which is also bimodal. In this case, both marginal distributions are bimodal too.
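The marginal density of a scalar component can be computed by integrating out the remaining variable numerically. The sketch below does this directly for the bivariate case, rather than through the reduced integral described above. It assumes the multivariate LEST kernel t_{ν,n}(x) T_ν(τ√(1+λᵀλ) + λᵀx) with identity scale matrix, and the names `mvt_pdf`, `mlest_kernel`, `norm_const` and `marginal_pdf` are hypothetical.

```python
import numpy as np
from scipy import integrate, special


def mvt_pdf(x, nu):
    """Multivariate Student t density, zero location and identity scale."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    log_c = (special.gammaln(0.5 * (nu + n)) - special.gammaln(0.5 * nu)
             - 0.5 * n * np.log(nu * np.pi))
    return np.exp(log_c - 0.5 * (nu + n) * np.log1p(np.sum(x**2, axis=-1) / nu))


def mlest_kernel(x, nu, lam, tau):
    """Unnormalized multivariate LEST density (assumed form)."""
    lam = np.asarray(lam, dtype=float)
    arg = tau * np.sqrt(1.0 + lam @ lam) + np.dot(x, lam)
    return mvt_pdf(x, nu) * special.stdtr(nu, arg)


def norm_const(nu, lam, tau):
    """Normalizing constant via a one-dimensional integral along lam."""
    lam = np.asarray(lam, dtype=float)
    norm = np.sqrt(lam @ lam)
    shift = tau * np.sqrt(1.0 + norm**2)
    value, _ = integrate.quad(
        lambda u: mvt_pdf(np.array([u]), nu) * special.stdtr(nu, shift + norm * u),
        -np.inf, np.inf,
    )
    return value


def marginal_pdf(x1, nu, lam, tau, k=None):
    """Marginal density of the first component of a bivariate LEST vector."""
    k = norm_const(nu, lam, tau) if k is None else k
    value, _ = integrate.quad(
        lambda x2: mlest_kernel(np.array([x1, x2]), nu, lam, tau), -np.inf, np.inf
    )
    return value / k


if __name__ == "__main__":
    k = norm_const(5.0, [1.0, -2.0], -1.0)
    print([marginal_pdf(x, 5.0, [1.0, -2.0], -1.0, k) for x in (-1.0, 0.0, 1.0)])
```

When λ is the zero vector the joint density is multivariate Student's t, so the computed marginal should agree with the univariate Student's t density. For nonzero λ, comparing the output with a univariate LEST density is one way to examine the closure question numerically.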
5.2. First- and Second-Order Moments
Examples of the values of first- and second-order moments are shown in
Table 10 and
Table 11. These are all computed from bivariate LEST distributions. The first table reports results for
and the second for
. Each table has five horizontal panels for values of
equal to
and 5, respectively. The columns headed with [C] give results computed directly from the bivariate LEST distributions using numerical integration. The columns headed [ST] and [SN] contain exact results from the standardized bivariate extended skew-t (with the same degrees of freedom) and the extended skew-normal distributions. The abbreviations (1,1) and (1,2) refer to the values of the shape parameters used. In the former,
, and in the latter,
. Computations are displayed to four decimal places.
In
Table 10, the computed values of the moments differ depending on the distribution. For the parameters used in the tables, the differences are more marked when
. In
Table 11, as implied by the limiting distribution at Equation (
26), the moments vary little depending on the distribution. As a further illustration,
Table 12 and
Table 13 show a selection of first- and second-order moments for the same values of
as above, with
and 5000 and for combinations of different values of the shape parameter
. To save space, only values for the LEST distribution are shown. More detailed results are available on request. The results in all four tables in this section provide a demonstration of the lack of closure of the LEST distribution under marginalization.
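The direct computation of first- and second-order moments from a bivariate density can be sketched with two-dimensional quadrature. The multivariate LEST kernel t_{ν,n}(x) T_ν(τ√(1+λᵀλ) + λᵀx) with identity scale matrix is an assumption, the parameter values are illustrative rather than those used in the tables, and the name `bivariate_moment` is hypothetical.

```python
import numpy as np
from scipy import integrate, special


def mvt_pdf(x, nu):
    """Multivariate Student t density, zero location and identity scale."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    log_c = (special.gammaln(0.5 * (nu + n)) - special.gammaln(0.5 * nu)
             - 0.5 * n * np.log(nu * np.pi))
    return np.exp(log_c - 0.5 * (nu + n) * np.log1p(np.sum(x**2, axis=-1) / nu))


def mlest_kernel(x, nu, lam, tau):
    """Unnormalized multivariate LEST density (assumed form)."""
    lam = np.asarray(lam, dtype=float)
    arg = tau * np.sqrt(1.0 + lam @ lam) + np.dot(x, lam)
    return mvt_pdf(x, nu) * special.stdtr(nu, arg)


def norm_const(nu, lam, tau):
    """Normalizing constant via a one-dimensional integral along lam."""
    lam = np.asarray(lam, dtype=float)
    norm = np.sqrt(lam @ lam)
    shift = tau * np.sqrt(1.0 + norm**2)
    value, _ = integrate.quad(
        lambda u: mvt_pdf(np.array([u]), nu) * special.stdtr(nu, shift + norm * u),
        -np.inf, np.inf,
    )
    return value


def bivariate_moment(p, q, nu, lam, tau):
    """E[X1**p * X2**q] for the bivariate LEST, by two-dimensional quadrature."""
    k = norm_const(nu, lam, tau)
    value, _ = integrate.dblquad(
        lambda x2, x1: x1**p * x2**q * mlest_kernel(np.array([x1, x2]), nu, lam, tau),
        -np.inf, np.inf, lambda x1: -np.inf, lambda x1: np.inf,
    )
    return value / k


if __name__ == "__main__":
    # Shape vector (1, 2) with illustrative degrees of freedom and extension parameter.
    print(bivariate_moment(1, 0, nu=10.0, lam=[1.0, 2.0], tau=-1.0))
```

Second-order moments require ν > 2, and quadrature over an infinite domain converges slowly when ν is only slightly larger than the order of the moment.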
6. Stochastic Ordering
The previous sections demonstrate that there are numerous differences between the SEST and LEST distributions, although these do diminish as the degrees of freedom increase. Another area of difference is the presence or absence of stochastic ordering. This property, which is related to stochastic dominance, is satisfied by the skew-t itself. More generally, it is satisfied by distributions of the type described in [
5], in particular in
Section 3.1 of that paper. The result below, motivated by [
53] and believed to be a new result, shows that stochastic ordering holds for the standardized extended skew-t distribution, as well as the skew-t.
Proposition 4. Let the random variable X have the standardized extended skew-t distribution with density function at Equation (18) and denote the distribution function by . It follows that , that is, stochastic ordering with respect to λ holds for nonzero values of τ. The proof follows by direct verification, with details in Appendix C.
For the linear extended skew-t, the distribution function corresponding to the density at Equation (
19) is
where
from which the numerator of
is
A useful algebraic representation for the LEST distribution function at Equation (
33) is not available at present. Numerical computations for a range of values of
,
and
are reported in
Table 14. The results show the computed maximum value of the numerator of
, rounded to 10 decimal places. For many of the parameter combinations shown, the maxima equal zero. For numerous others, the maxima are positive. The results shown in this table demonstrate that, to 10 decimal places at least, the linear extended skew-t does not satisfy the required conditions for stochastic ordering to hold.
Note that the results in this section hold for nonstandardized versions of the EST and LEST distributions at Equations (18) and (19). They require modification for parameterizations in which the shape parameter λ is a component of scale.
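A finite-difference version of this kind of check is sketched below. It is not the computation behind Table 14, which is based on the derivative of the distribution function. Instead it compares the LEST distribution function at two values of λ on a grid and reports the largest increase. The criterion assumed here is that stochastic ordering with respect to λ corresponds to a distribution function that is nonincreasing in λ at every x. The LEST kernel t_ν(x) T_ν(τ√(1+λ²) + λx) is also an assumption, and all function names are hypothetical.

```python
import numpy as np
from scipy import integrate, special


def t_pdf(x, nu):
    """Student t density with nu degrees of freedom."""
    log_c = (special.gammaln(0.5 * (nu + 1.0)) - special.gammaln(0.5 * nu)
             - 0.5 * np.log(nu * np.pi))
    return np.exp(log_c - 0.5 * (nu + 1.0) * np.log1p(x * x / nu))


def lest_kernel(x, nu, lam, tau):
    """Unnormalized LEST density under the assumed linear skewing argument."""
    return t_pdf(x, nu) * special.stdtr(nu, tau * np.sqrt(1.0 + lam**2) + lam * x)


def lest_cdf_on_grid(xs, nu, lam, tau):
    """LEST distribution function on a grid; integrals are split at zero."""
    args = (nu, lam, tau)
    left, _ = integrate.quad(lest_kernel, -np.inf, 0.0, args=args)
    right, _ = integrate.quad(lest_kernel, 0.0, np.inf, args=args)
    values = []
    for x in xs:
        if x <= 0.0:
            part, _ = integrate.quad(lest_kernel, -np.inf, x, args=args)
        else:
            part, _ = integrate.quad(lest_kernel, 0.0, x, args=args)
            part += left
        values.append(part)
    return np.array(values) / (left + right)


def max_cdf_increase(nu, tau, lam_lo, lam_hi, xs):
    """Largest value of F(x; lam_hi) - F(x; lam_lo) over the grid xs."""
    diff = lest_cdf_on_grid(xs, nu, lam_hi, tau) - lest_cdf_on_grid(xs, nu, lam_lo, tau)
    return float(np.max(diff))


if __name__ == "__main__":
    grid = np.linspace(-8.0, 8.0, 41)
    # A clearly positive value would indicate that ordering fails for this pair.
    print(max_cdf_increase(nu=5.0, tau=-1.0, lam_lo=1.0, lam_hi=2.0, xs=grid))
```

For τ = 0 the distribution function is nonincreasing in λ, because the integrand of its derivative with respect to λ is an odd function of x, so the reported maximum should be zero up to quadrature error. Values that are positive by more than that error, for nonzero τ, are the numerical signature of a failure of ordering on the grid used.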
7. Concluding Remarks and Discussion
The linear skew-t distribution is a specific form of symmetry-modulated distribution in which the two density functions used in its construction are Student’s t. Extended versions of the distribution, which give greater flexibility in both moments and critical values, are developed following the type of parameterization used for the extended skew-normal distribution. Parameter estimation is facilitated by the simpler argument of the skewing function. However, the stronger motivation for use of the LST or LEST distributions would arise from differences in the shape of the density function, particularly for applications for which ν is small or τ is negative and of large magnitude. For certain values of these parameters, the distribution is bimodal. This feature creates the possibility of new application areas in empirical work.
The linear skew-t distribution arises as a result of conditioning on non-negative values of an unobserved variable and a scale mixture. The conditioning employed to derive the original skew-normal and skew-t distributions is simple and has a useful interpretation that relates to real situations. It is hard to envisage an empirical application in which the conditioning specified in Section 3 or Section 4 would have a simple interpretation. Use of the linear skew-t distributions described in this paper would be justified mainly on empirical grounds, that is, determined by the data in question.
The extra simplicity in the parameterization of the multivariate form of the distribution may be useful for some applications. However, lack of closure under marginalization or conditioning may be a limitation for some multivariate applications. As an example, suppose that the returns on all 30 constituent stocks of the Dow Jones Industrial Average are modeled using the multivariate extended skew-t, MEST. That the joint distribution of any subset of the 30 stocks is also MEST would seem to be a desirable property. Conversely, suppose that the LEST is used instead. In this case, the joint distribution of returns on a subset of the 30 stocks is not, in general, LEST.
Another difference between the EST and LEST distributions is their stochastic ordering properties with respect to the shape parameter λ. As the paper shows, the property is satisfied for the EST distribution, but computational results indicate that it is not generally satisfied for the LEST. For applications for which stochastic ordering is an important consideration, the LEST distribution may not be a suitable choice.
From a methodological perspective, a closed version of the LEST distribution, analogous to the closed skew-normal or SUN distribution, is a potential topic for future research that could offer more powerful tools. The results in this paper also suggest that symmetry-modulated distributions based on other underlying density and distribution functions may offer interesting and useful insights. However, the results here serve as a reminder that the relative simplicity of the skew-normal, skew-t and extensions thereof may still be preferable for some applications.