Abstract
The multivariate skew-t distribution plays an important role in statistics since it combines skewness with heavy tails, a very common feature in real-world data. A generalization of this distribution is the truncated multivariate skew-t distribution which contains the truncated multivariate t distribution and the truncated multivariate skew-normal distribution as special cases. In this article, we study several distributional properties of the truncated multivariate skew-t distribution involving affine transformations, marginalization, and conditioning. The generation of random samples from this distribution is described.
Keywords:
marginal distribution; multivariate skew-t distribution; rejection sampling; truncated distribution; skewness
MSC:
60E05; 62E15; 62-08
1. Introduction
The multivariate skew-normal distribution (Azzalini and Capitanio [1]) and its extensions have received growing attention in recent years. Although this distribution accommodates skewness in the data, it is sensitive to outliers, which is one of its limitations. The multivariate skew-t distribution (Azzalini and Capitanio [2], Arellano-Valle and Genton [3]) is an alternative for dealing with skewness and outliers in the data since it has heavier tails than the skew-normal distribution. Several generalizations of this distribution have been studied in the statistical literature. For example, Arellano-Valle and Genton [3] introduced and studied the multivariate extended skew-t distribution. These authors have also defined the multivariate unified skew-t distribution. In a more general setup, the class of multivariate unified skew-elliptical distributions has been studied by Arellano-Valle and Azzalini [4] and Arellano-Valle and Genton [5].
Another extension of the multivariate skew-t distribution is obtained as the conditional distribution resulting from restricting its domain. Such a distribution is called the truncated multivariate skew-t distribution and is defined in Galarza Morales et al. [6]. These authors studied the moments of the doubly truncated selection elliptical distributions, from which the moments of the doubly truncated multivariate skew-t distribution can be obtained as particular cases. The truncated multivariate skew-t distribution is the focus of the present paper. This distribution has special cases such as the truncated multivariate t distribution (Morán-Vásquez and Ferrari [7,8]) and the truncated multivariate skew-normal distribution (Morán-Vásquez et al. [9], Galarza Morales et al. [10]). Many statistical models in Bayesian statistics, regression analysis and survival analysis involve truncated multivariate distributions. These distributions have received considerable attention in the scientific literature since they arise in a wide variety of applied sciences such as physics, economics, biology, medicine, and engineering. Several studies related to truncation have been developed, mainly in the elliptical class and in two of its members, the multivariate normal and t distributions. For example, see Morán-Vásquez and Ferrari [7,8], Kim [11], Arellano-Valle et al. [12], Kan and Robotti [13], Arismendi [14], Ho et al. [15], Nadarajah [16], Horrace [17,18], Tallis [19,20,21], and Birnbaum and Meyer [22].
The modeling of skewed multivariate data with values restricted to a subset of $\mathbb{R}^p$ is of statistical interest, especially in the presence of outliers. Examples of this situation appear in Marchenko and Genton [23], Morán-Vásquez and Ferrari [7], and Morán-Vásquez et al. [24]. These works present parametric methodologies to model correlated multivariate positive data which are skewed and heavy-tailed. However, the statistical literature on data of this type restricted to an arbitrary subset of $\mathbb{R}^p$ is scarcer. The truncated multivariate skew-t distribution may be appropriate for modeling correlated skewed and heavy-tailed multivariate data whose values are restricted to a subset of $\mathbb{R}^p$. This type of data occurs, for example, in environmental studies where variables such as pH, grades, and humidity have upper and lower physical bounds, and their densities are not necessarily symmetric within these bounds (Flecher et al. [25]).
In this paper, we establish several distributional properties of the truncated multivariate skew-t distribution involving affine transformations, marginalization, and conditioning. In addition, we describe a procedure based on rejection sampling to simulate random samples from this distribution. Our results generalize some properties of the truncated multivariate skew-normal distribution studied by Morán-Vásquez et al. [9]. Additionally, applications of our results establish some new properties of the truncated multivariate t distribution.
The paper is organized as follows. Section 2 defines the truncated multivariate skew-t distribution and presents other related families. Section 3 and Section 4 focus on distributional properties and random sample generation for the truncated multivariate skew-t distribution, respectively. Section 5 closes the paper with conclusions and future work.
2. The Truncated Multivariate Skew-t Distribution
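In what follows, if $\boldsymbol{A}$ is a square matrix, then $|\boldsymbol{A}|$ denotes the determinant of $\boldsymbol{A}$. If $\boldsymbol{A}$ is a symmetric matrix, then $\boldsymbol{A} > 0$ means that $\boldsymbol{A}$ is positive definite. Furthermore, $\boldsymbol{A}^{1/2}$ denotes the unique symmetric positive definite square root of $\boldsymbol{A} > 0$.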
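It is well-known that a random vector $\boldsymbol{X} \in \mathbb{R}^p$ has a multivariate t distribution with location vector $\boldsymbol{\xi} \in \mathbb{R}^p$, dispersion matrix $\boldsymbol{\Omega} > 0$ of order $p \times p$, and degrees of freedom $\nu > 0$, if its probability density function (PDF) is
$$t_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) (\nu\pi)^{p/2} |\boldsymbol{\Omega}|^{1/2}} \left(1 + \frac{\delta(\boldsymbol{x})}{\nu}\right)^{-(\nu + p)/2}, \quad \boldsymbol{x} \in \mathbb{R}^p, \tag{1}$$
where $\delta(\boldsymbol{x}) = (\boldsymbol{x} - \boldsymbol{\xi})^\top \boldsymbol{\Omega}^{-1} (\boldsymbol{x} - \boldsymbol{\xi})$ is the square of the Mahalanobis distance between $\boldsymbol{x}$ and $\boldsymbol{\xi}$ with respect to $\boldsymbol{\Omega}$.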
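When $\nu \to \infty$ in (1), we obtain the PDF of the multivariate normal vector $\boldsymbol{X} \sim N_p(\boldsymbol{\xi}, \boldsymbol{\Omega})$, which is given by $\phi_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}) = (2\pi)^{-p/2} |\boldsymbol{\Omega}|^{-1/2} \exp\{-(\boldsymbol{x} - \boldsymbol{\xi})^\top \boldsymbol{\Omega}^{-1} (\boldsymbol{x} - \boldsymbol{\xi})/2\}$. In addition, if $\nu = 1$ in (1), we retrieve the PDF of the multivariate Cauchy distribution.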
A detailed study of the multivariate t distribution appears in Kotz and Nadarajah [26].
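The density of the multivariate t distribution and its normal limit can be checked numerically. The following is a minimal sketch for the bivariate case only, using the standard form of the multivariate t density; the function names `mahalanobis_sq`, `mvt_pdf` and `mvn_pdf` are illustrative and not part of the paper.

```python
import math

def mahalanobis_sq(x, xi, omega):
    """Squared Mahalanobis distance (x - xi)' Omega^{-1} (x - xi) for p = 2."""
    (a, b), (_, d) = omega            # omega is symmetric: [[a, b], [b, d]]
    det = a * d - b * b
    u, v = x[0] - xi[0], x[1] - xi[1]
    return (d * u * u - 2.0 * b * u * v + a * v * v) / det

def mvt_pdf(x, xi, omega, nu):
    """Bivariate t density (standard form), evaluated on the log scale."""
    p = 2
    (a, b), (_, d) = omega
    det = a * d - b * b
    log_c = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - (p / 2) * math.log(nu * math.pi) - 0.5 * math.log(det))
    delta = mahalanobis_sq(x, xi, omega)
    return math.exp(log_c - (nu + p) / 2 * math.log1p(delta / nu))

def mvn_pdf(x, xi, omega):
    """Bivariate normal density, the limit of mvt_pdf as nu grows."""
    (a, b), (_, d) = omega
    det = a * d - b * b
    return math.exp(-0.5 * mahalanobis_sq(x, xi, omega)) / (2 * math.pi * math.sqrt(det))

if __name__ == "__main__":
    x, xi, omega = (0.5, -1.0), (0.0, 0.0), [[2.0, 0.5], [0.5, 1.0]]
    for nu in (1, 5, 50, 1e6):        # nu = 1 is the bivariate Cauchy case
        print(nu, mvt_pdf(x, xi, omega, nu))
    print("normal limit", mvn_pdf(x, xi, omega))
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions when the degrees of freedom are large.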
The PDF of the multivariate skew-t distribution can be expressed in terms of the PDF (1) as in Definition 1 (Arellano-Valle and Genton [3]).
Definition 1.
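The random vector $\boldsymbol{X} \in \mathbb{R}^p$ is said to have a multivariate skew-t distribution with location vector $\boldsymbol{\xi} \in \mathbb{R}^p$, dispersion matrix $\boldsymbol{\Omega} > 0$, shape parameter $\boldsymbol{\lambda} \in \mathbb{R}^p$ and $\nu > 0$ degrees of freedom, denoted by $\boldsymbol{X} \sim ST_p(\boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu)$, if its PDF is
$$ST_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu) = 2\, t_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, T\left(\sqrt{\frac{\nu + p}{\nu + \delta(\boldsymbol{x})}}\, \boldsymbol{\lambda}^\top \boldsymbol{z};\ \nu + p\right), \quad \boldsymbol{x} \in \mathbb{R}^p, \tag{2}$$
where $t_p$ is the PDF in (1), $\delta(\boldsymbol{x}) = (\boldsymbol{x} - \boldsymbol{\xi})^\top \boldsymbol{\Omega}^{-1} (\boldsymbol{x} - \boldsymbol{\xi})$, $\boldsymbol{z} = \boldsymbol{\omega}^{-1}(\boldsymbol{x} - \boldsymbol{\xi})$, $\boldsymbol{\omega} = (\boldsymbol{\Omega} \odot \boldsymbol{I}_p)^{1/2}$, with ⊙ being the Hadamard product, and $T(\cdot\,; \nu + p)$ is the CDF of a standard t random variable with $\nu + p$ degrees of freedom.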
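The multivariate skew-normal distribution is obtained as a limiting case of the multivariate skew-t distribution when $\nu \to \infty$, with the PDF given by
$$SN_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}) = 2\, \phi_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega})\, \Phi(\boldsymbol{\lambda}^\top \boldsymbol{z}), \quad \boldsymbol{x} \in \mathbb{R}^p, \tag{3}$$
where $\phi_p(\cdot\,; \boldsymbol{\xi}, \boldsymbol{\Omega})$ is the PDF of the $N_p(\boldsymbol{\xi}, \boldsymbol{\Omega})$ distribution, $\boldsymbol{z}$ is the vector $\boldsymbol{x} - \boldsymbol{\xi}$ with each component divided by the square root of the corresponding diagonal entry of $\boldsymbol{\Omega}$, and $\Phi(\cdot)$ is the CDF of a standard normal random variable. We write $\boldsymbol{X} \sim SN_p(\boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda})$ if the PDF of $\boldsymbol{X}$ is given by (3). In addition, the multivariate t distribution is obtained from the multivariate skew-t distribution when $\boldsymbol{\lambda} = \boldsymbol{0}$.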
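The way the shape parameter perturbs the symmetric t density can be illustrated with a short numerical sketch. It assumes the usual Azzalini-Capitanio form of the skew-t density (twice the t density times a univariate t CDF with $\nu + p$ degrees of freedom), is restricted to $p = 2$, and evaluates the t CDF with a simple Simpson rule rather than a library routine. All function names are hypothetical.

```python
import math

def mahalanobis_sq(x, xi, omega):
    (a, b), (_, d) = omega
    u, v = x[0] - xi[0], x[1] - xi[1]
    return (d * u * u - 2.0 * b * u * v + a * v * v) / (a * d - b * b)

def mvt_pdf(x, xi, omega, nu):
    (a, b), (_, d) = omega
    log_c = (math.lgamma((nu + 2) / 2) - math.lgamma(nu / 2)
             - math.log(nu * math.pi) - 0.5 * math.log(a * d - b * b))
    return math.exp(log_c - (nu + 2) / 2 * math.log1p(mahalanobis_sq(x, xi, omega) / nu))

def t_cdf(x, nu, n=200):
    """CDF of a standard t variable (nu >= 1, moderate nu).

    Uses x = sqrt(nu) * tan(phi), which turns the CDF into
    0.5 + c * integral_0^theta cos(phi)^(nu - 1) dphi, theta = atan(x / sqrt(nu)),
    and applies Simpson's rule with n (even) subintervals.
    """
    theta = math.atan(x / math.sqrt(nu))
    h = theta / n
    s = 1.0 + math.cos(theta) ** (nu - 1)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 0.5 + c * s * h / 3

def st_pdf(x, xi, omega, lam, nu):
    """Bivariate skew-t density: 2 * t density * T(scaled lam'z; nu + 2)."""
    delta = mahalanobis_sq(x, xi, omega)
    z = [(x[i] - xi[i]) / math.sqrt(omega[i][i]) for i in range(2)]
    arg = math.sqrt((nu + 2) / (nu + delta)) * (lam[0] * z[0] + lam[1] * z[1])
    return 2.0 * mvt_pdf(x, xi, omega, nu) * t_cdf(arg, nu + 2)

if __name__ == "__main__":
    x, xi, omega, nu = (0.8, -0.3), (0.0, 0.0), [[1.0, 0.4], [0.4, 2.0]], 4.0
    print("lam = 0 :", st_pdf(x, xi, omega, (0.0, 0.0), nu), mvt_pdf(x, xi, omega, nu))
    print("lam != 0:", st_pdf(x, xi, omega, (3.0, -1.0), nu))
```

Setting the shape parameter to zero makes the CDF factor equal to 1/2, so the sketch returns the symmetric t density, in line with the special case mentioned above.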
A stochastic representation for is , ; see Arellano-Valle and Genton [3]. From this representation we have
where is the PDF of U, , given by
Next, we define an extension of the multivariate skew-t distribution. This definition can be found in Arellano-Valle and Genton [3].
Definition 2.
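The random vector $\boldsymbol{X} \in \mathbb{R}^p$ is said to have a multivariate extended skew-t distribution with location parameter $\boldsymbol{\xi} \in \mathbb{R}^p$, dispersion matrix $\boldsymbol{\Omega} > 0$, shape parameter $\boldsymbol{\lambda} \in \mathbb{R}^p$, extension parameter $\tau \in \mathbb{R}$ and $\nu > 0$ degrees of freedom, denoted by $\boldsymbol{X} \sim EST_p(\boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu, \tau)$, if its PDF is given by
$$EST_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu, \tau) = \frac{1}{T\left(\tau / \sqrt{1 + \boldsymbol{\lambda}^\top \bar{\boldsymbol{\Omega}} \boldsymbol{\lambda}};\ \nu\right)}\, t_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, T\left(\sqrt{\frac{\nu + p}{\nu + \delta(\boldsymbol{x})}}\, (\boldsymbol{\lambda}^\top \boldsymbol{z} + \tau);\ \nu + p\right), \quad \boldsymbol{x} \in \mathbb{R}^p, \tag{6}$$
where $\bar{\boldsymbol{\Omega}} = \boldsymbol{\omega}^{-1} \boldsymbol{\Omega} \boldsymbol{\omega}^{-1}$, and $t_p$, $\delta(\boldsymbol{x})$, $\boldsymbol{z}$, $\boldsymbol{\omega}$ and $T$ are as in (1) and (2).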
For $\tau = 0$ in (6), we obtain the PDF of a multivariate skew-t distribution given in (2). If $\boldsymbol{\lambda} = \boldsymbol{0}$ and $\tau = 0$ in (6), we obtain the PDF of a multivariate t distribution given in (1). The same occurs when $\tau \to +\infty$. If $\tau \to -\infty$, the density degenerates to zero. The behavior of the multivariate extended skew-t distribution according to the values of the extension parameter is studied in detail in Arellano-Valle and Genton [3], where several properties of this distribution and an application are also presented.
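The role of the extension parameter can be explored numerically in the univariate case. The sketch below assumes the extended skew-t density in the form given by Arellano-Valle and Genton [3] with location 0 and scale 1; it checks that the density integrates to one and that a zero extension parameter gives back the skew-t density. The names `t_pdf`, `t_cdf`, `est_pdf` and `integrate` are illustrative, and the t CDF is computed by a simple Simpson rule.

```python
import math

def t_pdf(x, nu):
    """Density of a standard univariate t variable."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, n=100):
    """CDF of a standard t variable (nu >= 1) via Simpson's rule in the angle
    phi, where x = sqrt(nu) * tan(phi); n must be even."""
    theta = math.atan(x / math.sqrt(nu))
    h = theta / n
    s = 1.0 + math.cos(theta) ** (nu - 1)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 0.5 + c * s * h / 3

def est_pdf(x, lam, tau, nu):
    """Univariate extended skew-t density with location 0 and scale 1."""
    norm = t_cdf(tau / math.sqrt(1.0 + lam * lam), nu)
    arg = math.sqrt((nu + 1) / (nu + x * x)) * (lam * x + tau)
    return t_pdf(x, nu) * t_cdf(arg, nu + 1) / norm

def integrate(f, lo, hi, n):
    """Composite trapezoidal rule with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

if __name__ == "__main__":
    for tau in (-1.0, 0.0, 2.0):
        mass = integrate(lambda x: est_pdf(x, 2.0, tau, 5.0), -40.0, 40.0, 4000)
        print("tau =", tau, "total mass ~", mass)
```

The normalizing factor depends on the extension parameter only through a univariate t CDF, so no multivariate integration is needed to evaluate this density.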
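The conditional distribution of $\boldsymbol{Y}$, $\boldsymbol{Y} \sim ST_p(\boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu)$, given $\boldsymbol{Y} \in A$, with $A \subseteq \mathbb{R}^p$ being a measurable set, is called the truncated multivariate skew-t distribution. This distribution is defined via its PDF in Definition 3.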
Definition 3.
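Let $A \subseteq \mathbb{R}^p$ be a measurable set. The random vector $\boldsymbol{X}$ has a truncated multivariate skew-t distribution with support A and parameters $\boldsymbol{\xi}$, $\boldsymbol{\Omega}$, $\boldsymbol{\lambda}$ and $\nu$ degrees of freedom, denoted by $\boldsymbol{X} \sim TST_p(\boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu; A)$, if its PDF is given by
$$f(\boldsymbol{x}) = \frac{t_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, T\left(\sqrt{\frac{\nu + p}{\nu + \delta(\boldsymbol{x})}}\, \boldsymbol{\lambda}^\top \boldsymbol{\omega}^{-1}(\boldsymbol{x} - \boldsymbol{\xi});\ \nu + p\right)}{\int_A t_p(\boldsymbol{y}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, T\left(\sqrt{\frac{\nu + p}{\nu + \delta(\boldsymbol{y})}}\, \boldsymbol{\lambda}^\top \boldsymbol{\omega}^{-1}(\boldsymbol{y} - \boldsymbol{\xi});\ \nu + p\right) d\boldsymbol{y}}, \quad \boldsymbol{x} \in A. \tag{7}$$
The PDF of $\boldsymbol{X}$ can be expressed in equivalent form as
$$f(\boldsymbol{x}) = \frac{ST_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu)}{P(\boldsymbol{Y} \in A)}, \quad \boldsymbol{x} \in A, \tag{8}$$
where $P(\boldsymbol{Y} \in A) = \int_A ST_p(\boldsymbol{y}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu)\, d\boldsymbol{y}$, $ST_p(\cdot\,; \boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu)$ being the PDF of $\boldsymbol{Y}$, $\boldsymbol{Y} \sim ST_p(\boldsymbol{\xi}, \boldsymbol{\Omega}, \boldsymbol{\lambda}, \nu)$, given in (2).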
Note that if $A = \mathbb{R}^p$ in (8), we obtain the multivariate skew-t distribution (Definition 1) as a particular case of the truncated multivariate skew-t distribution. If we take $\boldsymbol{\lambda} = \boldsymbol{0}$ in (7), then we obtain the PDF of a random vector with truncated multivariate t distribution with support A, location vector, dispersion matrix and degrees of freedom as above (Morán-Vásquez and Ferrari [8]). As expected, the truncated multivariate skew-normal distribution (Morán-Vásquez et al. [9]) is obtained as a limiting case of the truncated multivariate skew-t distribution when $\nu \to \infty$. Additional special cases of the truncated multivariate skew-t distribution are the multivariate t distribution ($A = \mathbb{R}^p$ and $\boldsymbol{\lambda} = \boldsymbol{0}$ in (8)) and the multivariate normal distribution ($A = \mathbb{R}^p$, $\boldsymbol{\lambda} = \boldsymbol{0}$ and $\nu \to \infty$ in (8)).
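For a rectangular support, the normalizing constant of the truncated density can be approximated on a grid. The following sketch does this for $p = 2$ with a midpoint rule, assuming the kernel is the t density times the t CDF factor of the skew-t density; the helper names (`tst_kernel`, `norm_const`, `tst_pdf`) and the grid size are illustrative choices, not part of the paper.

```python
import math

def mahalanobis_sq(x, xi, omega):
    (a, b), (_, d) = omega
    u, v = x[0] - xi[0], x[1] - xi[1]
    return (d * u * u - 2.0 * b * u * v + a * v * v) / (a * d - b * b)

def mvt_pdf(x, xi, omega, nu):
    (a, b), (_, d) = omega
    log_c = (math.lgamma((nu + 2) / 2) - math.lgamma(nu / 2)
             - math.log(nu * math.pi) - 0.5 * math.log(a * d - b * b))
    return math.exp(log_c - (nu + 2) / 2 * math.log1p(mahalanobis_sq(x, xi, omega) / nu))

def t_cdf(x, nu, n=100):
    """Standard t CDF (nu >= 1) via Simpson's rule after x = sqrt(nu) * tan(phi)."""
    theta = math.atan(x / math.sqrt(nu))
    h = theta / n
    s = 1.0 + math.cos(theta) ** (nu - 1)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 0.5 + c * s * h / 3

def tst_kernel(x, xi, omega, lam, nu):
    """Unnormalized truncated skew-t density: t density times the t CDF factor."""
    delta = mahalanobis_sq(x, xi, omega)
    lin = sum(lam[i] * (x[i] - xi[i]) / math.sqrt(omega[i][i]) for i in range(2))
    return mvt_pdf(x, xi, omega, nu) * t_cdf(math.sqrt((nu + 2) / (nu + delta)) * lin, nu + 2)

def norm_const(xi, omega, lam, nu, rect, n=40):
    """Midpoint-rule approximation of the kernel's integral over the rectangle."""
    (a1, b1), (a2, b2) = rect
    h1, h2 = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += tst_kernel((a1 + (i + 0.5) * h1, a2 + (j + 0.5) * h2), xi, omega, lam, nu)
    return total * h1 * h2

def tst_pdf(x, xi, omega, lam, nu, rect, const):
    """Truncated density: kernel / const inside the rectangle, 0 outside."""
    inside = all(lo <= xk <= hi for xk, (lo, hi) in zip(x, rect))
    return tst_kernel(x, xi, omega, lam, nu) / const if inside else 0.0

if __name__ == "__main__":
    xi, omega, nu = (0.0, 0.0), [[1.0, 0.3], [0.3, 1.0]], 4.0
    rect = ((-2.0, 2.0), (-1.0, 1.0))
    c = norm_const(xi, omega, (3.0, -1.0), nu, rect)
    print("normalizing constant ~", c)
    print("density at (0.5, 0.2) ~", tst_pdf((0.5, 0.2), xi, omega, (3.0, -1.0), nu, rect, c))
```

A grid rule is adequate only in low dimensions; for larger $p$ the integral would need dedicated multivariate integration or Monte Carlo methods.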
Figure 1 exhibits different shapes of the truncated bivariate skew-t distribution with rectangular support according to its parameters. We consider different parameter settings and plot the surface of the PDF (7) and its contours for different levels (listed in the caption of Figure 1). The legend indicates the values of all the parameters considered in the first plot and the values of the parameters that are changed from a plot to the subsequent one (in alphabetical order). In Figure 1a, where $\boldsymbol{\lambda} = \boldsymbol{0}$, the contours are truncated ellipses. When $\boldsymbol{\lambda} \neq \boldsymbol{0}$, these ellipses are deformed (Figure 1b). Figure 1c shows the effect of parameter . As the degrees of freedom parameter grows, the contours of the truncated bivariate skew-t distribution tend to the contours of the truncated bivariate skew-normal distribution (Figure 1c–d). The tails of the truncated bivariate skew-t distribution are heavier for smaller values of $\nu$.
Figure 1.
Contour plots (at levels 0.33, 0.25, 0.14, 0.06, 0.02, 0.006, 0.002) and PDF of , where: (a) , , , , , , ; (b) , ; (c) ; (d) .
The truncated multivariate skew-t distribution is appropriate for modeling multivariate data whose values are restricted to a subset of $\mathbb{R}^p$ and that are possibly skewed and heavy-tailed. This type of data occurs in a variety of situations. For instance, the dataset studied in Flecher et al. [25] contains measurements of the relative humidity (in %) of an air–water mixture recorded by the INRA (National Institute of Agricultural Research) weather station located in Toulouse, South of France, between 1972 and 1999. These data are restricted to a bounded interval (relative humidity lies between 0% and 100%), and the skewness is apparent (Flecher et al. [25] Figure 1). In addition, the dataset presented and discussed in Morán-Vásquez and Ferrari [7] refers to observations on vitamins B2 (in mg), B3 (in mg), B12 (in mcg), and D (in mcg) intakes based on the first 24 h dietary recall interview. These data are positively correlated, their bivariate distributions are skewed, and outliers are present (Morán-Vásquez and Ferrari [7] Figure 2). The models proposed by these authors for analyzing this dataset are associated with truncated distributions. Thus, the datasets in Flecher et al. [25] and Morán-Vásquez and Ferrari [7] are practical examples where the truncated multivariate skew-t distribution can be used.
Figure 2.
Scatter plots of simulated random samples overlaid with contour plots (at levels , , , , ) of the PDF of . For plots (a–d): , , , , , and (a) and , (b) and , (c) and , (d) and . For plots (e,f): , , , , , and (e) and , (f) and .
3. Distributional Properties
In Theorem 1, we state and prove the closure property of the truncated multivariate skew-t distribution under affine transformations.
Theorem 1.
Let be the transformation , being a constant vector and a non-singular constant matrix. If , then , where , , and , with .
Proof.
Applying the transformation in (7) with the Jacobian , we have
where . By noting , and , we obtain that
which shows that . □
Corollary 1.
Let be the transformation , being a constant vector and a non-singular constant matrix. If , then .
Proof.
Substitute in Theorem 1. □
The result stated in the above corollary can also be obtained as a particular case of Theorem 3.3 of Morán-Vásquez and Ferrari [8].
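The change-of-variables step behind this closure property can be checked numerically in the symmetric case of Corollary 1. The sketch below assumes the usual parameter mapping for elliptical families under $\boldsymbol{y} = \boldsymbol{B}\boldsymbol{x} + \boldsymbol{b}$, namely location $\boldsymbol{B}\boldsymbol{\xi} + \boldsymbol{b}$ and dispersion $\boldsymbol{B}\boldsymbol{\Omega}\boldsymbol{B}^\top$ (an assumption here, since the statement's symbols are not reproduced in this text), and verifies for $p = 2$ that the t density at $\boldsymbol{y}$ equals the t density at $\boldsymbol{x}$ divided by $|\det \boldsymbol{B}|$. Function names are hypothetical.

```python
import math

def mahalanobis_sq(x, xi, omega):
    (a, b), (_, d) = omega
    u, v = x[0] - xi[0], x[1] - xi[1]
    return (d * u * u - 2.0 * b * u * v + a * v * v) / (a * d - b * b)

def mvt_pdf(x, xi, omega, nu):
    (a, b), (_, d) = omega
    log_c = (math.lgamma((nu + 2) / 2) - math.lgamma(nu / 2)
             - math.log(nu * math.pi) - 0.5 * math.log(a * d - b * b))
    return math.exp(log_c - (nu + 2) / 2 * math.log1p(mahalanobis_sq(x, xi, omega) / nu))

def affine_point(B, b, x):
    """Image y = B x + b of a point x."""
    return (B[0][0] * x[0] + B[0][1] * x[1] + b[0],
            B[1][0] * x[0] + B[1][1] * x[1] + b[1])

def affine_params(xi, omega, B, b):
    """Assumed transformed parameters: location B xi + b, dispersion B Omega B'."""
    BO = [[sum(B[i][k] * omega[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    om_y = [[sum(BO[i][k] * B[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
    return affine_point(B, b, xi), om_y

if __name__ == "__main__":
    xi, omega, nu = (1.0, -0.5), [[1.0, 0.4], [0.4, 2.0]], 3.0
    B, b = [[2.0, 1.0], [0.5, -1.0]], (1.0, -2.0)
    det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    xi_y, om_y = affine_params(xi, omega, B, b)
    x = (0.3, -0.7)
    y = affine_point(B, b, x)
    print(mvt_pdf(y, xi_y, om_y, nu), mvt_pdf(x, xi, omega, nu) / abs(det_B))
```

Because the transformed support is the image of $A$ under the same map, the normalizing integrals over the two supports differ by the same factor $|\det \boldsymbol{B}|$, which is why truncation is preserved.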
Next, we present results on marginal and conditional distributions involving sub-vectors of the random vector having truncated multivariate skew-t distribution. For this, we consider partitions of , , , and as follows
where , , , , , , , , and and are such that . In addition, we define and . The Schur complement of the block of the matrix is given by . In addition, we define . The dimension p is such that . We assume that the support set of the truncated multivariate skew-t distribution is a rectangle , which can be written as , where are finite or infinite intervals. Furthermore, R can be expressed as
where and .
Now, consider the partitions given in (9) and (10) for . Integrating (7) with respect to , we find the marginal PDF of as
where . It is noteworthy that the above PDF does not necessarily have the structure of the PDF (7). In Theorem 2, we establish conditions on the support set R for some marginals to preserve the truncated multivariate skew-t distribution. To proceed, we need the following preliminary results.
Lemma 1.
Let be the PDF given in (5). Then,
Proof.
The result follows from the representation (4). □
Lemma 2.
If , , and k is a scalar, then
Proof.
See Azzalini [27] Lemma 5.3. □
Proof.
By noting and by applying Fubini’s theorem, we have
To evaluate the integral with respect to , we use the identity and subsequently change the variable . Thus we obtain
where the last line is derived by using Lemma 2 with , , and .
Proof.
Substitute in Theorem 2. □
From Theorem 3.7 of Morán-Vásquez and Ferrari [8], we can establish that if , then the marginals of the truncated multivariate t distribution belong to the class of truncated elliptical distributions (but do not necessarily belong to the truncated multivariate t family). Corollary 2 allows us to establish closedness of the truncated multivariate t distribution for some marginals by restricting the support set without constraining the values of .
In Definition 4, we present the truncated multivariate extended skew-t distribution, which is needed to derive the conditional distribution of , where has a truncated multivariate skew-t distribution. The truncated multivariate extended skew-t distribution is obtained by conditioning (Definition 2) on , where is a measurable set.
Definition 4.
Let be a measurable set. The random vector has a truncated multivariate extended skew-t distribution with support A and parameters , , , extension parameter and degrees of freedom, denoted by , if its PDF is
where .
The PDF of can be expressed as
where , with being the PDF of , (Definition 2).
For in (15), we have the PDF of a multivariate extended skew-t distribution given in (6). If in (14), we obtain the PDF of a truncated multivariate skew-t distribution given in (7). The truncated multivariate t distribution is a limiting case of the truncated multivariate extended skew-t distribution when . The truncated multivariate extended skew-normal distribution (Morán-Vásquez et al. [9]) is obtained from (14) when . If we take in (14), we obtain a new truncated elliptical distribution with the PDF
which is also an extension of the distribution presented in Equation (2) of Arellano-Valle and Genton [3]. Furthermore, the PDF of the truncated multivariate t distribution can also be obtained from (14) when and .
In Theorem 3, we derive the conditional distribution of , when has a truncated multivariate skew-t distribution.
Theorem 3.
Proof.
Using the identities (Arellano-Valle and Genton [3] Equation (24))
and , , in the above expression we have
where . Taking into account the equalities and
in the above expression, we arrive at
where and . This completes the proof. □
Note that for in Theorem 3, the conditional distribution of has the structure of the distribution with PDF given in (16).
Proof.
Take in Theorem 3. □
4. Random Vector Generation
Simulations of truncated multivariate distributions have been widely used in various fields of statistics. This section deals with the random vector generation from the truncated multivariate skew-t distribution. Our proposal is based on the rejection sampling method, which can be extended to unnormalized PDFs avoiding the computation of normalizing constants (Maatouk and Bay [28] Prop. 1).
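Let $A \subseteq \mathbb{R}^p$ be a measurable set. Define the functions
$$f(\boldsymbol{x}) = t_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, T\left(\sqrt{\frac{\nu + p}{\nu + \delta(\boldsymbol{x})}}\, \boldsymbol{\lambda}^\top \boldsymbol{\omega}^{-1}(\boldsymbol{x} - \boldsymbol{\xi});\ \nu + p\right), \quad \boldsymbol{x} \in A, \tag{17}$$
and
$$g(\boldsymbol{x}) = \frac{t_p(\boldsymbol{x}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)}{\int_A t_p(\boldsymbol{y}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, d\boldsymbol{y}}, \quad \boldsymbol{x} \in A. \tag{18}$$
Note that (17) corresponds to the kernel of the PDF of the truncated multivariate skew-t distribution. The expression (18) is the PDF of the truncated multivariate t distribution with support A, which can be obtained from (7) when $\boldsymbol{\lambda} = \boldsymbol{0}$. Since $T(\cdot\,; \nu + p) \le 1$, it is straightforward to show that
$$f(\boldsymbol{x}) \le c\, g(\boldsymbol{x}), \quad \boldsymbol{x} \in A, \quad \text{with } c = \int_A t_p(\boldsymbol{y}; \boldsymbol{\xi}, \boldsymbol{\Omega}, \nu)\, d\boldsymbol{y}. \tag{19}$$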
A study on the generation of random samples from the truncated multivariate t distribution appears in Geweke [29]. In a more general approach, Morán-Vásquez and Ferrari [7] proposed an algorithm based on Gibbs sampling for the generation of random vectors from truncated elliptical distributions with rectangular support.
The generation of a random vector from the truncated multivariate skew-t distribution is carried out through the following steps:
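- (1) Generate $\boldsymbol{x}$ from a truncated multivariate t distribution with PDF given in (18).
- (2) Generate $u$ from a uniform distribution on $(0, 1)$. If $u \le f(\boldsymbol{x}) / (c\, g(\boldsymbol{x}))$, where $f$ and $g$ denote the functions in (17) and (18) and $c$ is the bounding constant from (19), accept $\boldsymbol{x}$; otherwise, go back to step 1.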
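The two steps above can be sketched for the bivariate case with a rectangular support. This is an illustrative implementation under stated assumptions, not the authors' code: step 1 draws from the truncated t distribution by naive rejection of untruncated t draws (a simple substitute for the methods of Geweke [29] or the Gibbs sampler in [7]), and step 2 assumes the acceptance ratio reduces to the t CDF factor of the skew-t kernel. The function names `t_cdf` and `rtst` are hypothetical.

```python
import math
import random

def t_cdf(x, nu, n=100):
    """Standard t CDF (nu >= 1) via Simpson's rule after x = sqrt(nu) * tan(phi)."""
    theta = math.atan(x / math.sqrt(nu))
    h = theta / n
    s = 1.0 + math.cos(theta) ** (nu - 1)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * math.cos(i * h) ** (nu - 1)
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    return 0.5 + c * s * h / 3

def rtst(n, xi, omega, lam, nu, rect, seed=0):
    """Draw n points from a truncated bivariate skew-t on rect = ((a1, b1), (a2, b2))."""
    rng = random.Random(seed)
    (a, b), (_, d) = omega
    l00 = math.sqrt(a)                       # Cholesky factor of omega
    l10 = b / l00
    l11 = math.sqrt(d - l10 * l10)
    sd = (math.sqrt(a), math.sqrt(d))        # diagonal scale used to standardize x - xi
    out = []
    while len(out) < n:
        # Step 1: proposal from the truncated t (naive rejection of untruncated draws).
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s = math.sqrt(nu / rng.gammavariate(nu / 2.0, 2.0))   # chi-square(nu) mixing
        x = (xi[0] + s * l00 * z0, xi[1] + s * (l10 * z0 + l11 * z1))
        if not all(lo <= xk <= hi for xk, (lo, hi) in zip(x, rect)):
            continue
        # Step 2: accept with probability T(sqrt((nu+2)/(nu+delta)) * lam'z; nu+2).
        delta = s * s * (z0 * z0 + z1 * z1)  # Mahalanobis distance, since x - xi = s L z
        lin = lam[0] * (x[0] - xi[0]) / sd[0] + lam[1] * (x[1] - xi[1]) / sd[1]
        if rng.random() <= t_cdf(math.sqrt((nu + 2) / (nu + delta)) * lin, nu + 2):
            out.append(x)
    return out

if __name__ == "__main__":
    sample = rtst(500, (0.0, 0.0), [[1.0, 0.5], [0.5, 1.0]], (10.0, 0.0), 4.0,
                  ((-3.0, 3.0), (-3.0, 3.0)), seed=1)
    print("mean of first coordinate:", sum(x[0] for x in sample) / len(sample))
```

Neither normalizing constant is evaluated, which is the practical advantage of this scheme. When the shape parameter is zero the acceptance probability is 1/2 for every proposal, so in that case sampling the truncated t distribution directly is more efficient.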
Figure 2 displays scatter plots of random samples of size 2000 generated from truncated bivariate skew-t distributions with rectangular supports. These scatter plots are overlaid with contours of the PDF (7) for different levels. The random sample of Figure 2a corresponds to a truncated bivariate t distribution with heavy tails ($\boldsymbol{\lambda} = \boldsymbol{0}$ and small $\nu$), and that of Figure 2b corresponds to a truncated bivariate normal distribution ($\boldsymbol{\lambda} = \boldsymbol{0}$ and large $\nu$). Figure 2c–e show skewed and heavy-tailed random samples generated from a truncated bivariate skew-t distribution with $\boldsymbol{\lambda} \neq \boldsymbol{0}$ and small $\nu$. The random sample shown in Figure 2f was generated with the same parameter values as in Figure 2e, except for $\nu$, which is large, so we have a random sample from a truncated bivariate skew-normal distribution. It is noteworthy that the random samples generated from the truncated bivariate skew-t distributions with small $\nu$ exhibit outliers in relation to those generated with large $\nu$ (Figure 2a,b,e,f). Multivariate skewness is also evident in the random samples generated from the truncated bivariate skew-t distributions with $\boldsymbol{\lambda} \neq \boldsymbol{0}$ (Figure 2c–f). The plots suggest that the proposed methodology for generating random vectors from the truncated multivariate skew-t distribution is adequate.
Figure 3 presents the histograms of the random samples from the marginal (Figure 3a) and the conditional (Figure 3b) obtained from the bivariate random sample displayed in Figure 2e. These histograms are overlaid with the marginal PDFs given in (11) and the one established according to Theorem 3, respectively. Since the support of the bivariate skew-t random sample shown in Figure 2e is , Theorem 2 is not applicable, and therefore, the random sample of does not come from a truncated univariate skew-t distribution, but it does come from a distribution with PDF given by (11). On the other hand, the random sample of given that has size 70 and comes from the truncated univariate extended skew-t distribution, . The plots presented in Figure 3 suggest a good performance of the random samples obtained from and .
Figure 3.
Histograms of simulated random samples from: (a) marginal distribution overlaid with the PDF given in (11); (b) conditional distribution overlaid with the PDF established in Theorem 3.
5. Conclusions and Future Work
In this paper, we showed that random vectors having a truncated multivariate skew-t distribution are closed under affine transformations. We also provided conditions for the truncated multivariate skew-t family to be preserved under marginalization. Additionally, we derived the conditional distribution of subvectors of a random vector having a truncated multivariate skew-t distribution. Finally, we described a procedure based on the rejection sampling method to generate random vectors from the truncated multivariate skew-t distribution. This procedure can be useful for performing simulation studies on statistical models based on this distribution.
The implementation of maximum likelihood estimation for the truncated multivariate skew-t distribution may be challenging since an efficient computation of the integral involved in (7) is required. This topic will be addressed in a future paper as well as alternative methods for the generation of random vectors, simulation studies, and applications to real data. Another interesting issue is to test the hypothesis that a dataset has a truncated multivariate skew-t distribution, which may be achieved by following the ideas of Avdović and Jevremović [30] and Opheim and Roy [31]. In addition, it is interesting to study the properties derived in this paper in more general truncated distributions, such as the truncated skew-elliptical distributions, which can be defined through the skew-elliptical distributions (Azzalini [27] Ch. 6).
Author Contributions
Conceptualization, R.A.M.-V., E.Z. and D.K.N.; methodology, R.A.M.-V., E.Z. and D.K.N.; investigation, R.A.M.-V., E.Z. and D.K.N.; writing—original draft preparation, R.A.M.-V., E.Z. and D.K.N.; writing—review and editing, R.A.M.-V., E.Z. and D.K.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by Comité para el Desarrollo de la Investigación—CODI, Universidad de Antioquia (Grant No. 2018-21991).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors thank the reviewers for their helpful comments and suggestions.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
References
[1] Azzalini, A.; Capitanio, A. Statistical applications of the multivariate skew normal distribution. J. R. Stat. Soc. (Stat. Methodol.) 1999, 61, 579–602.
[2] Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. (Stat. Methodol.) 2003, 65, 367–389.
[3] Arellano-Valle, R.B.; Genton, M.G. Multivariate extended skew-t distributions and related families. Metron 2010, 68, 201–234.
[4] Arellano-Valle, R.B.; Azzalini, A. On the unification of families of skew-normal distributions. Scand. J. Stat. 2006, 33, 561–574.
[5] Arellano-Valle, R.B.; Genton, M.G. Multivariate unified skew-elliptical distributions. Chil. J. Stat. 2010, 1, 17–33.
[6] Galarza, C.E.; Matos, L.A.; Castro, L.M.; Lachos, V.H. Moments of the doubly truncated selection elliptical distributions with emphasis on the unified multivariate skew-t distribution. J. Multivar. Anal. 2022, 189, 104944.
[7] Morán-Vásquez, R.A.; Ferrari, S.L.P. Box-Cox elliptical distributions with application. Metrika 2019, 82, 547–571.
[8] Morán-Vásquez, R.A.; Ferrari, S.L.P. New results on truncated elliptical distributions. Commun. Math. Stat. 2021, 9, 299–313.
[9] Morán-Vásquez, R.A.; Cataño Salazar, D.H.; Nagar, D.K. Some results on the truncated multivariate skew-normal distribution. Symmetry 2022, 14, 970.
[10] Galarza Morales, C.E.; Matos, L.A.; Dey, D.K.; Lachos, V.H. On moments of folded and doubly truncated multivariate extended skew-normal distributions. J. Comput. Graph. Stat. 2022, 31, 455–465.
[11] Kim, H.-J. A class of weighted multivariate elliptical models useful for robust analysis of nonnormal and bimodal data. J. Korean Stat. Soc. 2010, 39, 83–92.
[12] Arellano-Valle, R.B.; Branco, M.D.; Genton, M.G. A unified view on skewed distributions arising from selections. Can. J. Stat. 2006, 34, 581–601.
[13] Kan, R.; Robotti, C. On moments of folded and truncated multivariate normal distributions. J. Comput. Graph. Stat. 2017, 26, 930–934.
[14] Arismendi, J.C. Multivariate truncated moments. J. Multivar. Anal. 2013, 117, 41–75.
[15] Ho, H.J.; Lin, T.I.; Chen, H.Y.; Wang, W.L. Some results on the truncated multivariate t distribution. J. Stat. Plan. Inference 2012, 142, 25–40.
[16] Nadarajah, S. A truncated bivariate t distribution. Econ. Qual. Control. 2007, 22, 303–313.
[17] Horrace, W.C. Some results on the multivariate truncated normal distribution. J. Multivar. Anal. 2005, 94, 209–221.
[18] Horrace, W.C. On ranking and selection from independent truncated normal distributions. J. Econom. 2005, 126, 335–354.
[19] Tallis, G.M. The moment generating function of the truncated multi-normal distribution. J. R. Stat. Soc. 1961, 23, 223–229.
[20] Tallis, G.M. Elliptical and radial truncation in normal populations. Ann. Math. Stat. 1963, 34, 940–944.
[21] Tallis, G.M. Plane truncation in normal populations. J. R. Stat. Soc. 1965, 27, 301–307.
[22] Birnbaum, Z.; Meyer, P.L. On the effect of truncation in some or all coordinates of a multi-normal population. J. Indian Soc. Agric. Stat. 1953, 5, 17–28.
[23] Marchenko, Y.V.; Genton, M.G. Multivariate log-skew-elliptical distributions with applications to precipitation data. Environmetrics 2010, 21, 318–340.
[24] Morán-Vásquez, R.A.; Mazo-Lopera, M.A.; Ferrari, S.L.P. Quantile modeling through multivariate log-normal/independent linear regression models with application to newborn data. Biom. J. 2021, 63, 1290–1308.
[25] Flecher, C.; Allard, D.; Naveau, P. Truncated skew-normal distributions: Moments, estimation by weighted moments and application to climatic data. Metron 2010, 68, 331–345.
[26] Kotz, S.; Nadarajah, S. Multivariate t Distributions and Their Applications; Cambridge University Press: New York, NY, USA, 2004.
[27] Azzalini, A.; Capitanio, A. The Skew-Normal and Related Families; Cambridge University Press: Cambridge, UK, 2014.
[28] Maatouk, H.; Bay, X. A new rejection sampling method for truncated multivariate Gaussian random variables restricted to convex sets. In Monte Carlo and Quasi-Monte Carlo Methods; Springer: New York, NY, USA, 2016; pp. 521–530.
[29] Geweke, J. Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints and the evaluation of constraint probabilities. In Proceedings of the 23rd Symposium on the Interface, Seattle, WA, USA, 22–24 April 1991; pp. 571–578.
[30] Avdović, A.; Jevremović, V. Quantile-zone based approach to normality testing. Mathematics 2022, 10, 1828.
[31] Opheim, T.; Roy, A. More on the supremum statistic to test multivariate skew-normality. Computation 2021, 9, 126.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).