Abstract
In this work, we present a new generalization of the student’s t distribution. The new distribution is obtained as the quotient of two independent random variables: a standard normal random variable divided by a power of a chi-square random variable that has been divided by its degrees of freedom. The new symmetric distribution has heavier tails than the student’s t distribution and than several extensions of the slash distribution. We develop a procedure for quantile regression when the response variable or the residuals have high kurtosis. We give the density function as an integral, derive some important properties, and present procedures for inference, including moment and maximum likelihood estimators. By way of illustration, we carry out two applications with real data. In the first, we provide maximum likelihood estimates for the parameters of the generalized student’s t distribution, the student’s t distribution, the extended slash distribution, the modified slash distribution, the generalized slash student’s t distribution, and the double slash distribution. In the second, we perform quantile regression to fit a model in which the response variable has high kurtosis.
1. Introduction
The slash distribution arises as the quotient of two independent random variables, a standard normal variable and a power of a uniform variable on the interval (0, 1). It has the stochastic representation
$$X = \mu + \sigma \frac{Z}{U^{1/q}}, \qquad (1)$$
where $Z \sim N(0, 1)$ and $U \sim U(0, 1)$ are independent, $\mu$ is the location parameter, $\sigma$ is the scale parameter, and q is the parameter related to kurtosis. This distribution will be denoted by and its density function has the following expression
where is the gamma function and is the incomplete gamma function. This distribution has heavier tails than the normal distribution, that is, it has greater kurtosis. Properties of this family are discussed in Rogers and Tukey [1] and Mosteller and Tukey [2].
Maximum likelihood estimators for the location and scale parameters are discussed in Kafadar [3]. Wang and Genton [4] described multivariate symmetric and skew-multivariate extensions of the slash distribution, while Gómez et al. [5] (and the erratum in Gómez and Venegas, 2008) extended the slash distribution by introducing the slash-elliptical family; an asymmetric version of this family is discussed by Arslan [6]. Genc [7] discussed a symmetric generalization of the slash distribution. More recently, Gómez et al. [8] used the slash-elliptical family to extend the Birnbaum–Saunders distribution.
For $\mu = 0$ and $\sigma = 1$ in (1), we retrieve the standard slash distribution. If, in addition, $q = 1$, we obtain the canonical slash distribution. When q tends to infinity, the standard normal distribution is recovered.
When , in (1), the resulting distribution is the modified slash distribution studied by Reyes et al. [9], whose density function is given by
and which is denoted by , where q is the kurtosis parameter.
When and , in (1), the resulting distribution is the extended slash (ES) distribution studied by Rojas et al. [10], whose density function is given by
It is denoted by , with , , , where denotes the pdf of the standard normal distribution (see Johnson et al. [11]) and denotes the beta function.
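We say that X has a student’s t distribution with $\nu$ degrees of freedom, location parameter $\mu$, and scale parameter $\sigma$, which we denote by , if it has the stochastic representation
$$X = \mu + \sigma \frac{Z}{\sqrt{V/\nu}},$$
where $Z \sim N(0, 1)$ and $V \sim \chi^2_\nu$ are independent. Its probability density function is given by
$$f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma \sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}},$$
with support on $\mathbb{R}$.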
The moment of order r of a random variable X with student’s t distribution can be expressed in terms of the gamma function. If then
where for r even, then
,
, .
If then
Li and Nadarajah [12] review the generalizations of the student’s t distribution published to date and show that the main motivation for these extensions is to model heavy tails or data with high kurtosis.
In the study of symmetric distributions with heavy tails, El-Bassiouny and El-Morshedy [13] present the generalized slash student’s t distribution. We say that , with parameter , has pdf given by
where q is the kurtosis parameter and denotes the beta function.
Another recent extension of the slash model was proposed by El-Morshedy et al. [14]. These authors introduced the double slash (DSL) distribution, with density function given by
with , , and .
When and , in (1), we obtain the generalized modified slash distribution, denoted , studied by Reyes et al. [15], whose density function is given by
where , , and
is the confluent hypergeometric function of the second kind. Details about this function can be seen in Abramowitz and Stegun, p. 505.
Motivated by the search for a generalization of the student’s t distribution with heavier tails than the distributions found so far in the literature, in this article we introduce a new generalization of the student’s t distribution (GT), whose stochastic representation is given by
$$Y = \mu + \sigma \frac{W}{(V/\nu)^{1/q}},$$
where W and V are independent with $W \sim N(0, 1)$ and $V \sim \chi^2_\nu$, and we will denote it as .
The paper is organized as follows. In Section 2, the probability density function (pdf) is given, some properties of the distribution are presented, and the student’s t distribution is shown to be a particular case of the GT distribution. Additionally, moments of order r are obtained, including the kurtosis coefficient. In Section 3, the moment and maximum likelihood estimators are derived, and a simulation study is presented to illustrate the behavior of the estimators of the parameters , , and q, for . Section 4 reports the results of using the proposed model in two real applications. Section 5 presents quantile regression. Section 6 presents the main conclusions.
2. The Generalized Student’s t Distribution
We present the generalized student’s t distribution with heavier tails compared to similar distributions. Initially we will present its density function.
2.1. Density Function
We will use the stochastic representation
$$Y = \mu + \sigma \frac{W}{(V/\nu)^{1/q}}, \qquad (13)$$
where W has a standard normal distribution, V has a chi-square distribution with $\nu$ degrees of freedom, W and V are independent random variables, $\mu \in \mathbb{R}$ and $\sigma > 0$ are location and scale parameters, respectively, $\nu > 0$ is the degrees of freedom, and $q > 0$ is the parameter related to the kurtosis of the distribution.
We use the notation , and for the standard case, we denote .
Proposition 1.
Let . Then, the pdf of Y is given by
Proof.
Since W and V are two independent random variables, such that and , then the joint pdf of is
where and . By marginalizing, the result follows immediately for . Setting , the other expression is obtained. □
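As a numerical illustration of the integral form of the density, the following Python sketch evaluates the pdf by conditioning on V. It assumes that the stochastic representation is $Y = \mu + \sigma W / (V/\nu)^{1/q}$, so that, given $V = v$, Y is normal with mean $\mu$ and standard deviation $\sigma (v/\nu)^{-1/q}$. The function names (`chi2_pdf`, `gt_pdf`, `student_t_pdf`), the midpoint quadrature, and the grid size are illustrative choices and are not part of the derivation above.

```python
import math


def chi2_pdf(v, nu):
    """Density of the chi-square distribution with nu degrees of freedom."""
    if v <= 0.0:
        return 0.0
    log_pdf = ((nu / 2.0 - 1.0) * math.log(v) - v / 2.0
               - (nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0))
    return math.exp(log_pdf)


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gt_pdf(y, mu=0.0, sigma=1.0, nu=5.0, q=1.0, n_grid=4000):
    """GT density under the ASSUMED representation Y = mu + sigma*W/(V/nu)**(1/q).

    The integral over v in (0, inf) is mapped to t in (0, 1) through
    v = t / (1 - t) and approximated with the midpoint rule.
    """
    z = (y - mu) / sigma
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) / n_grid
        v = t / (1.0 - t)
        jacobian = 1.0 / (1.0 - t) ** 2
        s = (v / nu) ** (1.0 / q)  # conditional precision factor (V/nu)^(1/q)
        total += s * normal_pdf(z * s) * chi2_pdf(v, nu) * jacobian
    return total / (n_grid * sigma)


def student_t_pdf(y, nu):
    """Standard student's t density, used only as a reference value."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1.0 + y * y / nu) ** (-(nu + 1.0) / 2.0)
```

Under the stated assumption, `gt_pdf(y, q=2.0)` should agree with `student_t_pdf(y, nu)` up to quadrature error, which gives a quick sanity check of the implementation.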
Corollary 1.
If in (14), then Y is said to have the canonical generalized student’s t distribution, with pdf
where is the confluent hypergeometric function of the second kind.
Proof.
Setting and , making the change of variables , and applying the result obtained in Reyes et al. [9],
where the result is obtained. □
Figure 1 (left) shows the pdf of the generalized student’s t distribution for q = 1 compared with the student’s t distribution for , the normal distribution, the generalized slash student’s t distribution, and the double slash distribution. As the variable tends to ∞ (or to −∞), the new model places more probability mass in the tails than the other distributions. Furthermore, the smaller q is, the greater the kurtosis of the distribution.
Figure 1.
Generalized student’s t pdf with (solid line), student’s t pdf for (dotted line), normal pdf (dashed line), (dashed and dotted line), and (thick dashed line) (left), and tails comparison (right).
2.2. Tails Comparison of GT and Student’s t Distributions
In this section, we compare the upper tails of the GT distribution and the student’s t distribution. To this end, we consider the canonical version () of the GT distribution and the student’s t distribution with degrees of freedom. Table 1 shows the upper-tail probabilities for different values of y under these distributions. The GT distribution has much heavier tails than the student’s t distribution.
Table 1.
Tails comparison of the GT and student’s t distributions.
Remark 1.
Table 1 illustrates the fact that the generalized student’s t distributions have heavier tails than the tails of the student’s t distribution.
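The kind of comparison summarized in Table 1 can be sketched numerically. Assuming the standard representation $Z = W / (V/\nu)^{1/q}$, conditioning on V gives $P(Z > y) = E\left[1 - \Phi\left(y (V/\nu)^{1/q}\right)\right]$, where $\Phi$ is the standard normal cdf. The function names and the quadrature below are hypothetical choices, and the printed numbers are not the entries of Table 1.

```python
import math


def chi2_pdf(v, nu):
    """Density of the chi-square distribution with nu degrees of freedom."""
    if v <= 0.0:
        return 0.0
    log_pdf = ((nu / 2.0 - 1.0) * math.log(v) - v / 2.0
               - (nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0))
    return math.exp(log_pdf)


def normal_upper_tail(x):
    """P(W > x) for a standard normal W."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gt_upper_tail(y, nu, q, n_grid=4000):
    """P(Z > y) under the ASSUMED representation Z = W / (V/nu)**(1/q).

    Midpoint rule on t in (0, 1) after the substitution v = t / (1 - t).
    """
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) / n_grid
        v = t / (1.0 - t)
        jacobian = 1.0 / (1.0 - t) ** 2
        scaled = y * (v / nu) ** (1.0 / q)
        total += normal_upper_tail(scaled) * chi2_pdf(v, nu) * jacobian
    return total / n_grid


if __name__ == "__main__":
    # q = 2 corresponds to the student's t case under the assumed representation.
    for y in (2.0, 3.0, 4.0, 5.0):
        print(y, gt_upper_tail(y, 5.0, 1.0), gt_upper_tail(y, 5.0, 2.0))
```

For a fixed $\nu$, comparing the two columns shows how the tail probability changes when q moves from 2 to 1.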
2.3. Comparison of GT Quantiles with T Quantiles
Figure 2 shows the quantile function of the generalized student’s t distribution compared to quantile function of student’s t for different values of q and .
Figure 2.
Quantile function of the generalized student’s t distribution compared to quantile function of the student’s t for for (left) and (right).
Proposition 2.
Let . Then an approximation of quantile p of Y is
where and denote the quantiles of order p of the student’s t and chi-square distributions with ν degrees of freedom.
Proof.
.
.
. □
Figure 3 shows the quantiles of the generalized student’s t distribution compared with the approximate quantiles of Proposition 2 for values and .
Figure 3.
Density of evaluated at the theoretical quantile compared with the quantile of Proposition 2 (upper), and Q-Q plot (lower).
Properties:
- If then ;
- If then , where is the quantile of order p of the standard normal distribution.
Table 2 presents quantiles of the generalized student’s t distribution for n degrees of freedom and q = 1.
Table 2.
Quantiles of the generalized student’s t distribution for degrees of freedom and .
2.4. Properties of the Generalized Student’s t Distribution
In this section, we present some properties of the generalized student’s t distribution.
Proposition 3.
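Let Y follow the GT distribution with parameters $\mu$, $\sigma$, $\nu$, and $q$. Then:
- 1. As $q \to \infty$, Y converges in distribution to $N(\mu, \sigma^2)$.
- 2. If $Y \mid V = v \sim N\left(\mu, \sigma^2 (v/\nu)^{-2/q}\right)$ and $V \sim \chi^2_\nu$, then Y follows the GT distribution.
- 3. If $q = 2$, then Y follows the student’s t distribution with location $\mu$, scale $\sigma$, and $\nu$ degrees of freedom.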
Proof.
- Letting q tend to infinity in representation (13), the result follows immediately;
- $f_Y(y) = \int_0^\infty f_{Y \mid V}(y \mid v)\, f_V(v)\, dv$, where $f_V$ is the pdf of the chi-square distribution with $\nu$ degrees of freedom. The result follows using the transformation and direct integral computations;
- Setting $q = 2$, we obtain the student’s t density with $\nu$ degrees of freedom.
□
Remark 2.
Proposition 3 shows first that the generalized student’s t distribution contains the normal distribution as a limiting case ($q \to \infty$). It also shows that the generalized student’s t distribution is a scale mixture of the normal distribution, with mixing based on the chi-square distribution with ν degrees of freedom. The third property shows that, for $q = 2$, the density function of the generalized student’s t distribution coincides with that of the student’s t distribution with ν degrees of freedom.
2.5. Moments
In this subsection the moments of the generalized student’s t distribution are deduced.
Proposition 4.
Let and . Hence, for and , we have that
and
Proof.
Moreover, since and are even moments for the standard normal distribution, the second result follows directly by applying the formula to the stochastic representation (13). □
Corollary 2.
Let , and hence,
Proposition 5.
Let . Then the coefficients of skewness and kurtosis are
and
Proof.
The standardized coefficients of skewness and kurtosis are
and
and the result follows after replacing the even moments derived in Proposition 4. □
Figure 4 shows the kurtosis of the GT distribution compared with that of the T distribution for different values of q and .
Figure 4.
Kurtosis of the GT distribution compared with the T distribution for .
The generalized student’s t distribution has greater kurtosis than the student’s t distribution when q is less than 2. For data with high kurtosis, the generalized student’s t distribution is therefore recommended.
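The dependence of the kurtosis on q can be checked with a short computation. The sketch below assumes the standard representation $Z = W / (V/\nu)^{1/q}$, for which independence gives $E[Z^r] = E[W^r]\, E[(V/\nu)^{-r/q}]$ with $E[(V/\nu)^{-r/q}] = (\nu/2)^{r/q}\, \Gamma(\nu/2 - r/q) / \Gamma(\nu/2)$ whenever $\nu/2 > r/q$. These expressions are derived here from that assumption and may differ in form from those of Proposition 4; the function names are hypothetical.

```python
import math


def gt_even_moment(r, nu, q):
    """E[Z^r] for even r under the ASSUMED representation Z = W / (V/nu)**(1/q)."""
    if r % 2 != 0:
        raise ValueError("r must be even; odd moments vanish by symmetry when they exist")
    if nu / 2.0 <= r / q:
        raise ValueError("moment does not exist: need nu/2 > r/q")
    # Even moment of the standard normal: r! / (2^(r/2) (r/2)!)
    normal_moment = math.factorial(r) / (2 ** (r // 2) * math.factorial(r // 2))
    # E[(V/nu)^(-r/q)] for V ~ chi-square(nu), computed on the log scale
    log_scale = ((r / q) * math.log(nu / 2.0)
                 + math.lgamma(nu / 2.0 - r / q) - math.lgamma(nu / 2.0))
    return normal_moment * math.exp(log_scale)


def gt_kurtosis(nu, q):
    """Kurtosis coefficient E[Z^4] / E[Z^2]^2 of the standard case."""
    m2 = gt_even_moment(2, nu, q)
    m4 = gt_even_moment(4, nu, q)
    return m4 / m2 ** 2
```

For example, with $\nu = 10$ the call `gt_kurtosis(10, 2)` reproduces the student’s t value $3(\nu - 2)/(\nu - 4) = 4$, while `gt_kurtosis(10, 1)` is larger and very large values of q give values close to 3.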
3. Inference
3.1. Moment Estimators
In the following proposition we present the moment estimators of , , and q for .
Proposition 6.
Let be a random sample from the distribution of the random variable . Then the moment estimators of for are given by
where , S, and are the sample mean, sample standard deviation, and sample kurtosis coefficient, respectively.
3.2. Maximum Likelihood Estimation
Given a random sample , for , the log-likelihood function can be written as
where and hence the maximum likelihood equations are given by
where , , . The expressions for , , and are given by
where .
Equations (27)–(30) can be solved using numerical procedures.
Proposition 7.
Let be a random sample from the distribution of the random variable . Then,
Proof.
The random variable Z and T
then
and, after substitution, the result is obtained. □
Proposition 8.
Let be a random sample from the distribution of the random variable . Then a confidence interval of level for the population mean is
where is the percentile of order of the GT distribution.
Proof.
The result is obtained from the previous proposition. □
3.3. Simulation Study
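To generate random numbers from the GT distribution, we use the stochastic representation given in (13) and the following algorithm:
- Simulate $W \sim N(0, 1)$;
- Simulate $V \sim \chi^2_\nu$;
- Compute $Y = \mu + \sigma W / (V/\nu)^{1/q}$.
It then follows that Y has the GT distribution with parameters $\mu$, $\sigma$, $\nu$, and $q$.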
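The three steps above can be sketched in Python using only the standard library. The sketch assumes that representation (13) is $Y = \mu + \sigma W / (V/\nu)^{1/q}$ and uses the fact that a chi-square variable with $\nu$ degrees of freedom is a gamma variable with shape $\nu/2$ and scale 2. The function name `sample_gt` and its defaults are hypothetical.

```python
import random


def sample_gt(n, mu=0.0, sigma=1.0, nu=5.0, q=1.0, seed=None):
    """Draw n values using the ASSUMED representation Y = mu + sigma*W/(V/nu)**(1/q)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)              # step 1: W ~ N(0, 1)
        v = rng.gammavariate(nu / 2.0, 2.0)  # step 2: V ~ chi-square(nu)
        draws.append(mu + sigma * w / (v / nu) ** (1.0 / q))  # step 3
    return draws
```

Passing a fixed `seed` makes a simulation study reproducible; for instance, `sample_gt(200, nu=5.0, q=1.0, seed=1)` always returns the same 200 values.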
Table 3 shows the parameter estimates obtained by the maximum likelihood method (MLE) through 1000 replicates of sizes 50, 100, 150, and 200 with their corresponding standard errors, mean length of the interval, and empirical coverage.
Table 3.
Simulation of 1000 iterations of the model .
4. Two Illustrative Datasets
Illustrative Dataset 1
We consider the data first presented in Jander [16], from an entomology experiment with ants. A total of ants were individually placed in the center of an arena. The measurements correspond to the initial direction in which they moved relative to a visual stimulus at an angle of 180 degrees from the zero direction, rounded to the nearest 10 degrees. Figure 5 depicts the histogram of these data, including the densities estimated by maximum likelihood under the T, , , , and models. Figure 6 shows the Q-Q plots for the T, , and models. We use the AIC (Akaike information criterion), which penalizes the maximized likelihood function by the number of model parameters (AIC = −2log(lik) + 2k, where k is the number of unknown parameters being estimated; see Akaike [17]). Table 4 shows the descriptive statistics of the dataset, while Table 5 presents the parameter estimates together with the AIC and Kolmogorov-Smirnov statistic (KSS) values for the four given models, which indicate that the best fit is given by the model. Table 6 shows a 95% confidence interval for the population mean using generalized student’s t quantiles. Moreover, Figure 7 depicts the empirical cumulative distribution function (cdf) and the estimated cdfs for the T, , and models.
Figure 5.
Histogram (left) and comparison of the tails (right) for the ants dataset. Overlaid are the generalized student’s t density with parameters estimated via ML (solid line), the modified slash density (dashed line), the extended slash density (dotted line), and the student’s t density (dashed line).
Figure 6.
Q-q plots: student’s t (a), modified slash (b), extended slash (c), generalized student’s t (d).
Table 4.
Descriptive statistics for the dataset.
Table 5.
Parameter estimates, AIC and KSS values for T, , , and models for the ants dataset.
Table 6.
The 95 percent confidence interval for the mean of dataset using T and quantiles T.
Figure 7.
Empirical cdf with estimated T cdf (yellow), estimated cdf (red), estimated cdf (green), and estimated cdf (blue).
The moment estimates for the dataset are:
- ;
- ;
- ;
- ,
which will be used as starting points in obtaining the MLEs.
Figure 8 depicts the histogram of these data, including the densities estimated by maximum likelihood under the , and models. We use the Akaike information criterion (AIC) and the Bayesian information criterion (BIC; see Schwarz [18]), defined as BIC = −2log(lik) + k log(n), where k is the number of estimated parameters and n is the sample size. Table 7 shows these results.
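For reference, both criteria can be computed directly from a maximized log-likelihood. The sketch below uses the standard definitions AIC = −2log(lik) + 2k and BIC = −2log(lik) + k log(n); the function names and the numbers in the usage note are hypothetical and are not results from Table 5 or Table 7.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for maximized log-likelihood log_lik and k parameters."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion for a sample of size n."""
    return -2.0 * log_lik + k * math.log(n)
```

With a hypothetical maximized log-likelihood of −100, four parameters, and n = 100 observations, `aic(-100.0, 4)` gives 208.0 and `bic(-100.0, 4, 100)` is larger because log(100) exceeds 2. Smaller values of either criterion indicate the preferred model.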
Figure 8.
Histogram (left) and comparison of the tails (right) for the ants dataset. Overlaid are the generalized student’s t density with parameters estimated via ML (solid line), the modified slash density (dashed line), the extended slash density (dotted line), and the student’s t density (dashed line).
Table 7.
Parameter estimates, AIC and BIC values for , and models for the ants dataset.
5. Quantile Regression
Quantile regression is used when the objective of the study is to estimate different percentiles (such as the median) of a population of interest. An advantage of using quantile regression to estimate the median, rather than ordinary least squares regression (to estimate the mean), is that quantile regression is more robust in the presence of outliers. Quantile regression can be seen as a natural analogue in regression analysis of using different measures of central tendency and dispersion, in order to obtain a more complete and robust analysis of the data. Another advantage of this type of regression is the possibility of estimating any quantile, which makes it possible to assess what happens at extreme values of the population.
5.1. Quantile Regression Uni-Dimensional
Translating this concept of quantile to the regression line, we obtain the linear quantile regression.
If we assume that
with and that the conditional expected value is not necessarily zero, but the -th quantile of the error with respect to the regressor variable is zero , then the -th quantile of with respect to X can be written as
The estimates of y are found by
being y .
To estimate the parameters, the function described in the equation should be minimized. The minimization can be formulated as a linear programming problem, which yields the regression line for a given quantile. This addresses, for simple linear regression, the first of the limitations raised at the end of the previous section. Furthermore, since quantiles are robust, the second limitation of the classical regression line is also addressed.
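The role of the objective function can be illustrated in the simplest setting, a model with an intercept only. The sketch below uses the usual check (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$ and finds the constant that minimizes the total loss by searching over the observed values, since a minimizer is always attained at a data point. This is a simplified stand-in for the linear programming formulation used in the regression case; the function names and the toy data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_check_loss(data, c, tau):
    """Sum of check losses of the residuals y - c."""
    return sum(check_loss(y - c, tau) for y in data)


def quantile_by_check_loss(data, tau):
    """Constant minimizing the total check loss, searched over the data points."""
    return min(sorted(data), key=lambda c: total_check_loss(data, c, tau))


if __name__ == "__main__":
    toy = [1.0, 2.0, 3.0, 4.0, 100.0]        # one large outlier
    print(quantile_by_check_loss(toy, 0.5))  # minimizer for the median
    print(sum(toy) / len(toy))               # the mean, pulled up by the outlier
```

On the toy data the minimizer for $\tau = 0.5$ is the sample median 3.0, whereas the mean is 22.0, which illustrates the robustness to outliers mentioned above.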
5.2. Quantile Regression Student’s t
In this case, in the regression equation
with response variable , random numbers can be generated from the distribution, whose parameters , and are estimated by maximum likelihood from the data. One way to obtain the quantiles of Y is then to use the stochastic representation:
- Simulate ;
- Simulate
- Compute
Using this new variable, quantile regression is applied to the data .
5.3. Quantile Regression Slash Logistic
In this case, in the regression equation
with response variable , random numbers can be generated from the distribution, whose parameters , , and q are estimated by maximum likelihood from the data. One way to obtain the quantiles of Y is then to use the stochastic representation:
- Simulate ;
- Compute ;
- Simulate ;
- Compute .
Using this new variable, quantile regression is applied to the data .
5.4. Quantile Regression Generalized Student’s t
In this case, in the regression equation
with response variable , random numbers can be generated from the GT distribution, whose parameters $\mu$, $\sigma$, $\nu$, and $q$ are estimated by maximum likelihood from the data. One way to obtain the quantiles of Y is then to use the stochastic representation given in (13):
- Simulate $W \sim N(0, 1)$;
- Simulate $V \sim \chi^2_\nu$;
- Compute $Y = \mu + \sigma W / (V/\nu)^{1/q}$.
Using this new variable, quantile regression is applied to the data .
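One way to turn the simulated variable into quantiles is a plain Monte Carlo estimate: simulate many draws and read off the empirical quantile. The sketch below assumes representation (13) has the form $Y = \mu + \sigma W / (V/\nu)^{1/q}$ and that the parameter values passed in are already available, for example as maximum likelihood estimates. The function name, the number of draws, and the order-statistic rule are hypothetical choices.

```python
import random


def gt_quantile_mc(p, mu, sigma, nu, q, n_draws=50000, seed=0):
    """Monte Carlo estimate of the quantile of order p of Y.

    ASSUMES Y = mu + sigma * W / (V/nu)**(1/q), with W ~ N(0, 1) and
    V ~ chi-square(nu) independent (chi-square(nu) = Gamma(nu/2, scale 2)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        w = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(nu / 2.0, 2.0)
        draws.append(mu + sigma * w / (v / nu) ** (1.0 / q))
    draws.sort()
    index = min(n_draws - 1, int(p * n_draws))  # simple order-statistic rule
    return draws[index]
```

Because the distribution is symmetric about $\mu$, the estimated median should be close to $\mu$; extreme quantiles require more draws for the same precision, particularly for small q.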
5.5. Application 2
We now consider data on the body mass index (BMI) and lean body mass (LBM) of 202 Australian athletes. The data are available for download at http://azzalini.stat.unipd.it/SN/index.html (accessed on 15 October 2021). Table 8 shows summary statistics for these data. The AIC and BIC values of the fitted models and the maximum likelihood estimates of (, ) are shown in Table 9 and Table 10, respectively.
Table 8.
Summary statistics for dataset of the body mass index and Lean Body Mass of 202 Australian athletes.
Table 9.
AIC and BIC values of the fitted models for the dataset of the body mass index and lean body mass of 202 Australian athletes: quantile regression student’s t (T), quantile regression slash logistic (SLOG), and quantile regression generalized student’s t (GT).
Table 10.
Parameter estimates and standard deviations of the quantile regression coefficients at quantile 50 for the student’s t (T) and generalized student’s t () models for the dataset.
Figure 9 shows the quantile regression of the data using the T, SLOG, and GT models.
Figure 9.
Quantile regression for BMI and LBM data with student’s t distribution (left), slash logistic distribution (center) and generalized student’s t distribution (right).
6. Discussion
We have introduced a new distribution called the generalized student’s t distribution (GT). The main idea is to replace the exponent 1/2 of the chi-square variable in the student’s t representation by an exponent $1/q$, where $q$ is the kurtosis parameter. We derive the density function of the distribution and study some of its properties, as well as its moments. Parameter estimation was analyzed using the method of moments and maximum likelihood. We present two illustrations. In the first, a real dataset is studied, and we show that the GT distribution fits the data better than the T, ES, , , and distributions. In the second, we use quantile regression to fit a linear model to a paired dataset in which the response variable shows high kurtosis, and the GT distribution is shown to fit better than the T and distributions when modeling the residuals.
Author Contributions
Data curation, J.R., M.A.R. and J.A.; formal analysis, J.R., M.A.R. and J.A.; investigation, J.R., M.A.R. and J.A.; methodology, J.R., M.A.R. and J.A.; writing—original draft, J.R., M.A.R. and J.A.; writing—review and editing, M.A.R. and J.A.; Funding Acquisition, J.R., M.A.R. and J.A. All authors have read and agreed to the published version of the manuscript.
Funding
Research of J.R., M.R. and J.A. was supported by Universidad de Antofagasta through project SEMILLERO UA 2021.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Acknowledgments
The authors would like to thank the referee for his/her constructive suggestions that improved the final version of this paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Rogers, W.H.; Tukey, J.W. Understanding Some Long-Tailed Symmetrical Distributions. Stat. Neerl. 1972, 26, 211–226.
- Mosteller, F.; Tukey, J.W. Data Analysis and Regression; Addison-Wesley: Boston, MA, USA, 1977.
- Kafadar, K. A Biweight Approach to the One-Sample Problem. J. Am. Stat. Assoc. 1982, 77, 416–424.
- Wang, J.; Genton, M.G. The multivariate skew-slash distribution. J. Stat. Plan. Inference 2006, 136, 209–220.
- Gómez, H.W.; Quintana, F.A.; Torres, F.J. A New Family of Slash-Distributions with Elliptical Contours. Stat. Probab. Lett. 2008, 77, 717–725, Erratum in Gómez, H.W.; Venegas, O. Stat. Probab. Lett. 2008, 78, 2273–2274.
- Arslan, O. An Alternative Multivariate Skew-Slash Distribution. Stat. Probab. Lett. 2008, 78, 2756–2761.
- Genc, A.I. A Generalization of the Univariate Slash by a Scale-Mixture Exponential Power Distribution. Commun. Stat. Simul. Comput. 2007, 36, 937–947.
- Gómez, H.W.; Olivares-Pacheco, J.F.; Bolfarine, H. An Extension of the Generalized Birnbaum-Saunders Distribution. Stat. Probab. Lett. 2009, 79, 331–338.
- Reyes, J.; Gómez, H.W.; Bolfarine, H. Modified slash distribution. Statistics 2013, 47, 929–941.
- Rojas, M.A.; Bolfarine, H.; Gómez, H.W. An extension of the slash-elliptical distribution. Stat. Oper. Res. Trans. (SORT) 2014, 38, 215–230.
- Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1988.
- Li, R.; Nadarajah, S. A review of Student’s t distribution and its generalizations. Empir. Econ. 2020, 58, 1461–1490.
- El-Bassiouny, A.H.; El-Morshedy, M. The Univarite and Multivariate Generalized Slash Student Distribution. Int. J. Math. Its Appl. 2015, 3, 3547.
- El-Morshedy, M.; EL-Bassiouny, A.H.; Tahir, M.H.; Eliwa, M.S. Univariate and Multivariate Double Slash Distribution. J. Stat. Appl. Probab. 2020, 9, 459–471.
- Reyes, J.; Barranco-Chamorro, I.; Gómez, H.W. Generalized modified slash distribution with applications. Commun. Stat.-Theory Methods 2020, 49, 2025–2048.
- Jander, R. Die Optische Richtungsorientierung der Roten Waldameise (Formica rufa L.). Z. Vgl. Physiol. 1957, 40, 162–238.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
- Schwarz, G.E. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).