Abstract
In this paper, we present an extension of the truncated-exponential skew-normal (TESN) distribution. The new distribution is defined as the quotient of two independent random variables: the numerator follows the TESN distribution and the denominator follows the beta distribution with shape parameters q and 1. The resulting distribution has a more flexible coefficient of kurtosis. We study its probability density function (pdf), its survival and hazard functions, some of its properties, its moments, and inference by the maximum likelihood method. We carry out a simulation study and apply the methodology to a real dataset.
1. Introduction
The Slash (S) distribution is a generalization of the normal model. Its stochastic representation is given by
$$S = \frac{Z}{U^{1/q}}, \qquad (1)$$
where $Z \sim N(0,1)$ is independent of $U \sim U(0,1)$ and $q > 0$.
The case $q = 1$ represents the canonical Slash model and the standard normal model is obtained for $q \to \infty$. The pdf of the canonical S distribution is
$$p(x) = \begin{cases} \dfrac{\phi(0) - \phi(x)}{x^{2}}, & x \neq 0, \\[2mm] \dfrac{\phi(0)}{2}, & x = 0, \end{cases}$$
where $\phi$ represents the pdf of the standard normal model (see Johnson et al. [1]). This distribution has heavier tails than the normal distribution, i.e., it has greater kurtosis.
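As a minimal sketch of the Slash model, the following Python code evaluates the canonical Slash pdf, $(\phi(0) - \phi(x))/x^2$ for $x \neq 0$ and $\phi(0)/2$ at $x = 0$, and draws values from the standard representation $Z / U^{1/q}$. The function names `slash_pdf` and `slash_sample` are illustrative and do not come from the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def slash_pdf(x: float) -> float:
    """pdf of the canonical Slash distribution (q = 1)."""
    phi0 = STD_NORMAL.pdf(0.0)
    if x == 0.0:
        return phi0 / 2.0  # limiting value at the origin
    return (phi0 - STD_NORMAL.pdf(x)) / (x * x)


def slash_sample(q: float, rng: random.Random) -> float:
    """One draw of Z / U**(1/q) with Z ~ N(0,1) and U ~ U(0,1) independent."""
    z = rng.gauss(0.0, 1.0)
    u = 1.0 - rng.random()  # lies in (0, 1], so the division is safe
    return z / u ** (1.0 / q)
```

Comparing `slash_pdf(5.0)` with `STD_NORMAL.pdf(5.0)` illustrates the heavier tails mentioned above.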
Properties of the S distribution are discussed by Rogers and Tukey [2] and Mosteller and Tukey [3]. ML estimation of the location and scale parameters in the S model is discussed in Kafadar [4]. Wang et al. [5] studied a multivariate version of the S distribution and a multivariate skew version. Gómez et al. [6] extended the S distribution using the family of univariate and multivariate elliptical distributions, and Gómez et al. [7] used the S model to extend the generalized Birnbaum-Saunders distribution.
Nadarajah et al. [8] proposed a way of constructing skewed distributions, motivated by Azzalini [9], by introducing asymmetry into symmetric models. A unified approach for the construction of models of this kind is given in Ferreira and Steel [10]. If X is a random variable symmetric around zero with pdf $f$ and cumulative distribution function (cdf) $F$, the new random variable Y is defined with pdf given by:
$$g(y \mid p) = f(y)\, p(F(y)), \quad y \in \mathbb{R}, \qquad (2)$$
with $p$ denoting a pdf in the interval $(0, 1)$. Then Y is a skew version of the variable X. The most commonly-used versions of (2) are the skew distributions proposed by Azzalini [9] in the form:
$$g(y) = 2 f(y)\, G(\lambda y), \quad y \in \mathbb{R}. \qquad (3)$$
The skew-normal (SN) model is obtained as a particular case of (3) considering $f = \phi$ and $G = \Phi$, where $\Phi$ denotes the cdf of the standard normal model.
In the present paper, we extend the TESN model introduced by Nadarajah et al. [8], based on the Slash methodology. The pdf of the TESN distribution is given by:
$$f_X(x; \lambda) = \frac{\lambda}{1 - e^{-\lambda}}\, \phi(x) \exp\{-\lambda \Phi(x)\}, \quad x \in \mathbb{R}, \qquad (4)$$
where $\lambda \in \mathbb{R}$, and $\phi$ and $\Phi$ denote the pdf and cdf of the standard normal model. Hereafter, we use the notation $X \sim TESN(\lambda)$ to indicate that X is a random variable following a TESN distribution. According to Barreto-Souza and Simas [11], the distribution presents different behavior for large $\lambda$, suggesting that this is a rich class of distributions. Furthermore, the parameter $\lambda$ can be interpreted as a concentration parameter. Figure 1 shows the graph of the TESN pdf for different values of the parameter $\lambda$.
Figure 1.
TESN pdf for different values of $\lambda$.
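The TESN density can be explored numerically. The sketch below assumes the form $\lambda (1 - e^{-\lambda})^{-1} \phi(x) \exp\{-\lambda \Phi(x)\}$ with $\lambda \neq 0$, and uses the fact that this is the density of $\Phi^{-1}(P)$ when $P$ follows an exponential distribution truncated to $(0, 1)$. The names `tesn_pdf`, `tesn_cdf`, and `tesn_sample` are hypothetical helpers, not code from the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def tesn_pdf(x: float, lam: float) -> float:
    """TESN(lam) density, lam != 0."""
    norm_const = lam / -math.expm1(-lam)  # lam / (1 - exp(-lam))
    return norm_const * STD_NORMAL.pdf(x) * math.exp(-lam * STD_NORMAL.cdf(x))


def tesn_cdf(x: float, lam: float) -> float:
    """Closed-form cdf: (1 - exp(-lam * Phi(x))) / (1 - exp(-lam))."""
    return math.expm1(-lam * STD_NORMAL.cdf(x)) / math.expm1(-lam)


def tesn_sample(lam: float, rng: random.Random) -> float:
    """Inverse-transform draw: X = Phi^{-1}(P), P truncated-exponential on (0, 1)."""
    v = rng.random()
    # Invert the truncated-exponential cdf: p = -log(1 - v (1 - e^{-lam})) / lam
    p = -math.log1p(v * math.expm1(-lam)) / lam
    p = min(max(p, 1e-16), 1.0 - 1e-16)  # keep inv_cdf strictly inside (0, 1)
    return STD_NORMAL.inv_cdf(p)
```

For positive `lam` the factor `exp(-lam * Phi(x))` shifts mass toward negative values, and for negative `lam` toward positive values.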
The extension of this model is based on the quotient of two independent random variables, one with the TESN distribution and the other a power of the uniform distribution $U(0,1)$. The result is a distribution with a more flexible coefficient of kurtosis, and hence an appropriate model for fitting data. In practical terms, this generalization is motivated by the search for more flexible distributions that may provide a better fit than the TESN distribution. For related constructions, see the works by Gomes et al. [12], Maurya and Nadarajah [13] and the references therein.
The article is organized as follows. In Section 2 we study the representation, pdf, properties, and graphs of the new distribution. In Section 3, we present a Monte Carlo simulation experiment to evaluate the maximum likelihood estimators of the parameters of the model introduced in Section 2. In Section 4, we provide an application of the proposed distribution. In Section 5, we conclude with some final comments.
2. Incorporating Kurtosis
In this section, we introduce a new extension of the TESN distribution. We study its pdf, survival and hazard functions, moments, the inclusion of location and scale parameters, and the log-likelihood equations.
2.1. Representation
Following the representation of the Slash distribution, the representation of this new distribution is given by the following definition:
Definition 1.
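A random variable (r.v.) Z has a Slashed Truncated Exponential Skew-Normal (STESN) distribution, denoted by $Z \sim STESN(\lambda, q)$, if it is represented by:
$$Z = \frac{X}{Y}, \qquad (5)$$
where $X$ and $Y$ are independent variables, $X \sim TESN(\lambda)$, $Y \sim Beta(q, 1)$. Equivalently, $Y = U^{1/q}$ with $U \sim U(0,1)$ and $q > 0$.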
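The quotient representation translates directly into a sampler. The following sketch assumes that the numerator is a TESN variable with density $\lambda (1 - e^{-\lambda})^{-1} \phi(x) \exp\{-\lambda \Phi(x)\}$ and that the denominator is a Beta(q, 1) variable generated as $U^{1/q}$ with $U \sim U(0,1)$. The helper names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tesn_sample(lam: float, rng: random.Random) -> float:
    """TESN(lam) draw by inversion (lam != 0)."""
    v = rng.random()
    p = -math.log1p(v * math.expm1(-lam)) / lam
    p = min(max(p, 1e-16), 1.0 - 1e-16)  # stay strictly inside (0, 1)
    return STD_NORMAL.inv_cdf(p)


def stesn_sample(lam: float, q: float, rng: random.Random) -> float:
    """Quotient of a TESN(lam) draw and an independent Beta(q, 1) draw."""
    x = tesn_sample(lam, rng)
    y = (1.0 - rng.random()) ** (1.0 / q)  # Beta(q, 1) as U**(1/q), U in (0, 1]
    return x / y
```

Because the denominator never exceeds 1, each draw is at least as large in absolute value as the underlying TESN draw, which is the mechanism behind the heavier tails. For very small `q` the power `U**(1/q)` can underflow, so this sketch is intended for moderate values of `q`.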
2.2. Probability Density Function
The following proposition shows the pdf function for the STESN distribution, generated using the stochastic representation given in (5) and using the Jacobian method for transforming the r.v.:
Proposition 1.
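If $Z \sim STESN(\lambda, q)$, the pdf of Z is given by:
$$f_Z(z; \lambda, q) = \frac{q \lambda}{1 - e^{-\lambda}} \int_0^1 w^{q}\, \phi(zw) \exp\{-\lambda \Phi(zw)\}\, dw, \qquad (6)$$
where $z \in \mathbb{R}$, $\lambda \in \mathbb{R}$, and $q > 0$.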
Proof.
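The pdf is generated using the representation given in (5), $Z = X/Y$ with $X \sim TESN(\lambda)$ and $Y \sim Beta(q, 1)$ independent. Using the Jacobian method with the transformation $Z = X/Y$, $W = Y$, we obtain $X = ZW$; $Y = W$. Calculating the Jacobian, we obtain:
$$J = \begin{vmatrix} \partial x / \partial z & \partial x / \partial w \\ \partial y / \partial z & \partial y / \partial w \end{vmatrix} = \begin{vmatrix} w & z \\ 0 & 1 \end{vmatrix} = w.$$
Replacing in the joint pdf $f_{Z,W}(z, w) = |J|\, f_X(zw)\, f_Y(w)$:
$$f_{Z,W}(z, w) = w \cdot \frac{\lambda}{1 - e^{-\lambda}}\, \phi(zw) \exp\{-\lambda \Phi(zw)\} \cdot q w^{q-1} = \frac{q \lambda}{1 - e^{-\lambda}}\, w^{q}\, \phi(zw) \exp\{-\lambda \Phi(zw)\},$$
where $z \in \mathbb{R}$, $0 < w < 1$, $\lambda \in \mathbb{R}$, and $q > 0$. Hence, marginalizing with respect to variable W, we obtain:
$$f_Z(z; \lambda, q) = \int_0^1 f_{Z,W}(z, w)\, dw = \frac{q \lambda}{1 - e^{-\lambda}} \int_0^1 w^{q}\, \phi(zw) \exp\{-\lambda \Phi(zw)\}\, dw,$$
where $z \in \mathbb{R}$, $\lambda \in \mathbb{R}$, and $q > 0$.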
□
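The marginalization step can be checked numerically. The sketch below assumes the integral form $f_Z(z) = q \lambda (1 - e^{-\lambda})^{-1} \int_0^1 w^q \phi(zw) \exp\{-\lambda \Phi(zw)\}\, dw$, which follows from the quotient of a TESN variable and an independent Beta(q, 1) variable, and approximates the integral with a midpoint rule. The function name `stesn_pdf` and the number of nodes `n` are illustrative choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tesn_pdf(x: float, lam: float) -> float:
    """TESN(lam) density, lam != 0."""
    norm_const = lam / -math.expm1(-lam)  # lam / (1 - exp(-lam))
    return norm_const * STD_NORMAL.pdf(x) * math.exp(-lam * STD_NORMAL.cdf(x))


def stesn_pdf(z: float, lam: float, q: float, n: int = 2000) -> float:
    """Midpoint-rule approximation of q * integral_0^1 w**q * f_TESN(z * w) dw."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * h  # midpoint of the i-th subinterval of (0, 1)
        total += w ** q * tesn_pdf(z * w, lam)
    return q * h * total
```

At `z = 0` the integral has the closed form `q / (q + 1) * tesn_pdf(0, lam)`, which gives a quick sanity check, and for large `q` the values approach `tesn_pdf(z, lam)`.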
Proposition 2.
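Let $Z \sim STESN(\lambda, q)$. If $q \to \infty$, the r.v. Z converges in law to the r.v. $X \sim TESN(\lambda)$.
Proof.
Let $Z \sim STESN(\lambda, q)$ and $Z = X/Y$ given by (5), where $Y \sim Beta(q, 1)$. We obtain:
$$E(Y) = \frac{q}{q+1}, \qquad Var(Y) = \frac{q}{(q+1)^{2}(q+2)},$$
where $E(Y) \to 1$ and $Var(Y) \to 0$ if $q \to \infty$, so that $Y \xrightarrow{P} 1$ (see Lehmann [14]); here $\xrightarrow{P}$ denotes convergence in probability. Applying Slutsky’s Lemma to $Z = X/Y$, we have:
$$Z \xrightarrow{L} X \sim TESN(\lambda), \quad q \to \infty,$$
where $\xrightarrow{L}$ denotes convergence in law, i.e., for increasing values of q, the r.v. Z converges in law to a $TESN(\lambda)$ distribution. □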
Remark 1.
The above proposition implies that if $q \to \infty$, then the pdf of the STESN distribution approaches the pdf of a TESN distribution.
Figure 2a,b show the graphs for the pdf of this model for some parameter values.
Figure 2.
STESN for values of (a) ; (b) ; and (c) for different values of q.
2.3. Reliability Analysis
For the study of failure times, we need to consider a time y, where we have . Therefore, in our model we study the case of non-negative variables where , thus, the model must be transformed to obtain the following pdf:
where , , , and . Figure 2c shows the graphs of the pdf for different parameter values. Once the transformation is complete, we obtain the survival and hazard functions. The survival function is defined as the probability that a subject does not experience the event of interest before a moment t, and in our model is given by:
where . The hazard function, defined as the instantaneous rate of failure at time t given survival up to t, i.e., the ratio between the pdf and the survival function, is given in our model by:
where . Figure 3a,b show the graphs of survival and hazard functions respectively, for different parameter values.
Figure 3.
(a) Survival function and (b) hazard function for log-STESN model with different combinations of values for and q.
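The relation between density, survival function, and hazard function can be sketched numerically. The code below is only an illustration under explicit assumptions: it takes the positive variable to be $T = e^{Z}$ with $Z$ the quotient of a TESN($\lambda$) variable and an independent Beta(q, 1) variable (suggested by the name log-STESN in the caption of Figure 3, without location or scale parameters), and it evaluates expectations over $U \sim U(0,1)$ with a midpoint rule. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tesn_pdf(x, lam):
    """TESN(lam) density, lam != 0."""
    return lam / -math.expm1(-lam) * STD_NORMAL.pdf(x) * math.exp(-lam * STD_NORMAL.cdf(x))


def tesn_cdf(x, lam):
    """TESN(lam) cdf in closed form."""
    return math.expm1(-lam * STD_NORMAL.cdf(x)) / math.expm1(-lam)


def _mean_over_u(g, q, n=2000):
    """Midpoint-rule approximation of E[g(U**(1/q))] with U ~ U(0,1)."""
    return sum(g(((i + 0.5) / n) ** (1.0 / q)) for i in range(n)) / n


def stesn_pdf(z, lam, q):
    """Density of Z = X / Y: E[Y * f_X(z * Y)] with Y = U**(1/q)."""
    return _mean_over_u(lambda y: y * tesn_pdf(z * y, lam), q)


def stesn_cdf(z, lam, q):
    """cdf of Z = X / Y: E[F_X(z * Y)] with Y = U**(1/q)."""
    return _mean_over_u(lambda y: tesn_cdf(z * y, lam), q)


def survival(t, lam, q):
    """S(t) = P(T > t) for the assumed T = exp(Z), t > 0."""
    return 1.0 - stesn_cdf(math.log(t), lam, q)


def hazard(t, lam, q):
    """h(t) = f_T(t) / S(t), with f_T(t) = f_Z(log t) / t."""
    return stesn_pdf(math.log(t), lam, q) / t / survival(t, lam, q)
```

Evaluating `survival` and `hazard` on a grid of `t` values for several `(lam, q)` pairs produces curves of the kind shown in Figure 3, up to the assumptions stated above.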
2.4. Moments
Let Z be a r.v. where , the r-th moment for the variable is given by the following proposition.
Proposition 3.
Using the representation (5), the r-th moment of the r.v. Z is:
where and the k-th order statistic of a random variable with distribution .
Proof.
The r-th moments of Z can be calculated as:
where is the r-th moment for the model proposed by Nadarajah et al. [8] given by:
where the k-th order statistic of a random variable with distribution and the r-th moment of the r.v. .
Therefore the r-th moment for the variable is given by:
□
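The moment calculation lends itself to a short numerical sketch. It assumes the quotient representation $Z = X/Y$ with $X \sim TESN(\lambda)$ independent of $Y \sim Beta(q, 1)$, so that $E(Z^r) = E(X^r)\, E(Y^{-r})$ with $E(Y^{-r}) = q/(q - r)$ for $q > r$. Instead of the order-statistic expansion used in the proposition, the TESN moment is obtained here by direct numerical integration, which is a swapped-in technique chosen for brevity. Function names and integration limits are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tesn_pdf(x, lam):
    """TESN(lam) density, lam != 0."""
    return lam / -math.expm1(-lam) * STD_NORMAL.pdf(x) * math.exp(-lam * STD_NORMAL.cdf(x))


def tesn_raw_moment(r, lam, lo=-10.0, hi=10.0, n=4000):
    """E(X**r) for X ~ TESN(lam), by a midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum((lo + (i + 0.5) * h) ** r * tesn_pdf(lo + (i + 0.5) * h, lam)
                   for i in range(n))


def stesn_raw_moment(r, lam, q):
    """E(Z**r) = q / (q - r) * E(X**r); finite only when q > r."""
    if q <= r:
        raise ValueError("the r-th moment requires q > r")
    return q / (q - r) * tesn_raw_moment(r, lam)


def stesn_kurtosis(lam, q):
    """Kurtosis coefficient mu_4 / mu_2**2 from the first four raw moments (q > 4)."""
    m1, m2, m3, m4 = (stesn_raw_moment(r, lam, q) for r in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu4 / mu2 ** 2
```

Calling `stesn_kurtosis(lam, q)` for decreasing values of `q` shows how the slash parameter controls the kurtosis coefficient.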
Using this proposition, the first four moments of the r.v. Z are given in the following corollary.
Corollary 1.
From the r-th moment of the r.v. Z given in (10), we obtain the first four moments of the variable, given by:
2.5. Incorporation of Parameters
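To produce a more flexible distribution, we extend this model by adding location and scale parameters as $X = \mu + \sigma Z$, where $Z \sim STESN(\lambda, q)$, $\mu \in \mathbb{R}$, and $\sigma > 0$, obtaining the following proposition.
Proposition 4.
If $X \sim STESN(\mu, \sigma, \lambda, q)$, the pdf of X is given by:
$$f_X(x; \mu, \sigma, \lambda, q) = \frac{q \lambda}{\sigma (1 - e^{-\lambda})} \int_0^1 w^{q}\, \phi\left(\frac{x - \mu}{\sigma}\, w\right) \exp\left\{-\lambda \Phi\left(\frac{x - \mu}{\sigma}\, w\right)\right\} dw, \qquad (15)$$
where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\lambda \in \mathbb{R}$, and $q > 0$.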
2.6. Log Likelihood Equations
Let be a random sample of the r.v. X with distribution, the log-likelihood function can be written as:
where and .
For each parameter we have the following likelihood equations:
where . Furthermore, , , , and .
As can be observed, this system can only be solved by iterative procedures such as Newton–Raphson. As an alternative, it is also possible to use the optim routine implemented in R software [15]. Standard errors for the parameter estimates can be obtained from the Hessian matrix of the log-likelihood function, which can be computed numerically, for instance, using the pracma package (see Borchers [16]).
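A log-likelihood of this kind can be evaluated numerically before handing it to an optimizer. The sketch below is not the authors' R code. It assumes the location-scale form $X = \mu + \sigma Z$, with $Z$ the quotient of a TESN($\lambda$) variable and an independent Beta(q, 1) variable, and approximates the density integral with a midpoint rule. The function name `stesn_loglik` and the parameter ordering are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tesn_pdf(x, lam):
    """TESN(lam) density, lam != 0."""
    return lam / -math.expm1(-lam) * STD_NORMAL.pdf(x) * math.exp(-lam * STD_NORMAL.cdf(x))


def stesn_loglik(params, data, n=400):
    """Log-likelihood of (mu, sigma, lam, q) for observations in data."""
    mu, sigma, lam, q = params
    if sigma <= 0.0 or q <= 0.0 or lam == 0.0:
        return -math.inf  # outside the parameter space used in this sketch
    # Midpoint nodes for Y = U**(1/q), U ~ U(0,1): f_Z(z) = E[Y * f_TESN(z * Y)]
    nodes = [((i + 0.5) / n) ** (1.0 / q) for i in range(n)]
    total = 0.0
    for x in data:
        z = (x - mu) / sigma
        dens = sum(y * tesn_pdf(z * y, lam) for y in nodes) / (n * sigma)
        if dens <= 0.0:
            return -math.inf  # density underflowed to zero
        total += math.log(dens)
    return total
```

The negative of this function could be passed to a general-purpose minimizer, in the same spirit as the optim routine mentioned above, with starting values chosen from a coarse grid.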
2.7. STESN or TESN Model?
To decide between the STESN and TESN models, we can use the traditional Akaike (AIC, Akaike [17]) and Schwarz (BIC, Schwarz [18]) criteria. As the TESN model corresponds to the STESN model with $q \to \infty$, we can also use the likelihood ratio test (LRT) to decide between the two models, considering $H_0: q = \infty$ (TESN model) versus $H_1: q < \infty$ (STESN model). This is a problem where the null hypothesis lies exactly on the boundary of the parameter space. This kind of problem was first discussed in Chernoff [19]. It also arises, for instance, in a random effects model when we are interested in testing whether the variance of the random effects is zero (see Stram and Lee [20] and Gallardo et al. [21]), or in a cure rate model when we are interested in testing for the presence of cured individuals in the population (see Maller and Zhou [22]). In this particular case, the LRT statistic does not converge asymptotically to the usual distribution, i.e., the chi-squared distribution with 1 degree of freedom, $\chi^2_1$, but converges to $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$, i.e., a 50-50 mixture between a point mass at 0 and a $\chi^2_1$ distribution.
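The model comparison tools described here reduce to a few lines of arithmetic. The following sketch computes AIC, BIC, and the p-value of the LRT under the 50-50 mixture of a point mass at 0 and a chi-squared distribution with 1 degree of freedom. The function names and the inputs (maximized log-likelihoods, number of parameters `k`, sample size `n`) are illustrative.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Schwarz (Bayesian) information criterion: k * log(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2.0 * loglik


def chi2_1_sf(x: float) -> float:
    """P(chi-squared with 1 df > x) for x >= 0, via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_boundary(loglik_null: float, loglik_alt: float):
    """LRT statistic and p-value under the 50-50 mixture of a point mass at 0 and chi2_1."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        return stat, 1.0  # the point mass at 0 makes P(statistic >= 0) = 1
    return stat, 0.5 * chi2_1_sf(stat)
```

For a positive observed statistic, the mixture p-value is half of the usual chi-squared p-value, so ignoring the boundary issue would make the test conservative.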
3. Simulation Study
In this section, we study the behavior of the ML estimators in finite samples, verifying empirically whether these estimators have desirable properties (unbiasedness, asymptotic efficiency, and asymptotic normality).
Random variables from the TESN distribution and the Beta distribution were generated to obtain our new variable with the pdf shown in (15). The initial values used for optimization were chosen from a sequence of candidate values as those which maximized the log-likelihood function. In this sequence, takes values between and 3, q between 1 and 5, between and 2, and between 2 and 10. This process was repeated 5000 times with sample size , , and 200 for different combinations of parameters. Table 1 presents the empirical bias, the standard errors (SE), the root of the mean squared error (RMSE), and the 95% coverage probabilities (CP) for the estimators of the parameters of the STESN distribution with different combinations of parameters and sample sizes. From this table, notice that the bias, SE, and RMSE decrease as the sample size increases, suggesting that the estimators are consistent. Furthermore, the empirical CP of the asymptotic confidence intervals differs from the nominal value, especially when the sample size is small, but it approaches the nominal value as the sample size increases. Figure 4 shows the estimated pdf for the ML estimators of , and q for two combinations of parameters, showing graphically that the skewness of the estimators disappears progressively as the sample size increases. We also note that the distributions of the estimators for and q are more asymmetric than the distributions of the estimators for and , especially for small sample sizes.
Table 1.
Empirical bias, SE, RMSE, and 95% CP for the ML estimators of , , , and q with different combinations of parameters and sample sizes.
Figure 4.
Estimated pdf for the ML estimators of , and q in the STESN distribution for: (upper panels), and (lower panels).
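The summary measures reported in a simulation study of this kind can be computed from the replicated estimates as sketched below. The definitions are assumptions, since the text does not spell them out: SE is taken as the average of the estimated standard errors, and CP as the proportion of Wald-type intervals (estimate plus or minus 1.96 standard errors) that contain the true value. The function name `summarize_replicates` is illustrative.

```python
import math
from statistics import fmean


def summarize_replicates(estimates, std_errors, true_value, z=1.959964):
    """Empirical bias, mean SE, RMSE, and coverage of Wald-type intervals."""
    bias = fmean(estimates) - true_value
    mean_se = fmean(std_errors)
    rmse = math.sqrt(fmean((est - true_value) ** 2 for est in estimates))
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage = sum(covered) / len(covered)
    return {"bias": bias, "se": mean_se, "rmse": rmse, "cp": coverage}
```

Applying the function to each parameter and each sample size would fill one row of a table such as Table 1.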
4. Application to a Data Set
In this section, we present a real data application to illustrate the STESN model compared with other models discussed in the literature. These data were presented by Barlow et al. [23] and represent the fatigue fracture life of Kevlar 373/epoxy specimens subjected to a constant stress level until all of them failed. The parameter estimates were obtained with the optim command, and their standard errors were computed from the Hessian matrix, both in R software. Codes are available as supplementary material.
Table 2 shows a summary of the dataset, including the sample size n, the mean, the standard deviation S, the asymmetry coefficient, the kurtosis coefficient, the minimum, and the maximum. A high kurtosis value is observed.
Table 2.
Descriptive statistics for kevlar dataset.
Table 3 shows the results of the fit; the TESN distribution was compared with the STESN distribution using the AIC and BIC criteria. The STESN distribution achieves the best fit for this dataset, since it presents lower values of both criteria. Furthermore, Table 3 provides the Kolmogorov–Smirnov statistic (KSS), used in a formal goodness-of-fit test to verify which distribution gives a better fit for these data. Smaller values of this statistic suggest a better fit. Thus, according to the Kolmogorov–Smirnov test, the STESN model fits the current data better than the TESN model.
Table 3.
Estimated parameters and standard errors (in parentheses), log-likelihood, AIC and BIC values, and KSS with p-values for TESN and STESN models in kevlar dataset.
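The Kolmogorov–Smirnov statistic reported in Table 3 is the largest distance between the empirical cdf of the data and the fitted cdf. A minimal sketch of its computation follows; `ks_statistic` is an illustrative name, and `cdf` stands for any fitted cumulative distribution function supplied by the user.

```python
def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic sup_x |F_n(x) - F(x)|."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1) / n to i / n at the i-th ordered point
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

Evaluating the statistic with the fitted TESN cdf and with the fitted STESN cdf on the same data gives the two values being compared.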
Figure 5a,b present a histogram of the data with the densities fitted for the data set and Figure 6a,b present the QQ-plot of the densities fitted for the dataset, showing the good fit given by the new distribution.
Figure 5.
Estimated pdf for STESN and TESN for kevlar data set (a) and a zoom for the right tail (b).
Figure 6.
QQ-plot for the (a) TESN and (b) STESN distributions for the dataset.
In our problem, the observed statistic for the LRT to decide between the TESN and STESN models, discussed in Section 2.7, is with an associated p-value . Therefore, the null hypothesis is rejected at any usual level of significance and the STESN model is preferred over the TESN model.
5. Final Comments
In this paper, we introduced an extension of the TESN distribution from which we obtained a distribution that showed greater flexibility in the coefficient of kurtosis. Some mathematical properties of the new distribution were studied. Note that the derived formulae can be easily implemented in different software packages. Inference was implemented based on the ML approach, and its performance was assessed by Monte Carlo simulations using R software. An application to a real dataset showed that the new model produced a better fit than the TESN model. This application demonstrated the practical importance of the new model, and also showed the advantage of STESN over TESN. We hope this new distribution may attract wider applications.
Supplementary Materials
The following are available online at https://www.mdpi.com/article/10.3390/math9161894/s1.
Author Contributions
Conceptualization, P.A.R. and D.I.G.; methodology, H.W.G.; software, D.I.G.; validation, P.A.R., D.I.G., and H.W.G.; formal analysis, O.V.; investigation, O.V. and M.B.; writing—original draft preparation, P.A.R.; writing—review and editing, O.V. and M.B.; funding acquisition, M.B. All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding
The research of Pilar A. Rivera and Héctor W. Gómez was supported by SEMILLERO UA-2021 (Chile). The research of O. Venegas was supported by Vicerrectoría de Investigación y Postgrado of the Universidad Católica de Temuco, Projecto interno FEQUIP 2019-INRN-03.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used in Section 4 can be obtained from the corresponding reference.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 1995.
2. Rogers, W.H.; Tukey, J.W. Understanding some long-tailed symmetrical distributions. Stat. Neerl. 1972, 26, 211–226.
3. Mosteller, F.; Tukey, J.W. Data Analysis and Regression: A Second Course in Statistics; Addison-Wesley: Reading, MA, USA, 1977.
4. Kafadar, K. A biweight approach to the one-sample problem. J. Am. Stat. Assoc. 1982, 77, 416–424.
5. Wang, J.; Boyer, J.; Genton, M.G. A skew-symmetric representation of multivariate distributions. Stat. Sin. 2004, 14, 1259–1270.
6. Gómez, H.W.; Quintana, F.A.; Torres, F.J. New family of slash-distributions with elliptical contours. Stat. Probab. Lett. 2007, 77, 717–725.
7. Gómez, H.W.; Olivares-Pacheco, J.F.; Bolfarine, H. An extension of the generalized Birnbaum-Saunders distribution. Stat. Probab. Lett. 2009, 79, 331–338.
8. Nadarajah, S.; Nassiri, V.; Mohammadpour, A. Truncated-exponential skew-symmetric distributions. Statistics 2014, 48, 872–895.
9. Azzalini, A. A Class of Distributions Which Includes the Normal Ones. Scand. J. Stat. 1985, 12, 171–178.
10. Ferreira, J.T.A.S.; Steel, M.F.J. A constructive representation of univariate skewed distributions. J. Am. Stat. Assoc. 2006, 101, 823–829.
11. Barreto-Souza, W.; Simas, A.B. The exp-G family of probability distributions. Braz. J. Probab. Stat. 2013, 27, 84–109.
12. Gomes, A.E.; Da-Silva, C.Q.; Cordeiro, G.M. The Exponentiated G Poisson Model. Commun. Stat.-Theory Methods 2015, 44, 4217–4240.
13. Maurya, S.K.; Nadarajah, S. Poisson Generated Family of Distributions: A Review. Sankhya B 2020, 1–57.
14. Lehmann, E.L. Elements of Large-Sample Theory; Springer: New York, NY, USA, 1999.
15. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
16. Borchers, H.W. Pracma: Practical Numerical Math Functions, R package version 2.3.3; 2021. Available online: https://CRAN.R-project.org/package=pracma (accessed on 24 June 2021).
17. Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723.
18. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
19. Chernoff, H. On the distribution of the likelihood ratio. Ann. Math. Stat. 1954, 25, 573–578.
20. Stram, D.O.; Lee, J.W. Variance components testing in the longitudinal mixed effects model. Biometrics 1994, 50, 1171–1177.
21. Gallardo, D.I.; Bolfarine, H.; Pedroso-de-Lima, A.C. A clustering cure rate model with application to a sealant study. J. Appl. Stat. 2017, 44, 2949–2962.
22. Maller, R.; Zhou, S. Testing for the Presence of Immune or Cured Individuals in Censored Survival Data. Biometrics 1995, 51, 1197–1205.
23. Barlow, R.E.; Toland, R.H.; Freeman, T. A Bayesian analysis of stress rupture life of Kevlar 49/epoxy spherical pressure vessels. In Proceedings of the Conference on Applications of Statistics; Marcel Dekker: New York, NY, USA, 1984.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).