Abstract
In this paper, we introduce the skew-symmetric generalized normal and the skew-symmetric generalized t distributions, which are skewed extensions of symmetric special cases of generalized skew-normal and generalized skew-t distributions, respectively. We derive key distributional properties for these new distributions, including a recurrence relation and an explicit form for the cumulative distribution function (cdf) of the skew-symmetric generalized t distribution. Numerical examples including a simulation study and a real data analysis are presented to illustrate the practical applicability of these distributions.
Keywords: skew-normal distribution; skew-t distribution; generalized skew-normal distribution; generalized skew-t distribution; skew-symmetric distributions; recurrence relations
MSC: 62E10
1. Introduction
Azzalini [1] introduced the skew-normal distribution SN(λ), characterized by the following density function:
$$f(z;\lambda)=2\,\phi(z)\,\Phi(\lambda z),\qquad z\in\mathbb{R},\ \lambda\in\mathbb{R},$$
where φ denotes the standard normal density function and Φ the standard normal cumulative distribution function. The distribution has gained considerable attention because it captures asymmetry in data while preserving key characteristics of the normal distribution. Its flexibility has made it particularly useful in fields such as finance, environmental studies, and biomedical research.
Subsequently, Jamalizadeh et al. [2] proposed a two-parameter generalized SN distribution, GSN(λ₁, λ₂), with the following density function:
$$f(z;\lambda_1,\lambda_2)=\frac{2\pi}{\cos^{-1}\!\left(-\lambda_1\lambda_2\big/\sqrt{(1+\lambda_1^2)(1+\lambda_2^2)}\right)}\,\phi(z)\,\Phi(\lambda_1 z)\,\Phi(\lambda_2 z),\qquad z\in\mathbb{R},$$
where λ₁ and λ₂ are real numbers. Compared with its one-parameter counterpart, this two-parameter model accommodates a wider range of skewness and kurtosis.
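Building on this, Jamalizadeh and Balakrishnan [3] introduced the three-parameter distribution GSN(λ₁, λ₂, ρ), which can be viewed as a special case of the unified multivariate skew-normal distribution introduced by Arellano-Valle and Azzalini [4]. The density function of GSN(λ₁, λ₂, ρ) is defined as follows:
$$f(z;\lambda_1,\lambda_2,\rho)=\frac{1}{c(\lambda_1,\lambda_2,\rho)}\,\phi(z)\,\Phi_2(\lambda_1 z,\lambda_2 z;\rho),\qquad z\in\mathbb{R},\tag{1}$$
where Φ₂(·, ·; ρ) represents the cumulative distribution function of the standard bivariate normal distribution with correlation ρ (with |ρ| < 1), and
$$c(\lambda_1,\lambda_2,\rho)=\frac{1}{2\pi}\cos^{-1}\!\left(-\frac{\rho+\lambda_1\lambda_2}{\sqrt{(1+\lambda_1^2)(1+\lambda_2^2)}}\right)$$
is the normalization constant. The additional correlation parameter ρ further increases the flexibility of the model when fitting complex datasets.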
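The normalization constant can be checked numerically. It equals the orthant probability P(Y₁ ≤ λ₁X, Y₂ ≤ λ₂X), where X ~ N(0, 1) is independent of a standard bivariate normal pair (Y₁, Y₂) with correlation ρ, and this probability has the closed form (1/2π) cos⁻¹(−δ) with δ = (ρ + λ₁λ₂)/√((1 + λ₁²)(1 + λ₂²)). The following Python sketch (function names are ours, chosen for illustration) compares the closed form with a Monte Carlo estimate.

```python
import numpy as np


def gsn_const(lam1, lam2, rho):
    """Closed-form normalizing constant c(lam1, lam2, rho) of GSN(lam1, lam2, rho)."""
    delta = (rho + lam1 * lam2) / np.sqrt((1 + lam1**2) * (1 + lam2**2))
    return np.arccos(-delta) / (2 * np.pi)


def gsn_const_mc(lam1, lam2, rho, n=1_000_000, seed=0):
    """Monte Carlo estimate of the orthant probability P(Y1 <= lam1 X, Y2 <= lam2 X)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                                    # X ~ N(0, 1)
    y1 = rng.standard_normal(n)                                   # (Y1, Y2): standard bivariate
    y2 = rho * y1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # normal, corr rho, indep. of X
    return np.mean((y1 <= lam1 * x) & (y2 <= lam2 * x))


print(gsn_const(1.5, -0.5, 0.3), gsn_const_mc(1.5, -0.5, 0.3))
```

With λ₁ = λ₂ = 0 the constant reduces to Φ₂(0, 0; ρ) = 1/4 + arcsin(ρ)/(2π), which is a convenient sanity check.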
Remark 1.
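In the special case λ₂ = −λ₁ = −λ, the density function in (1) simplifies to the generalized normal distribution GN(λ, ρ), given by
$$f(z;\lambda,\rho)=\frac{1}{c(\lambda,\rho)}\,\phi(z)\,\Phi_2(\lambda z,-\lambda z;\rho),\qquad z\in\mathbb{R},\tag{2}$$
where
$$c(\lambda,\rho)=\frac{1}{2\pi}\cos^{-1}\!\left(\frac{\lambda^2-\rho}{1+\lambda^2}\right)\tag{3}$$
defines the normalization constant.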
This distribution is symmetric about zero, as depicted in Figure 1. Although it arises as a special case of a skewed family, GN(λ, ρ) is symmetric, and its parameters λ and ρ alter the shape of the density, for example its peakedness, rather than its asymmetry. This makes it a flexible symmetric base density for the skew-symmetric construction introduced below.
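For computation, the factor Φ₂(λz, −λz; ρ) in the GN density does not require a general bivariate normal routine. Owen's classical formula for bivariate normal probabilities gives Φ₂(h, −h; ρ) = 2T(h, √((1 + ρ)/(1 − ρ))), where T is Owen's T function (available in SciPy as `scipy.special.owens_t`). This identity is our computational device rather than part of the paper. The sketch below uses it to evaluate φ(z)Φ₂(λz, −λz; ρ)/c(λ, ρ), with c(λ, ρ) = (1/2π) cos⁻¹((λ² − ρ)/(1 + λ²)), and checks that the result integrates to one and is symmetric.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import owens_t
from scipy.stats import norm


def gn_const(lam, rho):
    """Normalizing constant c(lam, rho) = (1/2pi) arccos((lam^2 - rho) / (1 + lam^2))."""
    return np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi)


def bvn_h_minus_h(h, rho):
    """Phi_2(h, -h; rho) written with Owen's T function: 2 T(h, sqrt((1+rho)/(1-rho)))."""
    return 2.0 * owens_t(h, np.sqrt((1 + rho) / (1 - rho)))


def gn_pdf(x, lam, rho):
    """GN(lam, rho) density: phi(x) Phi_2(lam x, -lam x; rho) / c(lam, rho)."""
    return norm.pdf(x) * bvn_h_minus_h(lam * x, rho) / gn_const(lam, rho)


total, _ = quad(gn_pdf, -np.inf, np.inf, args=(2.0, 0.5))
print(total)                                          # should be 1 up to quadrature error
print(gn_pdf(1.3, 2.0, 0.5), gn_pdf(-1.3, 2.0, 0.5))  # equal values: the density is symmetric
```

Because T(h, a) is even in h, the same formula also shows that the GN density is unchanged when λ is replaced by −λ.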
Figure 1.
The density function of GN(λ, ρ) for selected values of λ and ρ.
Definition 1.
The family of skew-symmetric (symmetry-modulated) distributions is defined by the following density function [5]:
$$f(x)=2\,f_0(x)\,G\{w(x)\},\qquad x\in\mathbb{R},\tag{4}$$
where f₀ is a density function symmetric about zero, w is an odd function, and G is a distribution function such that G(−x) = 1 − G(x) for all x.
In this construction, f₀ determines the underlying symmetric shape, while the modulating factor 2G{w(x)} introduces asymmetry; the result is a proper density for every odd function w. Azzalini and Regoli [6] explored various properties of skew-symmetric (symmetry-modulated) distributions, contributing significantly to their theoretical framework. Several studies have investigated skew-symmetric distributions, including that of Nadarajah and Kotz [7], which introduced a family of skew-symmetric normal distributions with density function 2φ(x)G(λx), where λ is a real constant and G is an absolutely continuous distribution function with a symmetric density. By taking G to be the normal, Student's t, Laplace, logistic, and uniform distribution functions, the authors demonstrated the versatility of skew-symmetric models across different contexts.
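Densities of the form 2f₀(x)G{w(x)} can be simulated with a standard sign-flip construction: draw Y ~ f₀ and U ~ Uniform(0, 1) independently, and return Y if U ≤ G{w(Y)} and −Y otherwise. The density of the output is f₀(x)G{w(x)} + f₀(x)[1 − G{−w(x)}] = 2f₀(x)G{w(x)}, using the symmetry of f₀, the oddness of w, and G(−x) = 1 − G(x). The Python sketch below (the function name is hypothetical) checks this on the skew-normal case f₀ = φ, G = Φ, w(x) = λx, whose mean is √(2/π)·λ/√(1 + λ²).

```python
import numpy as np
from scipy.stats import norm


def sample_skew_symmetric(sample_f0, G, w, n, rng):
    """Draw n values from 2 f0(x) G(w(x)) by randomly flipping the sign of Y ~ f0."""
    y = sample_f0(n, rng)            # symmetric base draws
    u = rng.uniform(size=n)          # independent uniforms
    return np.where(u <= G(w(y)), y, -y)


rng = np.random.default_rng(1)
lam = 3.0
x = sample_skew_symmetric(lambda n, r: r.standard_normal(n), norm.cdf,
                          lambda t: lam * t, 200_000, rng)
delta = lam / np.sqrt(1 + lam**2)
print(x.mean(), np.sqrt(2 / np.pi) * delta)  # sample mean vs exact SN(lam) mean
```

The same function works for any symmetric base sampler, which is convenient for the generalized normal and t bases considered in this paper.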
Gupta and Chang [8] examined a class of multivariate skew-symmetric distributions, emphasizing the importance of skewness in multivariate data analysis. Gomez et al. [9] studied a general family of skew-symmetric distributions generated by the normal cumulative distribution function, further expanding the theoretical landscape of these distributions. Nekoukhou and Alamatsaz [10] introduced a family of skew-symmetric Laplace distributions, which have practical applications in fields such as finance and risk management. Salehi and Azzalini [11] considered a Kotz-type distribution in which the tail weight and the degree of peakedness are regulated by two parameters instead of one, built a symmetry-modulated Kotz-type distribution from it, and carried out likelihood-based inference on three real datasets.
In this paper, we introduce a three-parameter skew-symmetric generalized normal distribution and a four-parameter skew-symmetric generalized t distribution as two new flexible models covering wide ranges of skewness. The remainder of this paper is structured as follows: Section 2 presents the skew-symmetric generalized normal distribution and discusses its key properties. Section 3 introduces the skew-symmetric generalized t distribution and provides a recurrence relation and an explicit form for its cumulative distribution function (cdf). Section 4 offers numerical examples, including a simulation study and an analysis of real data. Finally, Section 5 concludes the paper.
2. Skew-Symmetric Generalized Normal Distribution
The three-parameter skew-symmetric generalized normal distribution, denoted SSGN(λ, ρ, α), is obtained by taking the symmetric density (2) as f₀ in (4), with G = Φ, the standard normal distribution function, and the weighting function w(x) = αx. The GN(λ, ρ) component governs the symmetric shape of the distribution, while α introduces asymmetry.
The density function of SSGN(λ, ρ, α) is
$$f(x;\lambda,\rho,\alpha)=\frac{2}{c(\lambda,\rho)}\,\phi(x)\,\Phi_2(\lambda x,-\lambda x;\rho)\,\Phi(\alpha x),\qquad x\in\mathbb{R},\tag{5}$$
where λ, α ∈ ℝ and ρ (|ρ| < 1) are shape parameters, and c(λ, ρ) is the normalization constant defined in (3).
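When ρ = 0, the bivariate normal cdf factorizes as Φ₂(λx, −λx; 0) = Φ(λx)Φ(−λx), and the density function of SSGN simplifies to
$$f(x;\lambda,0,\alpha)=\frac{2}{c(\lambda,0)}\,\phi(x)\,\Phi(\lambda x)\,\Phi(-\lambda x)\,\Phi(\alpha x),\qquad c(\lambda,0)=\frac{1}{2\pi}\cos^{-1}\!\left(\frac{\lambda^2}{1+\lambda^2}\right).\tag{6}$$
This reduced model has no correlation parameter, which makes the roles of the remaining parameters λ and α easier to analyze.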
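The SSGN density f(x) = (2/c(λ, ρ)) φ(x) Φ₂(λx, −λx; ρ) Φ(αx) can be coded directly. The sketch below evaluates Φ₂(λx, −λx; ρ) through the Owen's T identity Φ₂(h, −h; ρ) = 2T(h, √((1 + ρ)/(1 − ρ))), which is our computational choice rather than the authors'. It checks that the density integrates to one, that the ρ = 0 case agrees with the factorized form Φ(λx)Φ(−λx), and that replacing λ by −λ leaves the density unchanged.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import owens_t
from scipy.stats import norm


def gn_const(lam, rho):
    """Normalizing constant c(lam, rho) of the GN base density."""
    return np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi)


def ssgn_pdf(x, lam, rho, alpha):
    """SSGN(lam, rho, alpha) density: (2/c) phi(x) Phi_2(lam x, -lam x; rho) Phi(alpha x)."""
    bvn = 2.0 * owens_t(lam * x, np.sqrt((1 + rho) / (1 - rho)))  # Phi_2(lam x, -lam x; rho)
    return 2.0 / gn_const(lam, rho) * norm.pdf(x) * bvn * norm.cdf(alpha * x)


def ssgn0_pdf(x, lam, alpha):
    """rho = 0 case written with the factorized form Phi(lam x) Phi(-lam x)."""
    return (2.0 / gn_const(lam, 0.0) * norm.pdf(x)
            * norm.cdf(lam * x) * norm.cdf(-lam * x) * norm.cdf(alpha * x))


total, _ = quad(ssgn_pdf, -np.inf, np.inf, args=(1.5, -0.4, 2.0))
print(total)  # close to 1
```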
Figure 2 displays the density function of SSGN(λ, ρ, α) for various parameter values. The plots show how λ, ρ, and α influence the shape of the distribution, in particular its skewness and kurtosis.
Figure 2.
The density function of SSGN(λ, ρ, α) for some choices of the parameters.
Overall, SSGN is a versatile statistical model whose three shape parameters accommodate a range of data characteristics.
Remark 2.
The following results are readily obtained:
- 1.
- 2.
- 3.
- 4.
- (Thus, is not identifiable.)
- 5.
- If , then
- 6.
- If , then , where , , and .
Moments
In this section, we analyze the skewness and kurtosis of the three-parameter SSGN distribution. To this end, we first derive its moment-generating function (MGF).
Theorem 1.
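The moment-generating function of X ~ SSGN(λ, ρ, α) is given by
$$M_X(t)=\frac{2}{c(\lambda,\rho)}\,e^{t^2/2}\,\Phi_3\!\left(\lambda t,-\lambda t,\alpha t;\,\boldsymbol{\Sigma}\right),\qquad t\in\mathbb{R},$$
where Φ₃(·; Σ) denotes the cdf of the trivariate normal distribution N₃(0, Σ) and
$$\boldsymbol{\Sigma}=\begin{pmatrix}1+\lambda^2 & \rho-\lambda^2 & \lambda\alpha\\ \rho-\lambda^2 & 1+\lambda^2 & -\lambda\alpha\\ \lambda\alpha & -\lambda\alpha & 1+\alpha^2\end{pmatrix}.$$
Proof.
To derive the moment-generating function, we start with the integral representation of the MGF and use e^{tx}φ(x) = e^{t²/2}φ(x − t):
$$M_X(t)=\frac{2}{c(\lambda,\rho)}\int_{-\infty}^{\infty}e^{tx}\,\phi(x)\,\Phi_2(\lambda x,-\lambda x;\rho)\,\Phi(\alpha x)\,dx=\frac{2}{c(\lambda,\rho)}\,e^{t^2/2}\,E\!\left[\Phi_2\{\lambda(Z+t),-\lambda(Z+t);\rho\}\,\Phi\{\alpha(Z+t)\}\right]$$
$$=\frac{2}{c(\lambda,\rho)}\,e^{t^2/2}\,P\!\left(Y_1-\lambda Z\le\lambda t,\;Y_2+\lambda Z\le-\lambda t,\;Y_3-\alpha Z\le\alpha t\right),$$
where (Y₁, Y₂) follows a standard bivariate normal distribution with correlation ρ, independent of Y₃ and Z, and Y₃ and Z are independently and identically distributed as N(0, 1). The result follows because (Y₁ − λZ, Y₂ + λZ, Y₃ − αZ) ~ N₃(0, Σ). □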
The derivatives of the moment-generating function evaluated at t = 0 give the moments of SSGN(λ, ρ, α). To compute them, we use the following lemma.
Lemma 1.
Let defined as and . Let denote a positive definite covariance matrix. Furthermore, we assume that for , , and Σ are partitioned as follows:
then, for we have [12]
where and .
The first four moments of SSGN(λ, ρ, α) are as follows:
where , , and
Plots of the skewness and kurtosis of SSGN(λ, ρ, α) for various parameter values are presented in Figure 3 and Figure 4, respectively.
Figure 3.
The skewness of SSGN(λ, ρ, α) for the selected parameter values.
Figure 4.
The kurtosis of SSGN(λ, ρ, α) for the selected parameter values.
As shown in Figure 3, the skewness of SSGN increases with higher values of and , indicating greater asymmetry in the distribution. Specifically, the maximum skewness occurs at , resulting in a value of . In contrast, Figure 4 illustrates that the kurtosis initially decreases as increases, before rising again. The peak kurtosis value is observed at for . This behavior highlights the capacity of SSGN to model data with varying levels of asymmetry and peakedness.
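Skewness and kurtosis curves such as those in Figures 3 and 4 can also be reproduced without the closed-form moment expressions, by integrating the density numerically. The sketch below does this under that assumption; the kurtosis it returns is the ordinary (non-excess) version, equal to 3 for the normal distribution. As a check, λ = 0 reduces SSGN(0, ρ, α) to SN(α), whose skewness and kurtosis are known in closed form.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import owens_t
from scipy.stats import norm


def gn_const(lam, rho):
    return np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi)


def ssgn_pdf(x, lam, rho, alpha):
    bvn = 2.0 * owens_t(lam * x, np.sqrt((1 + rho) / (1 - rho)))  # Phi_2(lam x, -lam x; rho)
    return 2.0 / gn_const(lam, rho) * norm.pdf(x) * bvn * norm.cdf(alpha * x)


def raw_moment(k, lam, rho, alpha):
    """E[X^k] by numerical integration of the SSGN density."""
    val, _ = quad(lambda x: x**k * ssgn_pdf(x, lam, rho, alpha), -np.inf, np.inf)
    return val


def skew_kurt(lam, rho, alpha):
    """Skewness and (non-excess) kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, lam, rho, alpha) for k in range(1, 5))
    var = m2 - m1**2
    skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / var**1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
    return skew, kurt


print(skew_kurt(1.0, 0.3, 2.0))
```

A grid search over (λ, ρ, α) with `skew_kurt` would give one way to locate the extreme values discussed above, at the cost of many quadratures.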
3. Skew-Symmetric Generalized t Distribution
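Jamalizadeh and Balakrishnan [3] defined a four-parameter generalized skew-t distribution, GST(λ₁, λ₂, ρ, ν), with the following density function:
$$f(z;\lambda_1,\lambda_2,\rho,\nu)=\frac{1}{c(\lambda_1,\lambda_2,\rho)}\,t(z;\nu)\,T_2\!\left(\lambda_1 z\sqrt{\frac{\nu+1}{\nu+z^2}},\;\lambda_2 z\sqrt{\frac{\nu+1}{\nu+z^2}};\;\rho,\;\nu+1\right),\qquad z\in\mathbb{R},$$
where c(λ₁, λ₂, ρ) is the normalization constant of (1), t(·; ν) is the density function of the t distribution with ν degrees of freedom, and T₂(·, ·; ρ, ν + 1) represents the distribution function of the standard bivariate t distribution with correlation ρ (where |ρ| < 1) and ν + 1 degrees of freedom.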
Remark 3.
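In the special case λ₂ = −λ₁ = −λ, this density reduces to the generalized t distribution GT(λ, ρ, ν), with density
$$f(z;\lambda,\rho,\nu)=\frac{1}{c(\lambda,\rho)}\,t(z;\nu)\,T_2\!\left(\lambda z\sqrt{\frac{\nu+1}{\nu+z^2}},\;-\lambda z\sqrt{\frac{\nu+1}{\nu+z^2}};\;\rho,\;\nu+1\right),\qquad z\in\mathbb{R},\tag{17}$$
where c(λ, ρ) is given in (3). This is a symmetric distribution, centered at 0, as illustrated in Figure 5.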
Figure 5.
The density function of GT(λ, ρ, ν) for selected parameter values.
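The four-parameter skew-symmetric generalized t distribution, SSGT(λ, ρ, α, ν), is obtained by taking the density (17) as the symmetric density f₀ in (4), with G = Φ, the standard normal distribution function, and w(x) = αx. The density function of SSGT(λ, ρ, α, ν) is given by
$$f(x;\lambda,\rho,\alpha,\nu)=\frac{2}{c(\lambda,\rho)}\,t(x;\nu)\,T_2\!\left(\lambda x\sqrt{\frac{\nu+1}{\nu+x^2}},\;-\lambda x\sqrt{\frac{\nu+1}{\nu+x^2}};\;\rho,\;\nu+1\right)\Phi(\alpha x),\qquad x\in\mathbb{R},\tag{18}$$
where λ, α ∈ ℝ and ρ (|ρ| < 1) are shape parameters, ν > 0 is the tail parameter, and c(λ, ρ) is defined in (3). When , the density function of becomes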
Figure 6 shows the density function of SSGT(λ, ρ, α, ν) for various parameter values.
Figure 6.
The density function of SSGT(λ, ρ, α, ν) for various parameter choices.
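Random samples from SSGT are useful for checking fits and plots such as Figure 6. The sketch below is our own construction, not taken from the paper. Let Z₀ ~ N(0, 1) be independent of a standard bivariate normal pair (Z₁, Z₂) with correlation ρ, and let W ~ χ²_ν be independent of both. The event {Z₁ ≤ λZ₀, Z₂ ≤ −λZ₀} is unchanged when all three variables are divided by √(W/ν), so, conditionally on this event, Z₀/√(W/ν) has the GT density (17) (by the standard conditional property of the multivariate t distribution). A sign flip with probability 1 − Φ(αx) then yields the SSGT density (18). The function name and parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, t as student_t


def gn_const(lam, rho):
    return np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi)


def sample_ssgt(n, lam, rho, alpha, nu, rng):
    """Draw n values from SSGT(lam, rho, alpha, nu) by rejection plus a sign flip."""
    accept_prob = gn_const(lam, rho)        # P(Z1 <= lam Z0, Z2 <= -lam Z0)
    chunks, n_left = [], n
    while n_left > 0:
        m = int(1.2 * n_left / accept_prob) + 100
        z0, e1, e2 = rng.standard_normal((3, m))
        z1, z2 = e1, rho * e1 + np.sqrt(1 - rho**2) * e2   # corr(Z1, Z2) = rho
        keep = (z1 <= lam * z0) & (z2 <= -lam * z0)
        z = z0[keep]
        # Scaling by sqrt(W / nu), W ~ chi^2_nu, turns accepted Z0 values into GT draws.
        x = z / np.sqrt(rng.chisquare(nu, size=z.size) / nu)
        # Skew-symmetric sign flip with G = Phi and w(x) = alpha x.
        flip = rng.uniform(size=x.size) > norm.cdf(alpha * x)
        x[flip] = -x[flip]
        chunks.append(x)
        n_left -= x.size
    return np.concatenate(chunks)[:n]


rng = np.random.default_rng(2024)
draws = sample_ssgt(200_000, 0.0, 0.3, 2.0, 5.0, rng)
# With lam = 0 the density reduces to 2 t(x; nu) Phi(alpha x); its mean is integrated numerically.
exact_mean, _ = quad(lambda v: v * 2 * student_t.pdf(v, 5.0) * norm.cdf(2.0 * v), -np.inf, np.inf)
print(draws.mean(), exact_mean)
```

The acceptance rate of the rejection step equals c(λ, ρ), so the sampler becomes slow only when this constant is small.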
Remark 4.
The following results are readily obtained:
- 1.
- 2.
- 3.
- 4.
- (Thus, is not identifiable.)
- 5.
- If , then
- 6.
- If , then , where , , and .
Remark 5.
If , then , where , , and . Thus, the integral form of the cumulative distribution function (cdf) of the distribution is as follows:
where
Amiri et al. [13] obtained efficient recursive computational algorithms for multivariate t and multivariate unified skew-t distributions, and Salehi et al. [12] obtained recurrence relations for the cdf and the density function of the generalized skew two-piece skew-t distribution. Here, we derive a recurrence relation for the cdf of the SSGT distribution from its integral form given by (20).
Theorem 2.
The following recurrence relation holds for all :
where , stands for the cdf of the trivariate Student’s t distribution with ν degrees of freedom and the correlation matrix
Proof.
Integrating (20) by parts, the cdf of the SSGT distribution with ν degrees of freedom is readily obtained as
Now, the second term on the right-hand side (RHS) of (22) simplifies to
□
Remark 6.
From Theorem 2, the following results are obtained for odd and even values of ν, respectively:
and
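There is no explicit form for to be used as the starting point in (24). However, an explicit form for is obtained as
Also, an explicit form for is given by
Thus, combining these starting values with the recurrence relation of Theorem 2, a closed form for the cdf of the SSGT distribution is available for positive integer values of ν.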
Moments
According to Remark 5, the moments of SSGT(λ, ρ, α, ν) can be derived as follows:
where
Thus, the first four moments of SSGT can be obtained from the first four moments of SSGN in Equations (9)–(12). Consequently, the skewness and kurtosis of SSGT can be derived from Equations (13) and (14), respectively. Plots of the skewness and kurtosis of SSGT for various parameter values are shown in Figure 7 and Figure 8, respectively.
Figure 7.
The skewness of SSGT(λ, ρ, α, ν) for various parameter choices.
Figure 8.
The kurtosis of SSGT(λ, ρ, α, ν) for various parameter choices.
As observed in Figure 7, along with the numerical optimization results, the skewness of SSGT increases with increasing and while decreasing with increasing . The maximum skewness occurs at , with a value of . From Figure 8, the kurtosis increases with increasing and while decreasing with . The maximum kurtosis value is for . Thus, the ranges of skewness and kurtosis of SSGT are wider than those of SSGN.
4. Numerical Illustration
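In practice, the distributions (5) and (18) must be equipped with a location parameter μ and a scale parameter σ > 0, yielding the SSGN(μ, σ, λ, ρ, α) and SSGT(μ, σ, λ, ρ, α, ν) distributions, respectively. If the observations x₁, …, xₙ are independent draws from the former distribution, then the log-likelihood function of θ = (μ, σ, λ, ρ, α) is
$$\ell(\boldsymbol{\theta})=n\log 2-n\log c(\lambda,\rho)-n\log\sigma+\sum_{i=1}^{n}\Big[\log\phi(z_i)+\log\Phi_2(\lambda z_i,-\lambda z_i;\rho)+\log\Phi(\alpha z_i)\Big],\qquad z_i=\frac{x_i-\mu}{\sigma}.\tag{26}$$
Similarly, for the SSGT(μ, σ, λ, ρ, α, ν) distribution, with θ = (μ, σ, λ, ρ, α, ν), we have
$$\ell(\boldsymbol{\theta})=n\log 2-n\log c(\lambda,\rho)-n\log\sigma+\sum_{i=1}^{n}\Big[\log t(z_i;\nu)+\log T_2\Big(\lambda z_i\sqrt{\tfrac{\nu+1}{\nu+z_i^2}},-\lambda z_i\sqrt{\tfrac{\nu+1}{\nu+z_i^2}};\rho,\nu+1\Big)+\log\Phi(\alpha z_i)\Big].\tag{27}$$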
Maximizing the log-likelihoods (26) and (27), which must be done numerically, yields the maximum likelihood estimates (MLEs) of the parameters. Using the R programming environment [14], we combine the global optimizer DEoptim [15], from the R package of the same name, with a local optimizer based on the 'L-BFGS-B' method. The DEoptim package implements the Differential Evolution (DE) algorithm [16], whose performance as a global optimizer for continuous numerical minimization problems has been studied extensively [17].
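The two-stage strategy, a global differential evolution search followed by a local L-BFGS-B refinement, can be sketched in Python with SciPy, where `differential_evolution` and `minimize(method="L-BFGS-B")` play the roles of the R tools described above. This is an illustrative translation, not the authors' code: the parameter bounds, population size, and iteration limits are our assumptions, the Owen's T evaluation of Φ₂(λz, −λz; ρ) is our computational device, and the data are simulated from SSGN with known hypothetical parameters.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize
from scipy.special import log_ndtr, owens_t
from scipy.stats import norm


def ssgn_negloglik(theta, x):
    """Negative SSGN log-likelihood; theta = (mu, sigma, lam, rho, alpha)."""
    mu, sigma, lam, rho, alpha = theta
    z = (x - mu) / sigma
    log_c = np.log(np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi))
    with np.errstate(divide="ignore"):
        # log Phi_2(lam z, -lam z; rho) via Owen's T; may be -inf far in the tails
        log_bvn = np.log(2.0 * owens_t(lam * z, np.sqrt((1 + rho) / (1 - rho))))
    ll = x.size * (np.log(2.0) - log_c - np.log(sigma)) + np.sum(
        norm.logpdf(z) + log_bvn + log_ndtr(alpha * z)
    )
    return -ll if np.isfinite(ll) else 1e10  # large penalty for invalid points


def fit_ssgn(x, seed=0):
    """Global search with differential evolution, then L-BFGS-B refinement."""
    bounds = [(x.min(), x.max()), (1e-3, 3 * x.std()),
              (-10.0, 10.0), (-0.99, 0.99), (-10.0, 10.0)]
    glob = differential_evolution(ssgn_negloglik, bounds, args=(x,), seed=seed,
                                  popsize=10, maxiter=150, tol=1e-7, polish=False)
    return minimize(ssgn_negloglik, glob.x, args=(x,), method="L-BFGS-B", bounds=bounds)


# Hypothetical data: rejection draws from SSGN(1.5, 0.3, 2), shifted to mu = 10 and scaled by sigma = 2.
rng = np.random.default_rng(1)
z0, e1, e2, z3 = rng.standard_normal((4, 20_000))
z2 = 0.3 * e1 + np.sqrt(1 - 0.3**2) * e2
keep = (e1 <= 1.5 * z0) & (z2 <= -1.5 * z0) & (z3 <= 2.0 * z0)
data = 10.0 + 2.0 * z0[keep][:300]

fit = fit_ssgn(data)
print(fit.x, -fit.fun)
```

Setting `polish=False` keeps the global and local stages separate, mirroring the description in the text; by default SciPy would run its own L-BFGS-B polishing step after differential evolution.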
4.1. Simulation Study
In this section, we carry out a brief simulation study to investigate the behavior of the MLEs of the parameters of the SSGN distribution. To this end, we fix selected true parameter values and draw samples of different sizes n. To generate samples from the SSGN distribution, we employ an acceptance–rejection algorithm based on the stochastic representation given in Remark 2, part 6.
To evaluate the estimators, we compute their bias and mean squared error (MSE); the results are summarized in Table 1. Moreover, Figure 9 shows the MSE and the absolute bias of each estimator for different values of n.
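As a hedged stand-in for the generator (not necessarily the same representation as Remark 2, part 6), the SSGN density itself suggests a simple acceptance–rejection scheme. Draw independent Z₀, Z₃ ~ N(0, 1) and a standard bivariate normal pair (Z₁, Z₂) with correlation ρ, and keep Z₀ whenever Z₁ ≤ λZ₀, Z₂ ≤ −λZ₀, and Z₃ ≤ αZ₀. The accepted Z₀ values have density proportional to φ(z)Φ₂(λz, −λz; ρ)Φ(αz), which is the SSGN density, and the acceptance probability is c(λ, ρ)/2. The function names and parameter values below are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import owens_t
from scipy.stats import norm


def gn_const(lam, rho):
    return np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi)


def ssgn_pdf(x, lam, rho, alpha):
    bvn = 2.0 * owens_t(lam * x, np.sqrt((1 + rho) / (1 - rho)))  # Phi_2(lam x, -lam x; rho)
    return 2.0 / gn_const(lam, rho) * norm.pdf(x) * bvn * norm.cdf(alpha * x)


def sample_ssgn(n, lam, rho, alpha, rng, loc=0.0, scale=1.0):
    """Rejection sampler: keep Z0 when Z1 <= lam Z0, Z2 <= -lam Z0 and Z3 <= alpha Z0."""
    accept_prob = gn_const(lam, rho) / 2.0   # probability of the joint event
    chunks, n_left = [], n
    while n_left > 0:
        m = int(1.2 * n_left / accept_prob) + 100
        z0, e1, e2, z3 = rng.standard_normal((4, m))
        z1, z2 = e1, rho * e1 + np.sqrt(1 - rho**2) * e2   # corr(Z1, Z2) = rho
        keep = (z1 <= lam * z0) & (z2 <= -lam * z0) & (z3 <= alpha * z0)
        chunks.append(z0[keep])
        n_left -= int(keep.sum())
    return loc + scale * np.concatenate(chunks)[:n]


rng = np.random.default_rng(42)
x = sample_ssgn(100_000, 1.5, 0.3, 2.0, rng)
mean_exact, _ = quad(lambda v: v * ssgn_pdf(v, 1.5, 0.3, 2.0), -np.inf, np.inf)
print(x.mean(), mean_exact)
```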
Table 1.
MLEs and the corresponding biases and MSEs.
Figure 9.
The MSE and absolute bias of the MLEs of the SSGN's parameters for the selected true parameter values and different values of n.
As observed in Figure 9, all of the MLEs appear to be consistent, although with different convergence rates. In particular, the MLE of ρ performs worse than the other estimators for small and medium sample sizes. Therefore, we recommend using the reduced model (6) instead of the full model (5) when the Akaike information criteria (AICs) of these two models do not differ substantially for the given real data.
4.2. Real Data Analysis
To demonstrate the practical application of the proposed distributions, we examine a real dataset on the strength of carbon fibers [18] (see Table 2). We also consider the skew-normal (SN) and skew-t (ST) distributions as potential competitors of the proposed distributions; these are fitted with functions available in the R package sn [19,20].
Table 2.
The strength of carbon fibers [18].
The MLEs of the parameters with their standard errors, the log-likelihood, the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the p-value of the Kolmogorov–Smirnov (KS) test are reported in Table 3. According to the KS p-values, none of the fitted distributions is rejected. However, as seen in Table 3, has the minimum AIC and BIC and thus provides the best fit for the data. The corresponding Q-Q plot of the model, along with the histogram of the data including the fitted curves, is shown in Figure 10.
Table 3.
MLEs (standard errors), log-likelihood, AIC, BIC and p-value of KS test.
Figure 10.
The histogram of the data and the fitted curves (left) and the Q-Q plot of (right).
The results also indicate that the distribution provides a good fit for the carbon fiber strength data as evidenced by its AIC value and the p-value from the KS test.
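The comparison criteria reported in Table 3 are straightforward to compute once a model is fitted. The sketch below computes AIC = 2k − 2ℓ and BIC = k log n − 2ℓ for an SSGN model with k = 5 parameters, and runs a KS test with the cdf obtained by numerical integration of the density. The parameter values and data are hypothetical, not those of Table 3, and KS p-values computed with parameters estimated from the same data are only approximate.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import owens_t
from scipy.stats import kstest, norm


def ssgn_std_pdf(z, lam, rho, alpha):
    """Standardized SSGN(lam, rho, alpha) density."""
    c = np.arccos((lam**2 - rho) / (1 + lam**2)) / (2 * np.pi)
    bvn = 2.0 * owens_t(lam * z, np.sqrt((1 + rho) / (1 - rho)))  # Phi_2(lam z, -lam z; rho)
    return 2.0 / c * norm.pdf(z) * bvn * norm.cdf(alpha * z)


def ssgn_logpdf(x, mu, sigma, lam, rho, alpha):
    """Log-density of the location-scale SSGN model."""
    return np.log(ssgn_std_pdf((x - mu) / sigma, lam, rho, alpha) / sigma)


def ssgn_cdf(x, mu, sigma, lam, rho, alpha):
    """cdf by quadrature: F(0) on the standardized scale plus the integral from 0 to z."""
    shape = (lam, rho, alpha)
    left, _ = quad(ssgn_std_pdf, -np.inf, 0.0, args=shape)
    z = (np.atleast_1d(x) - mu) / sigma
    return np.array([left + quad(ssgn_std_pdf, 0.0, zi, args=shape)[0] for zi in z])


def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for a model with k free parameters fitted to n observations."""
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik


# Hypothetical example: 100 observations simulated from SSGN with known parameters.
theta = (10.0, 2.0, 1.5, 0.3, 2.0)  # (mu, sigma, lam, rho, alpha)
rng = np.random.default_rng(3)
z0, e1, e2, z3 = rng.standard_normal((4, 5000))
z2 = 0.3 * e1 + np.sqrt(1 - 0.3**2) * e2
keep = (e1 <= 1.5 * z0) & (z2 <= -1.5 * z0) & (z3 <= 2.0 * z0)
sample = 10.0 + 2.0 * z0[keep][:100]

loglik = np.sum(ssgn_logpdf(sample, *theta))
aic, bic = information_criteria(loglik, k=5, n=sample.size)
ks = kstest(sample, ssgn_cdf, args=theta)
print(aic, bic, ks.pvalue)
```

Splitting the cdf integral at zero on the standardized scale keeps the quadrature close to the bulk of the density, which avoids the accuracy problems of integrating a narrow peak from −∞.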
5. Conclusions
In this paper, we introduced the skew-symmetric generalized normal (SSGN) and the skew-symmetric generalized t (SSGT) distributions, extending the framework established by previous studies on skew-normal and skew-t distributions. We derived their density functions, moments, and other important statistical properties, demonstrating their flexibility in modeling asymmetric data. Moreover, a recurrence relation and an exact form for the cdf of the SSGT distribution were obtained. A brief simulation study was also conducted to investigate the behavior of the MLEs of the parameters. Finally, fitting the SSGN and SSGT distributions to a real dataset on the strength of carbon fibers provided evidence of their practical applicability. The results indicated that the distribution outperformed its competitors, such as the skew-normal and skew-t distributions, in terms of the AIC and the KS test.
Author Contributions
Methodology, N.N.R., M.S., Y.M. and D.-G.C.; Writing—original draft, N.N.R., M.S. and Y.M.; Writing—review & editing, D.-G.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data are contained within the article.
Acknowledgments
This work was based upon research supported in part by the RDP grant at the University of Pretoria, the National Research Foundation (NRF) of South Africa, Ref.: RA210106581084, grant No. 150170 and the South African DST-NRF-MRC SARChI Research Chair in Biostatistics (Grant No. 114613). The opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
- Jamalizadeh, A.; Behboodian, J.; Balakrishnan, N. A two-parameter generalized skew-normal distribution. Stat. Probab. Lett. 2008, 78, 1722–1726.
- Jamalizadeh, A.; Balakrishnan, N. Order statistics from trivariate normal and tν-distributions in terms of generalized skew-normal and skew-tν distributions. J. Stat. Plan. Inference 2009, 139, 3799–3819.
- Arellano-Valle, R.B.; Azzalini, A. On the Unification of Families of Skew-normal Distributions. Scand. J. Stat. 2006, 33, 561–574.
- Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution. J. R. Stat. Soc. Ser. B 2003, 65, 367–389.
- Azzalini, A.; Regoli, G. Some properties of skew-symmetric distributions. Ann. Inst. Stat. Math. 2012, 64, 857–879.
- Nadarajah, S.; Kotz, S. Skewed distributions generated by the normal kernel. Stat. Probab. Lett. 2003, 65, 269–277.
- Gupta, A.K.; Chang, F.C. Multivariate skew-symmetric distributions. Appl. Math. Lett. 2003, 16, 643–646.
- Gomez, H.W.; Venegas, O.; Bolfarine, H. Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 2007, 18, 395–407.
- Nekoukhou, V.; Alamatsaz, M.H. A family of skew-symmetric-Laplace distributions. Stat. Pap. 2012, 53, 685–696.
- Salehi, M.; Azzalini, A. On application of the univariate Kotz distribution and some of its extensions. METRON 2018, 76, 177–201.
- Salehi, M.; Jamalizadeh, A.; Doostparast, M. A generalized skew two-piece skew-elliptical distribution. Stat. Pap. 2014, 55, 409–429.
- Amiri, M.; Mehrali, Y.; Balakrishnan, N.; Jamalizadeh, A. Efficient recursive computational algorithms for multivariate t and multivariate unified skew-t distributions with applications to inference. Comput. Stat. 2022, 37, 125–158.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: https://www.R-project.org/ (accessed on 30 September 2024).
- Ardia, D.; Mullen, K.; Peterson, B.; Ulrich, J.; Boudt, K. DEoptim: Global Optimization by Differential Evolution; R Package Version 2.2-5; R Core Team: Vienna, Austria, 2020.
- Storn, R.; Price, K. Differential Evolution—A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
- Price, K.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Springer Science and Business Media: Berlin/Heidelberg, Germany, 2006.
- Badar, M.G.; Priest, A.M. Statistical aspects of fiber and bundle strength in hybrid composites. In Progress in Science and Engineering Composites; Hayashi, T., Kawata, K., Umekawa, S., Eds.; ICCM-IV: Tokyo, Japan, 1982; pp. 1129–1136.
- Azzalini, A. The R Package ‘sn’: The Skew-Normal and Related Distributions such as the Skew-t and the SUN (Version 2.1.1). 2023. Available online: http://azzalini.stat.unipd.it/SN/ (accessed on 30 September 2024).
- Azzalini, A.; Salehi, M. Some computational aspects of maximum likelihood estimation of the skew-t distribution. In Computational and Methodological Statistics and Biostatistics; Bekker, A., Chen, G., Ferreira, J., Eds.; Emerging Topics in Statistics and Biostatistics; Springer: Cham, Switzerland, 2020.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).