1. Introduction
Over the last few years, the search for flexible probabilistic families capable of modeling different levels of skewness and kurtosis has been an issue of great interest in the field of distribution theory. This was mainly motivated by the seminal work of Azzalini [1]. In that paper, the probability density function (pdf) of a skew-symmetric distribution was introduced. The expression of this density is given by
$$h(z;\lambda) = 2 f(z)\, G(\lambda z), \quad z \in \mathbb{R}, \tag{1}$$
where $f$ is a symmetric pdf about zero; $G$ is an absolutely continuous distribution function, which is also symmetric about zero; and $\lambda$ is a parameter of asymmetry. For the case where $f$ is the standard normal density (from now on, we reserve the symbol $\phi$ for this function), and $G$ is the standard normal cumulative distribution function (henceforth, denoted by $\Phi$), the so-called skew-normal ($\mathrm{SN}$) distribution with density
$$f_Z(z;\lambda) = 2 \phi(z)\, \Phi(\lambda z), \quad z \in \mathbb{R}, \tag{2}$$
is obtained. We use the notation $Z \sim \mathrm{SN}(\lambda)$ to denote the random variable $Z$ with pdf given by Equation (2). A generalization of the $\mathrm{SN}$ distribution was introduced by Arellano-Valle et al. [2] and Arellano-Valle et al. [3], who studied the Fisher information matrix of this generalization. For further details about the $\mathrm{SN}$ distribution, the reader is referred to Azzalini [4]. Martínez-Flórez et al. [5] used generalizations of the $\mathrm{SN}$ distribution to extend the Birnbaum–Saunders model, and Contreras-Reyes and Arellano-Valle [6] used the Kullback–Leibler divergence measure to compare the multivariate normal distribution with the multivariate skew-normal distribution.
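As a quick numerical check of the skew-normal density in Equation (2), the following minimal Python sketch evaluates $2\phi(z)\Phi(\lambda z)$ using only the standard library. The function names `sn_pdf` and `trapezoid` are illustrative choices and do not come from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def sn_pdf(z: float, lam: float) -> float:
    """Skew-normal density 2 * phi(z) * Phi(lam * z), as in Equation (2)."""
    return 2.0 * _STD_NORMAL.pdf(z) * _STD_NORMAL.cdf(lam * z)


def trapezoid(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# lam = 0 recovers the standard normal density.
assert abs(sn_pdf(0.7, 0.0) - _STD_NORMAL.pdf(0.7)) < 1e-12

# The density integrates to one; checked numerically on [-10, 10] for lam = 3.
area = trapezoid(lambda z: sn_pdf(z, 3.0), -10.0, 10.0)
```

Changing the sign of $\lambda$ reflects the density about zero, so `sn_pdf(-z, lam)` and `sn_pdf(z, -lam)` coincide.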
One of the main limitations of working with the family given by Equation (1) is that the information matrix could be singular for some of its particular models (see Azzalini [1]). This might lead to difficulties in estimation related to the asymptotic convergence of the maximum likelihood (ML) estimators. To overcome this issue, some authors (see Chiogna [7] or Arellano-Valle and Azzalini [8]) have used a reparametrization of the $\mathrm{SN}$ model to obtain a nonsingular information matrix. However, this methodology cannot be extended to all types of skew-symmetric models that suffer from this convergence problem. On the other hand, the family of power-symmetric ($\mathrm{PS}$) distributions does not have this problem of singularity in the information matrix (see Pewsey et al. [9]). The pdf of this family of distributions is given by
$$\varphi_F(z;\alpha) = \alpha f(z) \{F(z)\}^{\alpha - 1}, \quad z \in \mathbb{R},$$
where $F$ is itself a cumulative distribution function (cdf) with density $f$, and $\alpha > 0$ is the shape parameter. For the particular case that $F = \Phi$, the power-normal ($\mathrm{PN}$) distribution is obtained, with density given by
$$\varphi_\Phi(z;\alpha) = \alpha \phi(z) \{\Phi(z)\}^{\alpha - 1}, \quad z \in \mathbb{R}.$$
For references where this family is discussed, the reader is referred to Lehmann [10], Durrans [11], Gupta and Gupta [12], and Pewsey et al. [9], among other papers. Other extensions of this model are given in Martínez-Flórez et al. [13], where a multivariate version of the model is introduced; Martínez-Flórez et al. [14], who carried out applications using regression models; Martínez-Flórez et al. [15], who examined the exponential transformation of the model; and Martínez-Flórez et al. [16], who examined a doubly censored version of the model with inflation in a regression context. Truncations of the $\mathrm{PN}$ distribution were considered by Castillo et al. [17].
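The power-normal density $\alpha\,\phi(z)\{\Phi(z)\}^{\alpha-1}$ integrates to the cdf $\{\Phi(z)\}^{\alpha}$, which can be inverted in closed form. The sketch below illustrates these three functions with the Python standard library. The names `pn_pdf`, `pn_cdf`, and `pn_quantile` are hypothetical helpers introduced here for illustration only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def pn_pdf(z: float, alpha: float) -> float:
    """Power-normal density alpha * phi(z) * Phi(z)**(alpha - 1), alpha > 0."""
    return alpha * _STD_NORMAL.pdf(z) * _STD_NORMAL.cdf(z) ** (alpha - 1.0)


def pn_cdf(z: float, alpha: float) -> float:
    """Integrating the density gives the cdf Phi(z)**alpha."""
    return _STD_NORMAL.cdf(z) ** alpha


def pn_quantile(u: float, alpha: float) -> float:
    """Inverse of pn_cdf for 0 < u < 1: Phi^{-1}(u**(1 / alpha))."""
    return _STD_NORMAL.inv_cdf(u ** (1.0 / alpha))


# alpha = 1 recovers the standard normal density.
assert abs(pn_pdf(0.3, 1.0) - _STD_NORMAL.pdf(0.3)) < 1e-12

# Round trip: the quantile function inverts the cdf.
z_back = pn_quantile(pn_cdf(0.8, 2.5), 2.5)

# A central finite difference of the cdf approximates the density.
h = 1e-5
slope = (pn_cdf(0.5 + h, 2.5) - pn_cdf(0.5 - h, 2.5)) / (2.0 * h)
```

Because the quantile function is explicit, a power-normal variate can be drawn as `pn_quantile(u, alpha)` with `u` uniform on (0, 1).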
In this paper, the pdf of this probabilistic family is modified to increase its degree of kurtosis. The resulting model is later used to describe datasets that include atypical observations. Usually, greater kurtosis of this kind is obtained by increasing the number of parameters in the model.
The paper is organized as follows. In Section 2, we first introduce the modified power symmetric distribution. Then, the particular case of the modified power normal distribution is derived, and some of the most relevant statistical properties of this model, including its moments and kurtosis coefficient, are presented. Next, in Section 3, some methods of estimation are discussed, and a simulation study is provided to illustrate the behavior of the shape parameter. A numerical application where the modified power normal distribution is compared to the $\mathrm{SN}$ and $\mathrm{PN}$ distributions is given in Section 4. Finally, Section 5 concludes the paper.
4. Application
In this section, a numerical illustration based on a real dataset is presented. The goal of this application is to provide empirical evidence that the $\mathrm{MPN}$ distribution yields a better fit to the data than the $\mathrm{SN}$, $\mathrm{PN}$, and Student's $t$ (with $\nu$ degrees of freedom) distributions. To that end, we consider a set of 3848 observations of the variable “density” included in the dataset “POLLEN5.DA”, available at
http://lib.stat.cmu.edu/datasets/pollen.data. This variable measures a geometric characteristic of a specific type of pollen. This dataset was previously used by Pewsey et al. [9] to compare the $\mathrm{SN}$ and $\mathrm{PN}$ distributions. A summary of some descriptive statistics is displayed in Table 4.
Using the results derived in Proposition 4, we computed the moment estimates of the parameters of the $\mathrm{MPN}$ distribution and took them as initial values to derive the ML estimates. Table 5 reports the ML estimates for the parameters of the $\mathrm{SN}$, $\mathrm{PN}$, Student's $t$, and $\mathrm{MPN}$ distributions. The figures in brackets are the asymptotic standard errors of the estimates, obtained by inverting the Fisher information matrix of each model evaluated at its ML estimates. Additionally, for each model, the maximum value of the log-likelihood function is reported. The $\mathrm{MPN}$ distribution attains the largest value and consequently provides a better fit to the data.
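To make the ML step concrete in a case that can be solved by hand, consider the standard power-normal model with density $\alpha\,\phi(z)\{\Phi(z)\}^{\alpha-1}$ and no location or scale parameters. Its log-likelihood is $n\log\alpha + \sum_i \log\phi(z_i) + (\alpha-1)\sum_i \log\Phi(z_i)$, and setting the derivative with respect to $\alpha$ to zero gives $\hat{\alpha} = -n / \sum_i \log\Phi(z_i)$. The sketch below checks this on synthetic data. It is a simplified illustration under these assumptions, not the procedure behind Table 5, and the names `pn_loglik` and `pn_alpha_mle` are hypothetical.

```python
import random
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def pn_loglik(alpha: float, data: list[float]) -> float:
    """Log-likelihood of the standard power-normal model at alpha > 0."""
    return sum(
        log(alpha)
        + log(_STD_NORMAL.pdf(z))
        + (alpha - 1.0) * log(_STD_NORMAL.cdf(z))
        for z in data
    )


def pn_alpha_mle(data: list[float]) -> float:
    """Closed-form maximizer: alpha_hat = -n / sum(log Phi(z_i))."""
    return -len(data) / sum(log(_STD_NORMAL.cdf(z)) for z in data)


# Synthetic sample (NOT the pollen data): inverse-cdf draws with true alpha = 2.
rng = random.Random(12345)
true_alpha = 2.0
sample = [
    _STD_NORMAL.inv_cdf(rng.random() ** (1.0 / true_alpha)) for _ in range(5000)
]

alpha_hat = pn_alpha_mle(sample)
```

When location and scale parameters are added, no closed form is available in general, and a numerical optimizer started from moment estimates, as described in the text, is the usual route.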
To compare the fit achieved by each distribution, the values of two measures of model selection, Akaike's information criterion (AIC) (see Akaike [22]) and the Bayesian information criterion (BIC) (see Schwarz [23]), are reported in Table 6. A model with lower values of these measures is preferable. It can be seen that the $\mathrm{MPN}$ distribution is preferable in terms of both measures of model selection. In addition, the Kolmogorov–Smirnov test statistics and the corresponding p-values have been included in this table for all the models considered. It can be observed that none of the models is rejected at the usual significance levels. However, the $\mathrm{MPN}$ distribution has a higher p-value and would therefore be the last to be rejected as the significance level increases. Alternative methods of model selection to the Kolmogorov–Smirnov test that can be applied here can be found in Jäntschi and Bolboacă [24] and Jäntschi [25]. Furthermore, the histogram associated with the empirical distribution of the variable “density” in the pollen dataset is shown on the left-hand side of Figure 5. In addition, the densities of the $\mathrm{SN}$, $\mathrm{PN}$, Student's $t$, and $\mathrm{MPN}$ distributions, evaluated at the maximum likelihood estimates of their parameters, have been superimposed. Similarly, on the right-hand side of Figure 5, the fit in both tails is shown. It can be observed that, for this dataset, the $\mathrm{MPN}$ distribution has thicker tails than the other three distributions. Moreover, the QQ-plots for each distribution considered are shown in Figure 6. Here, note that the $\mathrm{MPN}$ distribution exhibits an almost perfect alignment with the 45° line, and therefore it provides a better fit for extreme quantiles. Finally, Figure 7 displays the profile log-likelihood of each of the three parameters of the $\mathrm{MPN}$ distribution. The profiles indicate that the estimates are unique.
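The model selection measures used above follow directly from a fitted log-likelihood $\ell$, the number of parameters $k$, and the sample size $n$: $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k\ln n - 2\ell$. The Kolmogorov–Smirnov statistic is the largest gap between the empirical cdf and the fitted cdf. The following sketch computes all three with the Python standard library. The log-likelihood values are hypothetical placeholders and are not taken from Table 5 or Table 6; only the sample size of 3848 comes from the text.

```python
from math import log
from statistics import NormalDist


def aic(loglik: float, k: int) -> float:
    """Akaike's information criterion: 2k - 2 * loglik."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * loglik."""
    return k * log(n) - 2.0 * loglik


def ks_statistic(data, cdf) -> float:
    """Kolmogorov-Smirnov distance between the empirical cdf and a given cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at the i-th order statistic.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


# Hypothetical log-likelihoods for two three-parameter fits (NOT values from the paper).
n_obs = 3848  # sample size of the "density" variable quoted in the text
loglik_a, loglik_b = -1000.0, -1010.0
prefers_a = (
    aic(loglik_a, 3) < aic(loglik_b, 3)
    and bic(loglik_a, 3, n_obs) < bic(loglik_b, 3, n_obs)
)

# Tiny KS example against the standard normal cdf.
d_example = ks_statistic([-1.0, 0.0, 1.0], NormalDist().cdf)
```

With equal numbers of parameters, both criteria simply rank the models by log-likelihood; the penalty terms matter only when the competing models differ in $k$.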