In this paper, a general class of modified power-symmetric distributions is introduced. By choosing as symmetric model the normal distribution, the modified power-normal distribution is obtained. For the latter model, some of its more relevant statistical properties are examined. Parameters estimation is carried out by using the method of moments and maximum likelihood estimation. A simulation analysis is accomplished to study the performance of the maximum likelihood estimators. Finally, we compare the efficiency of the modified power-normal distribution with other existing distributions in the literature by using a real dataset.
maximum likelihood; kurtosis; power-normal distribution
Over the last few years, the search for flexible probabilistic families capable of modeling different levels of bias and kurtosis has been an issue of great interest in the field of distributions theory. This was mainly motivated by the seminal work of Azzalini . In that paper, the probability density function (pdf) of a skew-symmetric distribution was introduced. The expression of this density is given by
where is a symmetric pdf about zero; is an absolutely continuous distribution function, which is also symmetric about zero; and is a parameter of asymmetry. For the case where is the standard normal density (from now on, we reserve the symbol for this function), and is the standard normal cumulative distribution function (henceforth, denoted by ), the so-called skew-normal () distribution with density
is obtained. We use the notation to denote the random variable Z with pdf given by Equation (2). A generalization of the distribution is introduced by Arellano-Valle et al.  and Arellano-Valle et al. ; they study Fisher’s information matrix of this generalization. For further details about the distribution, the reader is referred to Azzalini . Martínez-Flórez et al.  used generalizations of the distribution to extend the Birnbaum-Saunders model, and Contreras-Reyes and Arellano-Valle  utilized the Kullback–Leibler divergence measure to compare the multivariate normal distribution with the skew-multivariate normal.
One of the main limitations of working with the family given by Equation (1) is that the information matrix could be singular for some of its particular models (see Azzalini ). This might lead to some difficulties in the estimation, due to the asymptotic convergence of the maximum likelihood (ML) estimators. To overcome this issue, some authors (see Chiogna  or Arellano-Valle and Azzalini ) have used a reparametrization of the model to obtain a nonsingular information matrix. However, this methodology cannot be extended to all type of skew-symmetric models which suffers of this convergence problem. On the other hand, the family of power-symmetric () distributions does not have this problem of singularity in the information matrix (see, Pewsey et al. ). The pdf of this family of distribution is given by
where is itself a cumulative distribution function (cdf) and is the shape parameter. For the particular case that , the power-normal () distribution is obtained, with density given by
For some references where this family is discussed, the reader is referred to Lehmann , Durrans , Gupta and Gupta , and Pewsey et al. , among other papers. Other extensions of this model are given in Martínez-Flórez et al. , where a multivariate version from the model is introduced; also, Martínez-Flórez et al.  carried out applications by using regression models; finally, Martínez-Flórez et al.  examined the exponential transformation of the model, and Martínez-Flórez et al.  examined a version of the model doubly censored with inflation in a regression context. Truncations of the distribution were considered by Castillo el al. .
In this paper, a modification in the pdf of the probabilistic family is implemented to increase the degree of kurtosis. This methodology is later used to explain datasets that include atypical observations. Usually, this methodology is accomplished by increasing the number of parameters in the model.
The paper is organized as follows. In Section 2, first, we introduce the modified power symmetric distribution. Then, the particular case of the modified power normal distribution is derived. Some of the most relevant statistical properties of this model, including moments and kurtosis coefficient, are presented. Next, in Section 3, some methods of estimation are discussed. Later, a simulation study is provided to illustrate the behavior of the shape parameter. A numerical application where the modified power normal distribution is compared to the and distributions is given in Section 4. Finally, Section 5 concludes the paper.
2. Genesis and Properties of Modified Power-Normal Distribution
In this section, we introduce a new family of probability distributions. The idea is to make a transformation to a given probability density, as the skew-symmetric or power-symmetric distributions does. As there exists a certain resemblance between our formula (Equation (6)) and the formula for the power-symmetric distributions (Equation (3)), we agree to name these new distributions as modified power-symmetric distributions. From the standard normal distribution, we obtain the so-called Modified Power-Normal distribution. The main parameters and properties of this particular distribution will be studied throughout this work.
2.1. Probability Density Function
Let Z be a continuous and symmetric random variable with cdfand pdf, wheredenotes a vector of parameters. We say that, a random variable, X, follows adistribution, denoted as, if its cdf is given by
and its pdf is given by
In the case, the transformation given by Equation (6) is the identity. That is, thedistribution foralways provides the input probability density function.
Thereforeforth, we proceed to examine the distribution, whose cdf is provided by
and whose pdf is given by
where , is the location parameter, is the scale parameter, and is the shape parameter. Hereafter, this will be denoted as . Figure 1 depicts some different shapes of the pdf of this model, for selected values of the parameter with and . The class of distributions is applicable for the change point problem, due to its favorable properties (see Maciak et al. ); moreover, the model can be utilized in calibration (see Peta ).
Here,andare location and scale parameters of thedistribution, respectively. For the particular case, these are not only location and scale parameters but also the mean and standard deviation of the standard normal distribution.
2.2. Statistical Properties
2.2.1. Shape of the Density
The distribution exhibits a bell-shaped form, which can be symmetric or positively or negatively skewed depending on the value of the parameter . Now, we derive some analytical expressions that are useful to obtain approximations of modal values and inflection points of this model. In the following, it will be assumed that and .
The pdf ofhas a local maximum atand two inflection points atand, respectively, whereis the root of the equation
andandare two solutions of the equation
The proof consists of simple derivatives of the function f. From the Equation (8), we calculate
By setting Equations (9) and (10) to be equal to zero, the results are obtained after some algebra. Figure 2 displays the graph of the first derivative of , where it is observed that the maximum exists and it is unique. Therefore, the distribution is unimodal. □
The solutions of Equations (9) and (10) can be numerically obtained by using the built-in function “uniroot” in the software package R.Table 1below illustrates some approximations of the roots , , and, and the corresponding figures of the pdf evaluated at these values.
The rth moments offorare given by
whereis defined as
Here,is the quantile function of the standard normal distribution.
By using the change of variable , it follows that
The mean and variance of X are given by
The skewness () and kurtosis () coefficients are, respectively, given by
Observe that the integral in Equation (12) can be numerically approximated by using the built-in function integrate available in the software package R. Below, inTable 2, some approximations of the mean and variance for thedistribution for different values of α are displayed.Figure 3illustrates the behavior of theandof thedistribution for different values of α. It is observable that when α grows, the mean increases and the variance decreases.
Figure 4displays the curves associated with the coefficients of skewness (left panel) and kurtosis (right) of theanddistributions. It is shown that, depending on the values of α, thedistribution exhibits equal, greater, or lesser values for these coefficients compared to themodel. In general, thedistribution has a smaller range of skewness than thedistribution. On the other hand, when, thedistribution has a greater kurtosis coefficient than themodel.
2.2.3. Stochastic Ordering
Stochastic ordering is an important tool to compare continuous random variables. It is well-known that random variable is smaller than random variable in stochastic ordering () if for all x, and in likelihood ratio order () if decreases with x.
Using Theorem 1.C.1 and Theorem 2.A.1 of Shaked and Shanthikumar , the above stochastic orders hold according to the following implications,
The proposition shows that the members of the family can be stochastically ordered according to parameters values.
Letand. If, thenand, therefore,.
From the quotient of both densities, it follows that
is non-decreasing if and only if for , where
After some calculations, it is shown that
It is straightforward that for , then for . Therefore, is decreasing in x, and consequently . The other implication follows immediately from (13). □
In this section, parameters estimation for the distribution is discussed by using the method of moments and ML estimation. Additionally, a simulation analysis is carried out to illustrate the behavior of the ML estimators.
3.1. Method of Moments
The following proposition illustrates the derivation of the moment estimates of the distribution.
Letbe a random sample obtained from the random variable, then the moment estimatesforare given by
where, anddenote the sample mean, sample standard deviation and sample Fisher’s skewness coefficient respectively.
As and are location and scale parameters respectively, the skewness coefficient does not depend on these parameters. Thus, the result in (15) is directly obtained from matching the sample skewness coefficient with population counterpart given in Corollary 2. In addition, by considering that , where , and again by equating sample mean and sample variance to the mean and variance respectively, it follows that
where satisfies expression (15). Then, (14) is obtained by solving the latter equations for and , respectively. □
3.2. Maximum Likelihood Estimation
For a random sample derived from the distribution, the log-likelihood function can be written as
The score equations are given by
where Solutions for these Equations (17)–(19) can be obtained by using numerical procedures such as Newton–Raphson algorithm. Alternatively, these estimates can be found by directly maximizing the log-likelihood surface given by (16) and using the subroutine “optim” in the software package .
3.3. Simulation Study
To examine the behavior of the proposed approach, a simulation study is carried out to assess the performance of the estimation procedure for the parameters , , and in the model. The simulation analysis is conducted by considering 1000 generated samples of sizes 100, and 200 from the distribution. The goal of this simulation is to study the behavior of the ML estimators of the parameters by using our proposed procedure. To generate , the following algorithm is used,
Step 1: Generate
Step 2: Compute
where , , and is the quantile function of the standard normal distribution. For each generated sample of the distribution, the ML estimates and corresponding standard deviation (SD) were computed for each parameter. As it can be seen in Table 3, the performance of the estimates improves when n and increases.
Fisher’s Information Matrix
Let us now consider and . For a single observation x of X, the log-likelihood function for is given by
The corresponding first and second partial derivatives of the log-likelihood function are derived in the Appendix A. It can be shown that the Fisher’s information matrix for the distribution is provided by
with the following entries,
where must be numerically computed.
The Fisher’s (expected) information matrix can be obtained by computing the expected values of the above expressions. By taking in this matrix, , we have that and
where must be numerically obtained.
The determinant of is , consequently, the Fisher’s information matrix is nonsingular at
Therefore, for large samples, the ML estimators, , of are asymptotically normal, that is,
resulting in the asymptotic variance of the ML estimators being the inverse of Fisher’s information matrix As the parameters are unknown, the observed information matrix is usually considered, where the unknown parameters are estimated by ML.
In this section, a numerical illustration based on a real dataset is presented. The goal of this application is to show empirical evidence that the yields a better fit to data than the , , and t-student with degrees of freedom distributions. For that reason, we consider a set of 3848 observations of the variable “density” included in the dataset verb “POLLEN5.DA” available at http://lib.stat.cmu.edu/datasets/pollen.data. This variable measures a geometric characteristic of a specific type of pollen. This dataset was previously used by Pewsey et al.  to compare the and distributions. A summary of some descriptive statistics are displayed in Table 4 below.
By using the results derived in Proposition 4, we have computed the moment estimates for the parameters of the distribution, obtaining . Then, by taking these numbers as initial values, the ML estimates are derived. In Table 5, the ML estimates for the parameters of the , , , and distributions. The figures between brackets are the asymptotic standard errors of the estimates obtained by inverting the Fisher’s information matrices for the three models evaluated at their respective ML estimates. Additionally, for each model, the values of the maximum of the log-likelihood function () are reported. The distribution attains the largest value, and consequently provides a better fit to data.
To compare the fit achieved by each distribution, the values of several measures of model selection, i.e., Akaike’s information criterion (AIC) (see Akaike ) and Bayesian information criterion (BIC) (see Schwarz ) are reported in Table 6. A model with lower numbers in these measures of model selection is preferable. It can be seen that the is preferable in terms of these two measures of model validation. In addition, the Kolmogorov–Smirnov test statistics and the corresponding p-values has been included in this table for all the models considered. It can be observed that none of the models is rejected at the usual significance levels. However, the distribution has a higher p-value and is rejected later than the other two models. Alternative methods of model selection to the Kolmogorov–Smirnov test that can be applied here can be found in Jntschi and Bolboac  and Jntschi . Furthermore, the histogram associated to the empirical distribution of the variable “density” in the pollen dataset is illustrated in the left hand side of Figure 5. In addition, the densities of , , , and , by using the maximum likelihood estimates of their parameters, have been superimposed. Similarly, on the right hand side of Figure 5, the fit in both tails is shown. It is observable that, for this dataset, the has thicker tails than the other three distributions. Finally, the QQ-plots for each distribution considered have been illustrated in Figure 6. Here, note that the distribution exhibits an almost perfect alignment with the 45 line, and therefore it provides a better fit for extreme quantiles. Finally, Figure 7 displays the profile log-likelihood of , , and of the MPN distribution. It is noticeable that the estimates are unique.
5. Concluding Remarks
In this paper, a modification of the continuous symmetric-power distribution has been introduced. The particular case of the normal distribution the distribution has been examined in detail. This distribution arises by modifying the distribution function of the symmetrical powers family. After carrying out this modification, a more flexible family of probability distributions is obtained, allowing for the kurtosis coefficient to take a certain range of values in the parameter space. For this model, its basic properties, different method of estimation and Fisher’s information matrix were studied. By using a real dataset, we showed that the distribution provides a better fit than other existing models in the literature such as the , , and distributions.
The authors contributed equally to this work.
This work was partially completed while Héctor W. Gómez visited the Universidad de Las Palmas de Gran Canaria, supported by MINEDUC-UA project, code ANT1855. This research was also funded by (EGD) [Ministerio de Economía y Competitividad, Spain] grant number [ECO2013–47092]; (EGD)[Ministerio de Economía, Industria y Competitividad. Agencia Estatal de Investigación] grant number [ECO2017–85577–P].
We also acknowledge the referee’s suggestions that helped us to improve this work.
Conflicts of Interest
The authors declare no conflicts of interest.
The first derivatives of are given by
The second derivatives of are
Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat.1985, 12, 171–178. [Google Scholar]
Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. A New Class of Skew-Normal Distributions. Commun. Stat. Theory Methods2004, 33, 1465–1480. [Google Scholar] [CrossRef]
Arellano-Valle, R.B.; Gómez, H.W.; Salinas, H.S. A note on the Fisher information matrix for the skew-generalized-normal model. Stat. Oper. Res. Trans.2013, 37, 19–28. [Google Scholar]
Azzalini, A. The Skew-Normal and Related Families; IMS monographs; Cambridge University Press: New York, NY, USA, 2014. [Google Scholar]