Multivariate Skew-Power-Normal Distributions: Properties and Associated Inference

The univariate power-normal distribution is quite useful for modeling many types of real data. On the other hand, multivariate extensions of this distribution are not common in the statistical literature, especially skewed multivariate extensions that can, for example, be bimodal. In this paper, we extend the univariate power-normal distribution to the multivariate setup. Structural properties of the new multivariate distributions are established. We consider the maximum likelihood method to estimate the unknown parameters, and the observed and expected Fisher information matrices are also derived. Monte Carlo simulation results indicate that the maximum likelihood approach is quite effective for estimating the model parameters. An empirical application of the proposed multivariate distribution to real data is provided for illustrative purposes.


Introduction
Asymmetric univariate distributions that can be used for explaining real data which are not adequately fitted by the usual normal distribution were studied in Azzalini [1], Fernández and Steel [2], Mudholkar and Hutson [3], Durrans [4], Pewsey et al. [5], and Martínez-Flórez et al. [6], among others. In particular, Azzalini [1] considered a general structure for asymmetric distributions in the univariate setting, given by

ϕ_A(z; λ) = 2 f(z) G(λz), z ∈ R,   (1)

where λ ∈ R controls the amount of asymmetry in the distribution, f is a symmetric (around zero) probability density function (PDF), and G is an absolutely continuous cumulative distribution function (CDF). The special case f = φ and G = Φ corresponds to the well-known skew-normal (SN) distribution, where φ and Φ are the PDF and CDF of the standard normal distribution, respectively. We have that ϕ_A(z; λ = 0) = f(z). The skew-normal distribution has been extensively studied in the statistical literature; to mention a few references, the reader is referred to Henze [7], Chiogna [8], Pewsey [9], Gómez et al. [10], and Gómez et al. [11], among many others. Another important reference regarding univariate asymmetric distributions is the work of Durrans [4], who introduced the fractional order statistics distribution with PDF given by

ϕ_D(z; α) = α f(z) [F(z)]^(α−1), z ∈ R,

where α > 0 is a shape parameter that controls the amount of asymmetry in the distribution, F is an absolutely continuous CDF, and f = F′ is the corresponding PDF. The special case F = Φ corresponds to the well-known power-normal (PN) distribution (Gupta and Gupta [12]). Note also that ϕ_D(z; α = 1) = f(z). It is worth emphasizing that the fractional order statistics distribution studied by Durrans [4] is very flexible and, in addition, its expected Fisher information matrix is not singular at α = 1. On the other hand, the expected Fisher information matrix for the SN distribution introduced by Azzalini [1] is singular at λ = 0. Therefore, the fractional order statistics distribution has this interesting advantage over the SN distribution.
The univariate models previously mentioned are only adequate for fitting unimodal data. A univariate bimodal distribution was introduced in Bolfarine et al. [13], whose PDF is given by

ϕ_B(z; α, β) = 2 c_α α f(z) [F(|z|)]^(α−1) G(βz), z ∈ R,

where α > 0, β ∈ R, F is an absolutely continuous CDF with PDF f = F′ symmetric around zero, G is an absolutely continuous CDF which is symmetric around zero, and c_α = 2^(α−1)/(2^α − 1) is the normalizing constant. In particular, if F = G = Φ, then we have the univariate asymmetric bimodal power-normal (ABPN) distribution. We use the notation ABPN(β, α) to refer to this univariate asymmetric bimodal distribution. The ABPN distribution is bimodal for α > 1, and unimodal otherwise. Additionally, β = 0 leads to a symmetric (around zero) bimodal distribution, and we use the notation BPN(α) to refer to this univariate symmetric bimodal distribution. Note also that ϕ_B(z; α = 1, β) = 2 f(z) G(βz), and hence the ABPN distribution reduces to the PDF in (1) when α = 1. Also, ϕ_B(z; α = 1, β = 0) = f(z).
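As a quick numerical check of this construction, the ABPN(β, α) density with F = G = Φ can be coded directly. The following is a Python sketch (the names `abpn_pdf` and `c_alpha` are ours, chosen for illustration); it verifies that the density integrates to one and that the stated reductions hold.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def c_alpha(alpha):
    """Normalizing constant c_alpha = 2^(alpha-1) / (2^alpha - 1)."""
    return 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)

def abpn_pdf(z, beta, alpha):
    """ABPN(beta, alpha) density with F = G = Phi; bimodal for alpha > 1."""
    z = np.asarray(z, dtype=float)
    return (2.0 * c_alpha(alpha) * alpha * norm.pdf(z)
            * norm.cdf(np.abs(z)) ** (alpha - 1.0) * norm.cdf(beta * z))

# Integrate on each side of the kink at z = 0; the total should be 1.
left, _ = quad(abpn_pdf, -np.inf, 0.0, args=(1.5, 2.5))
right, _ = quad(abpn_pdf, 0.0, np.inf, args=(1.5, 2.5))
total = left + right
```

At α = 1 and β = 0 the function reduces to the standard normal PDF, as stated in the text.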
In this paper, we generalize the univariate ABPN distribution to the multivariate setting by using the approach in Arnold et al. [14]. The new multivariate distributions we propose are quite flexible and therefore can be very useful in analyzing many types of multivariate real data which occur frequently in practice. Additionally, maximum likelihood (ML) estimation is implemented, and the observed and expected Fisher information matrices are derived. Finally, extensions to multivariate elliptical scenarios are also discussed. An important reason to introduce new multivariate distributions is that practitioners gain new models to use in multivariate settings. Additionally, the formulae related to the new multivariate models are manageable and, with the use of modern computer resources and their numerical capabilities, the proposed multivariate models may prove to be a useful addition to the arsenal of applied statisticians. We hope that the new multivariate distributions introduced in this paper may serve as alternatives to some well-known multivariate models available in the statistical literature, for example, the multivariate SN distribution [15] and the conditional multivariate SN distribution [14], among others. We also hope that the new multivariate models may work better (at least in terms of model fitting) than some multivariate distributions available in the literature in certain practical situations, although this cannot always be guaranteed.
The paper is organized as follows. Section 2 presents a short review of existing asymmetric multivariate models, which are used to fit unimodal data. In Section 3, we consider the symmetric multivariate PN extension with possible bimodality. ML estimation is also implemented in this section. Section 4 is devoted to the study of the asymmetric multivariate PN model, with an extension to the location-scale case. ML estimation is also considered in this section. A real data application is presented in Section 5. This illustration shows that the new multivariate distribution can be better in terms of model fitting than unimodal alternative multivariate models for analyzing multivariate data. A multivariate extension of the univariate PN model in the elliptical context is considered in Section 6. In Section 7, some concluding remarks are provided.

Multivariate Skew Models
In the last two decades, the statistical literature has given great emphasis to multivariate extensions of the SN model. Important extensions are the multivariate SN distribution (Azzalini and Dalla Valle [15]), the conditionally specified multivariate SN distribution (Arnold et al. [14]), and the multivariate alpha-power model (Martínez-Flórez et al. [6]), among others. The multivariate SN distribution has been studied by several authors, including Azzalini and Capitanio [16], Gupta and Huang [17], Gupta et al. [18], and Genton [19]. Extensions of this model have been the subject of study in Arellano-Valle and Azzalini [20,21], Arellano-Valle and Genton [22], Gupta and Chen [23], and Gupta and Chang [24]. The d-dimensional SN PDF can be expressed as

f(x; µ, Ω, λ) = 2 φ_d(x − µ; Ω) Φ(λ⊤ ω^(−1) (x − µ)), x ∈ R^d,

where φ_d(x − µ; Ω) is the joint PDF of a multivariate normal distribution, Ω is a d × d positive definite variance-covariance matrix, µ ∈ R^d is the location parameter, λ ∈ R^d is a parameter vector which controls skewness, and ω = diag{ω_1, . . ., ω_d} is a diagonal matrix composed of the standard deviations from the variance-covariance matrix Ω. We use the notation SN_d(µ, Ω, λ) to refer to this distribution; see Azzalini and Capitanio [25] for more details about multivariate SN models. Another useful multivariate model based on conditional SN distributions was studied in Arnold et al. [14], whose joint PDF takes the form

f(x; µ, ω, λ) = 2 ∏_{j=1}^d (1/ω_j) φ((x_j − µ_j)/ω_j) Φ(λ ∏_{j=1}^d (x_j − µ_j)/ω_j), x ∈ R^d,

where µ_j ∈ R and ω_j > 0 (for j = 1, . . ., d) are location and scale parameters, respectively, and λ is a shape parameter. We use the notation SNC_d(µ, ω, λ) to refer to this distribution. Similarly to the univariate setting, the expected Fisher information matrix for the multivariate SN distribution is singular at λ = 0_d, where 0_k denotes a k-vector of zeros. On the other hand, the expected Fisher information matrix for the conditional multivariate SN distribution is not singular at λ = 0.

The Symmetric Multivariate PN Distribution
The multivariate SN_d(µ, Ω, λ) and SNC_d(µ, ω, λ) models presented in Section 2 are alternatives to the multivariate normal distribution for fitting multivariate asymmetric data. However, these multivariate distributions are only adequate for fitting unimodal data. Therefore, it is of interest to develop new (and simple) multivariate models that are able to adequately fit data with possible bimodality.

The New Model
Initially, we define the symmetric multivariate PN (MBPN) distribution, whose joint PDF is given by

f(x; α) = φ_d(x) ∏_{j=1}^d c_{α_j} α_j [Φ(|x_j|)]^(α_j − 1), x ∈ R^d,

where φ_d(x) := φ_d(x; I_d) is the joint PDF of a d-dimensional multivariate normal distribution with standardized marginals, I_d is the identity matrix of order d, α = (α_1, . . ., α_d) is the d-vector of shape parameters, and c_{α_j} = 2^(α_j − 1)/(2^(α_j) − 1) is a normalizing constant. We use the notation MBPN_d(α) to refer to this new multivariate distribution. We have the following proposition.

Proposition 1. Let X = (X_1, . . ., X_d) ∼ MBPN_d(α). We have that

1. The components X_1, . . ., X_d are independent, with X_j ∼ BPN(α_j) for j = 1, . . ., d.

2. The joint PDF of X is symmetric.

3. The product moment of X is E(∏_{j=1}^d X_j^{r_j}) = 0 if any r_j is odd, and E(∏_{j=1}^d X_j^{r_j}) = ∏_{j=1}^d 2 E(X_{j+}^{r_j}) if all r_j are even, where E(X_{j+}^{r_j}) is the moment of the positive part of X_j ∼ BPN(α_j).
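Since the MBPN_d joint PDF factorizes into BPN marginals, its normalization and the vanishing odd moments can be checked numerically. A Python sketch (the names `bpn_pdf` and `mbpn2_pdf` are ours, for illustration only):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def bpn_pdf(z, alpha):
    """Univariate BPN(alpha) marginal: c_alpha * alpha * phi(z) * Phi(|z|)^(alpha-1)."""
    c = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)
    return c * alpha * norm.pdf(z) * norm.cdf(np.abs(z)) ** (alpha - 1.0)

def mbpn2_pdf(x1, x2, a1, a2):
    """MBPN_2 joint PDF; it factorizes because the base normal has identity covariance."""
    return bpn_pdf(x1, a1) * bpn_pdf(x2, a2)

# Marginal normalization and a vanishing odd moment, split at the kink z = 0.
total = (quad(bpn_pdf, -np.inf, 0.0, args=(2.5,))[0]
         + quad(bpn_pdf, 0.0, np.inf, args=(2.5,))[0])
m1 = (quad(lambda z: z * bpn_pdf(z, 2.5), -np.inf, 0.0)[0]
      + quad(lambda z: z * bpn_pdf(z, 2.5), 0.0, np.inf)[0])
```

The factorized joint PDF also makes the product-moment statement in Proposition 1 immediate.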
Let d = 2, so that we have the bivariate BPN model. Differentiating f(x; α) with respect to x_j for j = 1, 2 and equating to zero, we obtain

x_j = (α_j − 1) sgn(x_j) φ(x_j)/Φ(|x_j|), j = 1, 2.   (2)

If 0 < α_j < 1 in (2), then there is no solution. Moreover, for α_j = 1 the PDF reduces to the bivariate normal PDF, with the single solution x_j = 0, and for α_j > 1 we obtain two solutions: x_j = −(α_j − 1)φ(x_j)/Φ(−x_j) for x_j < 0, and x_j = (α_j − 1)φ(x_j)/Φ(x_j) for x_j ≥ 0. Evaluating the determinant of the Hessian of f := f(x; α) at these critical points, we conclude that the joint bivariate PDF is bimodal if α_1 > 1 or α_2 > 1.
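The positive-branch mode equation x_j = (α_j − 1)φ(x_j)/Φ(x_j) has no closed form, but it is easy to solve numerically. A small sketch (our illustrative function `positive_mode`, assuming α_j > 1; by symmetry, −x solves the negative branch):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def positive_mode(alpha):
    """Solve x = (alpha - 1) * phi(x) / Phi(x) on x > 0 for alpha > 1.
    g is strictly increasing, so the root is unique."""
    g = lambda x: x - (alpha - 1.0) * norm.pdf(x) / norm.cdf(x)
    return brentq(g, 1e-9, 10.0)
```

Larger α pushes the two modes farther from the origin, consistent with the bimodality condition α > 1.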

Inference
We consider the ML procedure to estimate the parameter vector α ∈ R^d of the MBPN_d(α) distribution. Let x_1 = (x_11, . . ., x_1d), . . ., x_n = (x_n1, . . ., x_nd) be an observed sample of size n from X = (X_1, . . ., X_d) ∼ MBPN_d(α). The log-likelihood function takes the form

ℓ(α) = ∑_{i=1}^n ∑_{j=1}^d [ log c_{α_j} + log α_j + log φ(x_ij) + (α_j − 1) log Φ(|x_ij|) ].

The ML estimate α̂ = (α̂_1, . . ., α̂_d) of α = (α_1, . . ., α_d) is obtained by maximizing ℓ(α) with respect to α. The maximization can be performed, for example, in the R software (R Core Team [26]) by using the optim(...) function. The partial derivatives of the log-likelihood function with respect to the model parameters are

∂ℓ(α)/∂α_j = n [ 1/α_j + log 2 − 2^(α_j) log 2/(2^(α_j) − 1) ] + ∑_{i=1}^n log Φ(|x_ij|), j = 1, . . ., d.

The ML estimate can equivalently be obtained by solving the nonlinear system of equations ∂ℓ(α)/∂α_j = 0 for j = 1, . . ., d. Since this system has no closed-form solution, the ML estimates need to be obtained through numerical maximization of the log-likelihood function using nonlinear optimization algorithms.
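As a sketch of this estimation step, the log-likelihood above can be coded and maximized numerically. The paper uses R's optim; the following Python analogue with scipy's Nelder-Mead is purely illustrative, and the names `mbpn_loglik` and `fit_mbpn` are ours.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def mbpn_loglik(alpha, x):
    """Log-likelihood of MBPN_d(alpha) for an (n, d) sample x."""
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha <= 0):
        return -np.inf
    # log c_alpha = (alpha - 1) log 2 - log(2^alpha - 1)
    log_c = (alpha - 1.0) * np.log(2.0) - np.log(2.0 ** alpha - 1.0)
    n = x.shape[0]
    return (n * np.sum(log_c + np.log(alpha))
            + np.sum(norm.logpdf(x))
            + np.sum((alpha - 1.0) * np.log(norm.cdf(np.abs(x)))))

def fit_mbpn(x, start=None):
    """Maximize the log-likelihood over alpha with a derivative-free method."""
    d = x.shape[1]
    start = np.ones(d) if start is None else start
    res = minimize(lambda a: -mbpn_loglik(a, x), start, method="Nelder-Mead")
    return res.x
```

At α = 1_d the model reduces to the standard multivariate normal, so on standard normal data the fitted shape parameters should be close to one.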
Since the new MBPN distribution corresponds to a regular ML problem, the standard asymptotics apply; that is, the ML estimators of the model parameters are asymptotically normal, asymptotically unbiased, and have asymptotic variance-covariance matrix given by the inverse of the expected Fisher information matrix. Let Σ(α) be the expected Fisher information matrix. When n is large and under some mild regularity conditions (Cox and Hinkley [27]), we have that

√n (α̂ − α) ∼a N_d(0_d, Σ(α)^(−1)),

where ∼a means approximately distributed. It can be shown that Σ(α) is diagonal, with jth diagonal element 1/α_j² − (log 2)² 2^(α_j)/(2^(α_j) − 1)². Therefore, we immediately observe that the parameters α_j are globally orthogonal (Cox and Reid [28]). The above asymptotic normal distribution can be used to construct approximate confidence intervals (CIs) for the model parameters. Let 0 < ϑ < 1/2 be the significance level. The asymptotic 100(1 − ϑ)% CI for α_j is given by α̂_j ∓ Φ^(−1)(1 − ϑ/2) se(α̂_j)/√n. Here, se(•) is the square root of the diagonal element of Σ(α)^(−1) corresponding to each parameter (i.e., the asymptotic standard error), and Φ^(−1)(•) denotes the standard normal quantile function.

A Short Simulation Study
We conduct Monte Carlo simulation experiments in order to explore the finite-sample performance of the ML method in estimating the MBPN_d(α) model parameters in the bivariate case (i.e., d = 2). Parameter values used to generate the data are α_1 = 0.5, 1.5 and 2.5, and α_2 = 0.25, 2.5 and 4.75. The sample sizes considered were n = 50, 250, 500 and 1500, and 10,000 random samples were generated for each sample size. To generate random variates from the (X_1, X_2) ∼ MBPN_2(α) distribution, we generate two independent uniform random variables, U_1 ∼ U(0, 1) and U_2 ∼ U(0, 1), and set, for j = 1, 2,

X_j = −Φ^(−1)( (1 − U_j/c_{α_j})^(1/α_j) ) if U_j ≤ 1/2, and X_j = Φ^(−1)( (2^(−α_j) + (U_j − 1/2)/c_{α_j})^(1/α_j) ) if U_j > 1/2,

which inverts the BPN(α_j) CDF. The ML estimates are evaluated by considering the following quantities: the empirical mean and the square root of the mean squared error (√MSE), computed over the 10,000 Monte Carlo replications. All simulations were performed using the R language, with the log-likelihood maximized by the optim(...) function. From Tables 1 and 2, it is evident that the performance of the ML estimators of the MBPN model parameters is good; the estimates are very close to the true parameter values in all cases, and the √MSE decreases as the sample size increases, as expected, since the ML estimators are consistent. In short, the numerical results provide a clear indication that the ML method can be used quite effectively to estimate the MBPN model parameters.
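The variate-generation step can be implemented by inverting the BPN CDF, using F(x) = c_α (1 − Φ(−x)^α) for x ≤ 0. A Python sketch (the paper's own simulations use R; `rbpn` and `rmbpn2` are our illustrative names):

```python
import numpy as np
from scipy.stats import norm

def rbpn(alpha, n, rng=None):
    """Draw n variates from BPN(alpha) by inverting its CDF.
    For u <= 1/2 the inverse comes from F(x) = c*(1 - Phi(-x)^alpha);
    for u > 1/2 from F(x) = 1/2 + c*(Phi(x)^alpha - 2^(-alpha))."""
    rng = np.random.default_rng() if rng is None else rng
    c = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)
    u = rng.uniform(size=n)
    lower = u <= 0.5
    x = np.empty(n)
    x[lower] = -norm.ppf((1.0 - u[lower] / c) ** (1.0 / alpha))
    x[~lower] = norm.ppf((2.0 ** (-alpha) + (u[~lower] - 0.5) / c) ** (1.0 / alpha))
    return x

def rmbpn2(a1, a2, n, rng=None):
    """MBPN_2 sample: the components are independent BPN marginals."""
    rng = np.random.default_rng() if rng is None else rng
    return np.column_stack([rbpn(a1, n, rng), rbpn(a2, n, rng)])
```

At α = 1 both branches reduce to Φ^(−1)(u), i.e., standard normal sampling, which provides a convenient sanity check.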

The New Model
Although the MBPN model defined in Section 3 can present multimodality, it corresponds to a symmetric multivariate distribution. An immediate extension for fitting asymmetric (possibly multimodal) multivariate data is given by the joint PDF

f(x; α, β) = 2 φ_d(x) ∏_{j=1}^d c_{α_j} α_j [Φ(|x_j|)]^(α_j − 1) Φ(β⊤x), x ∈ R^d,

where β = (β_1, . . ., β_d) is a d-vector of skewness parameters. Note that the joint PDF can take different forms and will therefore be useful in analyzing bivariate data. Additionally, note that it can be unimodal or bimodal.
According to an anonymous referee, the MABPN distribution has a straightforward application within errors-in-variables models, especially for calibration; see, for example, [29]. We have the following proposition.

Proposition 2. Let X = (X_1, . . ., X_d) ∼ MABPN_d(α, β). If all r_j are even, the product moment of X is given by E(∏_{j=1}^d X_j^{r_j}) = ∏_{j=1}^d 2 E(Y_{j+}^{r_j}), where Y_j ∼ BPN(α_j) for j = 1, . . ., d; if any r_j is odd, the product moment depends on β.
From Proposition 2, note that even moments of X = (X_1, . . ., X_d) ∼ MABPN_d(α, β) do not depend on the parameter vector β. This implies that the correlation coefficient between the random variables X_j and X_{j′}, with j ≠ j′ for j, j′ = 1, . . ., d, depends essentially on the parameter vector β. Note that the covariance between X_j and X_{j′}, say τ(j, j′), reduces to

τ(j, j′) = −E(X_j) E(X_{j′}),

where E(X_j) and E(X_{j′}) are computed under the marginal PDFs of X_j and X_{j′}, respectively. Therefore, the parameter vector β also governs the correlation. To illustrate this, we compute the correlation in the bivariate case (i.e., d = 2), where α = (α_1, α_2) and β = (β_1, β_2). The parameter values we consider are α_1 = 0.25, 1.75, 3.5, 7.0 and 15, α_2 = 0.75, 2.5 and 5.0, β_1 = −1.5, 0.0 and 1.5, and β_2 = −2.5, 0.0 and 2.5. Table 3 lists the results for the correlation. Note that if β_1 or β_2 goes to zero, independently of the values of α_1 and α_2, the correlation tends to zero, illustrating the fact that the parameter vector β = (β_1, β_2) dominates the dependence between the random variables. It is interesting to note that values of β_1 and β_2 with the same sign lead to a negative correlation, whereas opposite signs yield a positive correlation.
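One way to check the sign pattern reported in Table 3 is by simulation. Densities of the form 2 f_sym(x) Φ(β⊤x) admit the standard sign-flip stochastic representation: draw W from the symmetric base density and keep W with probability Φ(β⊤W), otherwise return −W. Assuming the MABPN density takes this form, a Python sketch (our names `rbpn`, `rmabpn2`):

```python
import numpy as np
from scipy.stats import norm

def rbpn(alpha, n, rng):
    """BPN(alpha) variates via CDF inversion."""
    c = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)
    u = rng.uniform(size=n)
    lo = u <= 0.5
    x = np.empty(n)
    x[lo] = -norm.ppf((1.0 - u[lo] / c) ** (1.0 / alpha))
    x[~lo] = norm.ppf((2.0 ** (-alpha) + (u[~lo] - 0.5) / c) ** (1.0 / alpha))
    return x

def rmabpn2(a1, a2, beta, n, rng=None):
    """Bivariate MABPN sample via the sign-flip representation:
    keep W with probability Phi(beta'W), otherwise flip its sign.
    This yields density 2 * f_MBPN(w) * Phi(beta'w)."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.asarray(beta, dtype=float)
    w = np.column_stack([rbpn(a1, n, rng), rbpn(a2, n, rng)])
    keep = rng.uniform(size=n) < norm.cdf(w @ beta)
    return np.where(keep[:, None], w, -w)
```

With β_1 and β_2 of the same sign the empirical correlation comes out negative, and with opposite signs positive, matching the discussion above (the flip leaves X_1 X_2 unchanged, so τ = −E(X_1)E(X_2)).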

Location-Scale Extension
The location-scale version of the MABPN model has joint PDF of the form

f(x; ξ, η, α, β) = 2 |Λ|^(−1) φ_d(z) ∏_{j=1}^d c_{α_j} α_j [Φ(|z_j|)]^(α_j − 1) Φ(β⊤z), z = Λ^(−1)(x − ξ),

where Λ = diag{η_1, . . ., η_d} is a d × d scale matrix and ξ ∈ R^d is a location vector. We shall use the notation MABPN_d(ξ, η, α, β) to refer to this location-scale MABPN distribution.

Expected Information Matrix
Let Σ(θ) = E(H(θ)) be the 4d × 4d expected information matrix, with elements σ_{θ_j θ_{j′}} = E(k_{θ_j θ_{j′}}) for j, j′ = 1, . . ., d. These elements can be computed by using numerical procedures. When n is large and under some mild regularity conditions, the ML estimator θ̂ is approximately normal with mean θ and variance-covariance matrix given by the inverse of the expected information matrix. From this asymptotic normal distribution, approximate CIs for the model parameters are computed in the usual manner. In particular, for α_j = 1 and β_j = 0 (j = 1, . . ., d), the elements of the expected Fisher information matrix can be written in terms of expectations involving Z_j ∼ N(0, 1), r = 0, 1, and the signum function sgn(•). After some algebraic manipulations, it can be shown that the expected Fisher information matrix of the MABPN distribution is nonsingular in the vicinity of symmetry. This is not the case, however, with the multivariate SN distribution (Azzalini and Dalla Valle [15]), whose expected Fisher information matrix is singular in the vicinity of symmetry, i.e., at λ = 0_d.

Numerical Illustration
In this section, we present an application of the proposed bivariate ABPN distribution to real data for illustrative purposes. For the sake of comparison, we also consider the bivariate SN distribution (Azzalini and Dalla Valle [15]) and the conditional bivariate SN distribution (Arnold et al. [14]). All bivariate distributions were fitted using the location-scale extension. We consider the real data (see, for example, Fisher [30]) corresponding to measurements of the flowers of fifty plants each of the three species found growing together in the same colony, namely Iris setosa, Iris versicolor, and Iris virginica. Two flower measurements are considered: sepal length and sepal width. We pooled together the 50 Iris setosa, the 50 Iris versicolor, and the 50 Iris virginica data points, yielding a total sample size of n = 150; that is, 150 sepal lengths and 150 sepal widths. The observed value of the Shapiro-Wilk test for multivariate normality (see Villasenor and González [31]) is 0.9845 (p-value = 0.04). We also compute the multivariate skewness measure of Mardia [32]. Its observed value is 0.37 (p-value = 0.055), which suggests the presence of skewness and hence of nonnormality. Hence, the bivariate normal distribution is not a tenable model for the data under study, and an alternative model that is able to incorporate some degree of asymmetry would probably fit the data better. We now fit the bivariate SN distribution, the conditional bivariate SN distribution, and the bivariate ABPN distribution to these data. In order to compare the fits of these competing bivariate distributions, we make use of the Akaike information criterion (AIC). For fitting the bivariate SN model, we use the R function msn.mle, leading to the following ML estimates (standard errors in parentheses): µ̂_1 = 4.915 (0.196), µ̂_2 = 3.051 (0.196), λ̂_1 = 2.554 (0.972), and λ̂_2 = 0.220 (0.279). The estimated variance-covariance matrix is

Ω̂ = ( 1.543  −0.037 ; −0.037  0.189 ).

For the bivariate SN model, we obtain AIC = 550.56. The ML estimates of the conditional bivariate SN model parameters (standard errors in parentheses) are µ̂_1 = 5.867 (0.065), µ̂_2 = 3.055 (0.036), σ̂_1 = 0.794 (0.043), σ̂_2 = 0.438 (0.026), and λ̂ = −0.224 (0.110). For the conditional bivariate SN model, we obtain AIC = 555.10. The ML estimates of the proposed bivariate ABPN model parameters (standard errors in parentheses) are ξ̂_1 = 5.917 (0.051), ξ̂_2 = 2.372 (0.056), η̂_1 = 0.729 (0.050), η̂_2 = 0.644 (0.041), α̂_1 = 2.378 (0.605), α̂_2 = 3.718 (0.845), β̂_1 = −0.481 (0.300), and β̂_2 = 3.414 (0.860). For the proposed bivariate ABPN model, we have AIC = 546.80, which indicates that the bivariate ABPN model outperforms both the bivariate SN distribution and the conditional bivariate SN distribution in modeling these bivariate data. Finally, Figure 2 displays the scatter plot of the real data and contour levels of the fitted bivariate PDFs. Visual inspection reveals a satisfactory fit of the bivariate ABPN PDF to the data. We would like to point out that there is some uncertainty as to what constitutes a "substantial" difference in AIC values in practical situations. The empirical evidence scale of [33] uses the AIC difference Υ_m = AIC_m − AIC_min over all candidate models, where m = 1, . . ., T, and T denotes the total number of models considered. Models with Υ_m ∈ [0, 2] have substantial support to be considered as good as the best approximating model. Two additional measures are then introduced to describe the "strength" of each model: the evidence ratio ER_m = exp(Υ_m/2) and the weight w_m = exp(−Υ_m/2)/∑_{k=1}^T exp(−Υ_k/2), with Υ_min = 0. The values of ER_m can be interpreted as how much more likely the best approximating model is relative to model m, whereas the values of w_m can be interpreted as the probability that model m is the best approximating model. The values of these measures are given in Table 4.
For example, we conclude from this table that the bivariate ABPN distribution is about 7 and 63 times more likely to be the best approximating model than the bivariate SN distribution and the conditional SN distribution, respectively. Accordingly, the chance that either of these competing models is the best approximating model is small. The bivariate ABPN distribution has a weight of approximately 0.856; that is, there is (approximately) an 86% chance that it is the best approximating model among the candidate models for these data. Notice that, by definition, the strength of evidence in favor of model i over model j can be obtained simply by considering w_i/w_j.
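These measures can be reproduced from the AIC values reported above (550.56, 555.10, 546.80). A short Python sketch (`aic_weights` is our illustrative name), computing the Akaike differences, evidence ratios ER_m = exp(Υ_m/2), and weights w_m:

```python
import numpy as np

def aic_weights(aics):
    """Akaike differences, evidence ratios, and weights from raw AIC values."""
    aics = np.asarray(aics, dtype=float)
    delta = aics - aics.min()          # Upsilon_m = AIC_m - AIC_min
    rel = np.exp(-0.5 * delta)         # relative likelihood of each model
    return delta, 1.0 / rel, rel / rel.sum()   # (Upsilon_m, ER_m, w_m)

# Order: bivariate SN, conditional bivariate SN, bivariate ABPN
delta, er, w = aic_weights([550.56, 555.10, 546.80])
```

The evidence ratios come out near 7 and 63 for the SN and conditional SN models, and the ABPN weight near 0.856, matching the discussion of Table 4.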

Elliptical Family Extension
In the previous sections, the d-dimensional normal distribution played an important role in deriving the multivariate ABPN distribution. In this section, we extend those results by considering the elliptical family of distributions. In the one-dimensional case, elliptical distributions correspond to all symmetric distributions on R. Specifically, a random variable X follows a symmetric elliptical distribution if its PDF is given by

f_X(x) = (c/η) g( ((x − ξ)/η)² ), x ∈ R,   (4)

where g(u), for u ≥ 0, is a nonnegative real function corresponding to the kernel of the PDF, and c is a normalizing constant such that f_X(x) is a PDF. We use the notation EC(ξ, η; g) according to (4). In general, ξ ∈ R is the location parameter, which coincides with the mean if the first moment of the distribution exists, and η > 0 is the scale parameter. First, we introduce a multivariate elliptical PN family of distributions. A random vector follows a multivariate elliptical PN distribution if its joint PDF is given by

ϕ_F(x; α) = ∏_{j=1}^d c_{α_j} α_j f_j(x_j) [F_j(|x_j|)]^(α_j − 1), x ∈ R^d,

where f_j is defined in (4), and F_j is the corresponding CDF, for j = 1, . . ., d. Clearly, since each f_j belongs to the elliptical family, it can easily be shown that ϕ_F(−x; α) = ϕ_F(x; α), so that ϕ_F is a symmetric (around zero) PDF. We use the notation MES_F(α) to refer to this distribution. From Lemma 1 of Gupta and Chang [24], we have the following generalization.
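To illustrate the elliptical extension numerically, one can take a Student-t kernel for f_j. The normalizing constant c_α carries over unchanged from the normal case because its derivation only uses F_j(0) = 1/2, which holds for any symmetric kernel. A hedged Python sketch (the name `bp_elliptical_pdf` is ours):

```python
import numpy as np
from scipy.stats import t as student_t
from scipy.integrate import quad

def bp_elliptical_pdf(z, alpha, df):
    """Univariate power-elliptical density with a Student-t kernel:
    c_alpha * alpha * f(z) * F(|z|)^(alpha - 1), with the same
    c_alpha = 2^(alpha-1)/(2^alpha - 1) as in the normal case."""
    c = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)
    return (c * alpha * student_t.pdf(z, df)
            * student_t.cdf(np.abs(z), df) ** (alpha - 1.0))

# Normalization check, splitting the integral at the kink z = 0.
total = (quad(bp_elliptical_pdf, -np.inf, 0.0, args=(2.5, 4.0))[0]
         + quad(bp_elliptical_pdf, 0.0, np.inf, args=(2.5, 4.0))[0])
```

Heavier-tailed kernels (small df) thus give bimodal power-elliptical densities with the same shape mechanism as the normal-kernel BPN.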
Proposition 3. Let Y = (Y_1, . . ., Y_d) be a random vector with joint PDF ϕ_F, and let Z be a random variable with absolutely continuous CDF G, symmetric around zero. Then

ϕ_GF(x; α, β) = 2 ϕ_F(x; α) G(β⊤x), x ∈ R^d,

is a joint PDF of a random vector X = (X_1, . . ., X_d) for any β ∈ R^d.
Proof. We use the methodology given in Gupta and Chang [24]. Given that ϕ_F(x; α) ≥ 0 and G(β⊤x) ≥ 0 for any x ∈ R^d, it follows that ϕ_GF(x; α, β) ≥ 0. Let h(α, β) = ∫_{R^d} 2 ϕ_F(x; α) G(β⊤x) dx. Differentiation under the integral sign, justified by Lebesgue's dominated convergence theorem, together with the symmetry of ϕ_F, shows that h(α, β) does not depend on β. Therefore, h(α, β) is a constant in β and, given that h(α, 0) = 1, we conclude that ϕ_GF(x; α, β) is a joint PDF.

For d = 2, the regression functions E(X_2 | X_1 = x_1) are of linear type. The product moments of X = (X_1, . . ., X_d) ∼ MES_GF(α, β) are provided in the next proposition.
Proof. Let X = (X_1, . . ., X_d) ∼ MES_GF(α, β) and t = (t_1, . . ., t_d) ∈ R^d, and let ψ_X(t) be the characteristic function of X. A direct computation of ψ_X shows that the even moments of X do not depend on β, so that

E(∏_{j=1}^d X_j^{r_j}) = ∏_{j=1}^d 2 E(Y_{j+}^{r_j}), if all r_j are even,

where E(Y_{j+}^{r_j}) is the moment of the positive part of the variable Y_j, as defined before.

Expected Information Matrix
Let σ_{ξ_j ξ_{j′}}, σ_{ξ_j η_{j′}}, . . ., σ_{β_j β_{j′}} be the elements of the expected information matrix.

Concluding Remarks
The univariate power-normal distribution has been quite useful in the modeling of many types of real data. On the other hand, extensions of the univariate power-normal distribution to the multivariate setup have been little explored in the statistical literature. We have proposed new multivariate power-normal distributions, which are quite simple and can be useful in the modeling of multivariate data. The new multivariate power-normal distributions are absolutely continuous. Their joint probability density functions do not involve any complicated function and, hence, can be computed easily. Following the frequentist approach, estimation of the multivariate power-normal distribution parameters is conducted by the maximum likelihood method. We also provide closed-form expressions for the observed and expected Fisher information matrices. We illustrate the methodology developed in this paper by means of an application to real data. We verify through this application that the proposed bivariate power-normal distribution is superior to the well-known bivariate skew-normal distribution (Azzalini and Dalla Valle [15]), as well as to the conditional bivariate skew-normal distribution (Arnold et al. [14]). It is worth stressing that the formulas related to the multivariate power-normal distributions are manageable (log-likelihood function, score function, observed and expected Fisher information matrices, etc.) and, with the use of modern computer resources and their numerical capabilities, the proposed multivariate distributions may prove to be a useful addition to the arsenal of applied statisticians. Finally, we have also introduced multivariate PN distributions based on the elliptical family of distributions, and discussed some of their properties.

Figure 2.
Scatter plot of real data and contour levels of the fitted PDFs: (left) bivariate SN distribution, (middle) conditional SN distribution, and (right) bivariate ABPN distribution.

Table 2.
Square root of the mean squared error (√MSE).

Table 3.
Correlation for the bivariate ABPN distribution.

Table 4.
Some measures for the fitted models.