
Symmetry 2019, 11(12), 1509; https://doi.org/10.3390/sym11121509

Article
Multivariate Skew-Power-Normal Distributions: Properties and Associated Inference
2 Departamento de Estatística, Universidade Federal do Rio Grande do Norte, Natal 59078970, RN, Brazil
3 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile
* Correspondence: [email protected]
† These authors contributed equally to this work.
Received: 8 November 2019 / Accepted: 7 December 2019 / Published: 12 December 2019

Abstract:
The univariate power-normal distribution is quite useful for modeling many types of real data. On the other hand, multivariate extensions of this distribution are scarce in the statistical literature, especially skewed extensions that can also be bimodal. In this paper, we extend the univariate power-normal distribution to the multivariate setup. Structural properties of the new multivariate distributions are established. We consider the maximum likelihood method to estimate the unknown parameters, and the observed and expected Fisher information matrices are also derived. Monte Carlo simulation results indicate that the maximum likelihood approach is quite effective for estimating the model parameters. An empirical application of the proposed multivariate distribution to real data is provided for illustrative purposes.
Keywords:
distribution theory; maximum likelihood estimation; multivariate models; parametric inference; skewed distributions

1. Introduction

Asymmetric univariate distributions that can be used for explaining real data which are not adequately fitted by the usual normal distribution were studied in Azzalini [1], Fernández and Steel [2], Mudholkar and Hutson [3], Durrans [4], Pewsey et al. [5], and Martínez-Flórez et al. [6], among others. In particular, Azzalini [1] has considered a general structure for asymmetric distributions in the univariate setting, which is given by
$φ A ( z ; λ ) = 2 f ( z ) G ( λ z ) , z ∈ R ,$
where $λ ∈ R$ controls the amount of asymmetry in the distribution, f is a symmetric (around zero) probability density function (PDF), and G is an absolutely continuous cumulative distribution function (CDF). The special case $f = ϕ$ and $G = Φ$ corresponds to the well-known skew-normal (SN) distribution, where $ϕ$ and $Φ$ are the PDF and CDF of the standard normal distribution, respectively. We have that $φ A ( z ; λ = 0 ) = f ( z )$. The skew-normal distribution has been extensively studied in the statistical literature. To mention a few, the reader is referred to Henze [7], Chiogna [8], Pewsey [9], Gómez et al. [10], and Gómez et al. [11], among many others. Another important reference regarding univariate asymmetric distributions is the work of Durrans [4], who introduced the fractional order statistics distribution with PDF given by
$φ D ( z ; α ) = α f ( z ) { F ( z ) } α − 1 , z ∈ R ,$
where $α > 0$ is a shape parameter that controls the amount of asymmetry in the distribution, F is an absolutely continuous CDF, and $f = F ′$ is the corresponding PDF. The special case $F = Φ$ corresponds to the well-known power-normal (PN) distribution (Gupta and Gupta [12]). Note also that $φ D ( z ; α = 1 ) = f ( z )$. It is worth emphasizing that the fractional order statistics distribution studied by Durrans [4] is very flexible and, in addition, the corresponding expected Fisher information is not singular at $α = 1$. On the other hand, the expected Fisher information matrix for the SN distribution introduced by Azzalini [1] is singular at $λ = 0$. Therefore, the fractional order statistics distribution has this interesting advantage over the SN distribution.
The univariate models previously mentioned are only adequate for fitting unimodal data. A univariate bimodal distribution was introduced in Bolfarine et al. [13], whose PDF is given by
$φ B ( z ; α , β ) = 2 α c α f ( z ) F ( | z | ) α − 1 G ( β z ) , z ∈ R ,$
where $α > 0$, $β ∈ R$, F is an absolutely continuous CDF with PDF $f = F ′$ symmetric around zero, G is an absolutely continuous CDF which is symmetric around zero, and $c_α = 2^{α − 1} / ( 2^{α} − 1 )$ is the normalizing constant. In particular, if $F = G = Φ$, then we have the univariate asymmetric bimodal power-normal (ABPN) distribution. We use the notation ABPN$( β , α )$ to refer to this univariate asymmetric bimodal distribution. It follows that the ABPN distribution is bimodal for $α > 1$, and unimodal otherwise. Additionally, $β = 0$ leads to a symmetric (around zero) bimodal distribution, and we use the notation BPN$( α )$ to refer to this univariate symmetric bimodal distribution. Note also that $φ B ( z ; α = 1 , β ) = 2 f ( z ) G ( β z )$, and hence the ABPN distribution reduces to the PDF in (1) when $α = 1$. Also, $φ B ( z ; α = 1 , β = 0 ) = f ( z )$.
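As a quick numerical sketch (helper names are ours, not from the paper), the univariate ABPN density with $F = G = Φ$ can be evaluated directly from the formula above; at $α = 1$ and $β = 0$ it collapses to the standard normal PDF:

```python
from math import exp, pi, sqrt, erf

def phi(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def abpn_pdf(z, alpha, beta):
    """Univariate ABPN(beta, alpha) density with F = G = Phi."""
    c = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)  # normalizing constant c_alpha
    return 2.0 * alpha * c * phi(z) * Phi(abs(z)) ** (alpha - 1.0) * Phi(beta * z)
```

A simple midpoint-rule check over a wide grid confirms that the density integrates to one for any choice of $α > 0$ and $β$.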
In this paper, we generalize the univariate ABPN distribution to the multivariate setting by using the approach in Arnold et al. [14]. The new multivariate distributions we propose are quite flexible and can therefore be very useful in analyzing the many types of multivariate real data which occur frequently in practice. Additionally, maximum likelihood (ML) estimation is implemented, and the observed and expected Fisher information matrices are derived. Finally, extensions to multivariate elliptical scenarios are also discussed. An important reason to introduce new multivariate distributions is that practitioners gain new models to use in multivariate settings. Additionally, the formulae related to the new multivariate models are manageable and, with the use of modern computer resources and their numerical capabilities, the proposed multivariate models may prove to be a useful addition to the arsenal of applied statisticians. We hope that the new multivariate distributions introduced in this paper may serve as alternatives to some well-known multivariate models available in the statistical literature, for example, the multivariate SN distribution [15] and the conditional multivariate SN distribution [14], among others. We also hope that the new multivariate models may work better (at least in terms of model fitting) than some multivariate distributions available in the literature in certain practical situations, although this cannot always be guaranteed.
The paper is organized as follows. Section 2 presents a short revision of existing asymmetric multivariate models, which are used to fit unimodal data. In Section 3, we consider the symmetric multivariate PN extension with possible bimodality. ML estimation is also implemented in this section. Section 4 is devoted to the study of the asymmetric multivariate PN model with extension implemented to the location-scale situation. ML estimation is also considered in this section. Real data application is presented in Section 5. The real data illustration shows that the new multivariate distribution can be better in terms of model fitting than unimodal alternative multivariate models for analyzing multivariate data. Multivariate extension of the univariate PN model in the elliptical context is considered in Section 6. In Section 7 some concluding remarks are provided.

2. Multivariate Skew Models

In the last two decades, the statistical literature has placed great emphasis on multivariate extensions of the SN model. Important extensions are the multivariate SN distributions (Azzalini and Dalla Valle [15]), the conditionally specified multivariate SN distribution (Arnold et al. [14]), and the multivariate alpha-power model (Martínez-Flórez et al. [6]), among others. The multivariate SN distribution has been studied by several authors, including Azzalini and Capitanio [16], Gupta and Huang [17], Gupta et al. [18], and Genton [19]. Extensions of this model have been the subject of study in Arellano-Valle and Azzalini [20,21], Arellano-Valle and Genton [22], Gupta and Chen [23], and Gupta and Chang [24]. The d-dimensional SN PDF can be expressed as
$2 ϕ d ( x − μ ; Ω ) Φ ( λ ′ ω − 1 ( x − μ ) ) , x ∈ R d ,$
where $ϕ d ( x − μ ; Ω )$ is the joint PDF of a multivariate normal distribution, $Ω$ is a $d × d$ positive definite variance-covariance matrix, $μ ∈ R d$ is the location parameter, $λ ∈ R d$ is a parameter vector which controls skewness, and $ω = diag { ω 1 , … , ω d }$ is a diagonal matrix composed by the standard deviations from the variance-covariance matrix $Ω$. We use the notation SN$d ( μ , Ω , λ )$ to refer to this distribution; see Azzalini and Capitanio [25] for more details about multivariate SN models. Another useful multivariate model based on conditional SN distributions was studied in Arnold et al. [14], whose joint PDF takes the form
$2 ∏ j = 1 d 1 ω j ϕ x j − μ j ω j Φ λ ∏ j = 1 d x j − μ j ω j , x ∈ R d ,$
where $μ j ∈ R$ and $ω j > 0$ (for $j = 1 , … , d$) are location and scale parameters, respectively, and $λ$ is a shape parameter. We use the notation SNC$d ( μ , ω , λ )$ to refer to this distribution. Similarly to the univariate setting, the expected Fisher information matrix for the multivariate SN distribution is singular at $λ = 0 d$, where $0 k$ denotes a k-vector of zeros. On the other hand, the expected Fisher information matrix for the conditional multivariate SN distribution is not singular at $λ = 0$.

3. The Symmetric Multivariate PN Distribution

The multivariate SN$d ( μ , Ω , λ )$ and SNC$d ( μ , ω , λ )$ models presented in Section 2 are alternatives to the multivariate normal distribution for fitting multivariate asymmetric data. However, these multivariate distributions are only adequate for fitting unimodal data. Therefore, it is of interest to develop new (and simple) multivariate models that are able to adequately fit data with possible bimodality.

3.1. The New Model

Initially, we define the symmetric multivariate PN (MBPN) distribution, whose joint PDF is given by
$f ( x ; α ) = ϕ d ( x ) ∏ j = 1 d α j c α j { Φ ( | x j | ) } α j − 1 , x ∈ R d ,$
where $ϕ d ( x ) : = ϕ d ( x ; I d )$ is the joint PDF of a d-dimensional multivariate normal distribution with standardized marginals, $I d$ is the identity matrix of order d, $α = ( α 1 , … , α d ) ′$ is the d-vector of shape parameters, and $c α j = 2 α j − 1 / ( 2 α j − 1 )$ is a normalizing constant. We use the notation MBPN$d ( α )$ to refer to this new multivariate distribution. We have the following proposition.
Proposition 1.
Let $X = ( X 1 , … , X d ) ′ ∼ M B P N d ( α )$. We have that
1.
$X j ∼ B P N ( α j )$ for $j = 1 , … , d$.
2.
The joint PDF of $X$ is symmetric.
3.
The product moment of $X$ is
$E \big( \prod_{j=1}^{d} X_j^{r_j} \big) = \prod_{j=1}^{d} \left[ 1 + (-1)^{r_j} \right] E ( X_j^{+ r_j} ) ,$
which follows because the components of $X$ are independent and each $X_j$ is symmetric around zero, where $E ( X_j^{+ r_j} )$ is the moment of the positive part of $X_j ∼ B P N ( α_j )$.
4.
The joint PDF of $X$ is multimodal if $α j > 1$ for $j = 1 , … , d$.
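Since $ϕ_d ( x ; I_d )$ factors into a product of standard normal densities, the MBPN$d ( α )$ joint PDF is a product over coordinates, and at $α = 1_d$ it reduces to the product of standard normals (cf. Proposition 1). A minimal sketch (helper names are ours):

```python
from math import exp, pi, sqrt, erf

def phi(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mbpn_pdf(x, alpha):
    """Joint MBPN_d(alpha) density: phi_d(x) * prod_j alpha_j c_j Phi(|x_j|)^(alpha_j - 1)."""
    val = 1.0
    for xj, aj in zip(x, alpha):
        c = 2.0 ** (aj - 1.0) / (2.0 ** aj - 1.0)  # normalizing constant c_{alpha_j}
        val *= aj * c * phi(xj) * Phi(abs(xj)) ** (aj - 1.0)
    return val
```

The density is symmetric, f(x) = f(−x), for any $α$, in line with item 2 of Proposition 1.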
Let $d = 2$ and hence we have the bivariate BPN model. Thus, differentiating $f ( x ; α )$ with respect to $x j$ for $j = 1 , 2$ and equating to zero, we obtain
$α j − 1 = | x j | Φ ( | x j | ) ϕ ( x j ) , j = 1 , 2 .$
If $0 < α_j < 1$ in (2), then there is no solution. Moreover, for $α_j = 1$ (the bivariate normal case) the only solution is $x_j = 0$, and for $α_j > 1$ we obtain two solutions, characterized implicitly by $x_j = − ( α_j − 1 ) ϕ ( x_j ) / Φ ( − x_j )$ for $x_j < 0$ and $x_j = ( α_j − 1 ) ϕ ( x_j ) / Φ ( x_j )$ for $x_j ≥ 0$. For these solutions we have the determinant
$D = \det \begin{pmatrix} \partial^2 f / \partial x_1^2 & \partial^2 f / \partial x_2 \partial x_1 \\ \partial^2 f / \partial x_1 \partial x_2 & \partial^2 f / \partial x_2^2 \end{pmatrix} ,$
where $f : = f ( x ; α )$. Evaluating D at the critical points it follows that
$D > 0 and ∂ 2 f ∂ x 1 2 > 0 , for α 1 ≤ 1 and α 2 ≤ 1 , D > 0 and ∂ 2 f ∂ x 1 2 < 0 , for α 1 > 1 or α 2 > 1 ,$
and so we conclude that the joint bivariate PDF is bimodal if $α 1 > 1$ or $α 2 > 1$.
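The bimodality threshold at $α = 1$ can also be checked numerically by counting local maxima of the univariate BPN marginal on a fine grid. The sketch below (our own helper names) finds one mode for $α ≤ 1$ and two for $α > 1$:

```python
from math import exp, pi, sqrt, erf

def phi(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bpn_pdf(z, alpha):
    """Symmetric univariate BPN(alpha) density."""
    c = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)
    return alpha * c * phi(z) * Phi(abs(z)) ** (alpha - 1.0)

def count_modes(alpha, lo=-5.0, hi=5.0, n=2001):
    """Count strict local maxima of the BPN density on a uniform grid."""
    h = (hi - lo) / (n - 1)
    ys = [bpn_pdf(lo + i * h, alpha) for i in range(n)]
    return sum(1 for i in range(1, n - 1) if ys[i - 1] < ys[i] > ys[i + 1])
```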

3.2. Inference

We consider the ML procedure to estimate the parameter vector $α ∈ R d$ of the MBPN$d ( α )$ distribution. Let $x 1 = ( x 11 , … , x 1 d ) ′ , … , x n = ( x n 1 , … , x n d ) ′$ be an observed sample of size n from $X = ( X 1 , … , X d ) ′ ∼ M B P N d ( α )$. The log-likelihood function takes the form
$ℓ ( α ) = n ∑ j = 1 d [ log ( α j ) + log ( c α j ) ] + ∑ i = 1 n ∑ j = 1 d log ( ϕ ( x i j ) ) + ∑ i = 1 n ∑ j = 1 d ( α j − 1 ) log ( Φ ( | x i j | ) ) .$
The ML estimate $α ^ = ( α ^ 1 , … , α ^ d ) ′$ of $α = ( α 1 , … , α d ) ′$ is obtained by maximizing the log-likelihood function $ℓ ( α )$ with respect to $α$. The maximization can be performed, for example, in the R software (R Core Team [26]) by using the optim(...) function. The partial derivatives of the log-likelihood function with respect to the model parameters become
$∂ ℓ ( α ) ∂ α j = n α j − n log 2 2 α j − 1 + ∑ i = 1 n log ( Φ ( | x i j | ) ) , j = 1 , … , d .$
The ML estimate $α ^ = ( α ^ 1 , … , α ^ d ) ′$ can also be obtained by simultaneously solving the nonlinear system of equations $∂ ℓ ( α ) / ∂ α j = 0$ for $j = 1 , … , d$; this system has no closed-form solution and, hence, the ML estimates need to be obtained through numerical maximization of the log-likelihood function using nonlinear optimization algorithms.
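Because the score equations separate in the $α_j$'s and each $∂ ℓ ( α ) / ∂ α_j$ is strictly decreasing in $α_j$ (the expected information is positive), each equation can be solved by simple one-dimensional bisection instead of a multivariate optimizer. A minimal sketch (our own function names, standing in for `optim(...)`):

```python
from math import erf, log, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def score(a, s, n):
    """Score d l / d alpha_j, with s = sum_i log Phi(|x_ij|)."""
    return n / a - n * log(2.0) / (2.0 ** a - 1.0) + s

def mle_alpha(xs):
    """Solve the score equation for one coordinate by bisection."""
    n = len(xs)
    s = sum(log(Phi(abs(x))) for x in xs)
    lo, hi = 1e-6, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, s, n) > 0:   # score is decreasing in alpha
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applying `mle_alpha` to the j-th column of the observed sample returns $α ^ j$; stacking the d solutions gives $α ^$.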
Since the new MBPN distribution corresponds to a regular ML problem, we have that the standard asymptotics apply; that is, the ML estimators of the model parameters are asymptotically normal, asymptotically unbiased and have asymptotic variance-covariance matrix given by the inverse of the expected Fisher information matrix. Let $Σ ( α )$ be the expected Fisher information matrix. So, when n is large and under some mild regularity conditions (Cox and Hinkley [27]), we have that $n ( α ^ − α ) ∼ a N d ( 0 d , Σ ( α ) − 1 )$, where $∼ a$ means approximately distributed. It can be shown that
$Σ ( α ) = diag α 1 − 2 − 2 α 1 ( 2 α 1 − 1 ) − 2 ( log 2 ) 2 , … , α d − 2 − 2 α d ( 2 α d − 1 ) − 2 ( log 2 ) 2 .$
Therefore, we immediately observe that the parameters $α j$’s are globally orthogonal (Cox and Reid [28]). The above asymptotic normal distribution can be used to construct approximate confidence intervals (CIs) for the model parameters. Let $0 < ϑ < 1 / 2$ be the significance level. The asymptotic CI for $α j$ is given by $α ^ j ± Φ − 1 ( 1 − ϑ / 2 ) se ( α ^ j )$ for $j = 1 , … , d$, with asymptotic coverage of $100 ( 1 − ϑ ) %$. Here, $se ( · )$ is the square root of the diagonal element of $Σ ( α ) − 1$ corresponding to each parameter (i.e., the asymptotic standard error), and $Φ − 1 ( · )$ denotes the standard normal quantile function.

3.3. A Short Simulation Study

We conduct Monte Carlo simulation experiments in order to explore the performance of the ML method in estimating the MBPN$d ( α )$ model parameters in finite samples in the bivariate case (i.e., $d = 2$). The parameter values used to generate the data are $α 1 = 0.5$, 1.5, and 2.5, and $α 2 = 0.25$, 2.5, and 4.75. The sample sizes considered were $n = 50$, 250, 500, and 1500. In this study, 10,000 random samples were generated for each sample size. To generate random variates from the $( X 1 , X 2 ) ′ ∼$ MBPN$2 ( α )$ distribution, we generate two independent uniform random variables, say $U 1 ∼ U ( 0 , 1 )$ and $U 2 ∼ U ( 0 , 1 )$, such that
$X_{ji} = \begin{cases} - \Phi^{-1} \big( \left[ 1 - 2 ( 1 - 2^{-\alpha_j} ) U_{ji} \right]^{1/\alpha_j} \big) , & U_{ji} < 1/2 , \\ \Phi^{-1} \big( \left\{ 2^{-\alpha_j} \left[ 2 ( 2^{\alpha_j} - 1 ) ( U_{ji} - 1/2 ) + 1 \right] \right\}^{1/\alpha_j} \big) , & U_{ji} \ge 1/2 , \end{cases}$
for $j = 1 , 2$ and $i = 1 , … , n$, where $U_{ji}$ denotes the i-th draw of $U_j$; that is, each component of $( X_{1i} , X_{2i} )$ is obtained by inverting the BPN CDF at the corresponding uniform draw, with the two branches covering the cases $U_j < 1/2$ and $U_j ≥ 1/2$.
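This inversion is easy to code. The sketch below inverts the BPN CDF with a bisection-based normal quantile (helper names are ours, not from the paper):

```python
import random
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def Phi_inv(p):
    """Standard normal quantile by bisection on [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bpn_quantile(u, a):
    """Invert the BPN(a) CDF at u in (0, 1)."""
    if u < 0.5:
        return -Phi_inv((1.0 - 2.0 * (1.0 - 2.0 ** (-a)) * u) ** (1.0 / a))
    return Phi_inv((2.0 ** (-a) * (2.0 * (2.0 ** a - 1.0) * (u - 0.5) + 1.0)) ** (1.0 / a))

def rmbpn2(n, a1, a2, rng):
    """n draws from MBPN_2((a1, a2)) via componentwise inversion."""
    return [(bpn_quantile(rng.random(), a1), bpn_quantile(rng.random(), a2)) for _ in range(n)]
```

For instance, `rmbpn2(1500, 2.5, 4.75, random.Random(1))` would reproduce one replication of the simulation design above.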
The ML estimates are evaluated by considering the following quantities: the empirical mean and the square root of the mean squared error ($\sqrt{MSE}$), computed from 10,000 Monte Carlo replications. All simulations were performed using the R language, with the optimization of the log-likelihood function carried out by the optim(...) function. From Table 1 and Table 2, it is evident that the performance of the ML estimators of the MBPN model parameters is good; the estimates are very close to the true parameter values in all cases, and the $\sqrt{MSE}$ decreases as the sample size increases, as expected, since the ML estimators are consistent. In short, the numerical results provide a clear indication that the ML method can be used quite effectively to estimate the MBPN model parameters.

3.4. Location-Scale Extension

The joint PDF of the MBPN model in the location-scale context is simply given by
$f ( x ; ξ , η , α ) = ϕ d ( x − ξ ; Λ ) ∏ j = 1 d α j c α j { Φ ( | x j − ξ j | / η j ) } α j − 1 , x ∈ R d ,$
where $Λ = diag { η 1 , … , η d }$ with $η j > 0$ (for $j = 1 , ⋯ , d$) being scale parameters, and $ξ = ( ξ 1 , … , ξ d ) ′ ∈ R d$ is a d-vector of location parameters. We shall use the notation MBPN$d ( ξ , η , α )$, where $η = ( η 1 , … , η d ) ′$.

4. The Asymmetric Multivariate PN Model

4.1. The New Model

Although the MBPN model defined in Section 3 can present multimodality, it corresponds to a symmetric multivariate distribution. An immediate extension for fitting asymmetric (possibly multimodal) multivariate data is given by the joint PDF
$f ( x ; α , β ) = 2 ϕ d ( x ) ∏ j = 1 d α j c α j { Φ ( | x j | ) } α j − 1 Φ ( β ′ x ) , x ∈ R d ,$
where $β = ( β 1 , … , β d ) ′ ∈ R d$ is a parameter vector which controls skewness. The above joint PDF corresponds to the multivariate extension of the univariate ABPN model, and we shall use the notation MABPN$d ( α , β )$ to refer to this multivariate distribution. Let $d = 2$ and hence we have the bivariate ABPN model with parameter vector $( α 1 , α 2 , β 1 , β 2 )$. Some contour plots of the joint bivariate PDF are presented in Figure 1. Note that the joint PDF can take different forms and will therefore be useful in analyzing bivariate data. Additionally, note that it can be unimodal or bimodal. According to an anonymous referee, the MABPN distribution has a straightforward utilization within the errors-in-variables models, especially for an application in calibration; see, for example, [29].
We have the following proposition.
Proposition 2.
Let $X = ( X 1 , … , X d ) ′ ∼ M A B P N d ( α , β )$. We have that
1.
If $β = 0 d$, then the MABPN model reduces to the MBPN model.
2.
$− X ∼ M A B P N d ( α , − β )$.
3.
If $α = 1 d$, then the MABPN model reduces to the SN$d ( 0 d , I d , β )$ model, where $1 d = ( 1 , … , 1 ) ′$ is a d-vector of ones.
4.
The product moment of $X$ is given by
$E \big( \prod_{j=1}^{d} X_j^{r_j} \big) = 2 \, E \big( \prod_{j=1}^{d} Y_j^{r_j} \, \Phi ( β ′ Y ) \big) ,$
where $Y = ( Y_1 , … , Y_d ) ′$ has independent components $Y_j ∼ B P N ( α_j )$ for $j = 1 , … , d$.
From Proposition 2, note that even moments of $X = ( X 1 , … , X d ) ′ ∼ M A B P N d ( α , β )$ do not depend on the parameter vector $β$. It implies that the correlation coefficient between the random variables $X j$ and $X j ′$, where $j ≠ j ′$ for $j , j ′ = 1 , … , d$, depends basically on the parameter vector $β$. Note that the covariance between $X j$ and $X j ′$, say $τ ( j , j ′ )$, reduces to
$τ ( j , j ′ ) = E ( X j X j ′ ) − E ( X j ) E ( X j ′ ) = − E ( X j ) E ( X j ′ ) ,$
where $E ( X j )$ and $E ( X j ′ )$ are computed under the marginal PDFs of $X j$ and $X j ′$, respectively. Therefore, the parameter vector $β$ also governs the correlation. To illustrate this, we compute the correlation in the bivariate case (i.e., $d = 2$), where $α = ( α 1 , α 2 ) ′$ and $β = ( β 1 , β 2 ) ′$. The parameter values we consider are $α 1 = 0.25$, 1.75, 3.5, 7.0, and 15, $α 2 = 0.75$, 2.5, and 5.0, $β 1 = − 1.5 , 0.0$, and 1.5, and $β 2 = − 2.5 , 0.0$, and 2.5. Table 3 lists the results for the correlation. Note that if $β 1$ or $β 2$ goes to zero, independently of the values of $α 1$ and $α 2$, the correlation tends to zero, illustrating the fact that the parameter vector $β = ( β 1 , β 2 ) ′$ dominates the dependence between the random variables. It is interesting to note that values of $β 1$ and $β 2$ with the same sign lead to a negative correlation, and in situations where their signs are opposite, the correlation is positive.
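The sign pattern just described can be reproduced by brute-force numerical integration of the bivariate MABPN density (an illustrative sketch; the grid limits, step, and function names are our choices):

```python
from math import exp, pi, sqrt, erf

def phi(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mabpn2_pdf(x1, x2, a1, a2, b1, b2):
    """Bivariate MABPN density."""
    def g(x, a):
        c = 2.0 ** (a - 1.0) / (2.0 ** a - 1.0)
        return a * c * phi(x) * Phi(abs(x)) ** (a - 1.0)
    return 2.0 * g(x1, a1) * g(x2, a2) * Phi(b1 * x1 + b2 * x2)

def cov_mabpn2(a1, a2, b1, b2, lo=-8.0, hi=8.0, n=320):
    """Midpoint-rule approximation of total mass, means, and covariance."""
    h = (hi - lo) / n
    mass = e1 = e2 = e12 = 0.0
    for i in range(n):
        x1 = lo + (i + 0.5) * h
        for j in range(n):
            x2 = lo + (j + 0.5) * h
            w = mabpn2_pdf(x1, x2, a1, a2, b1, b2) * h * h
            mass += w
            e1 += x1 * w
            e2 += x2 * w
            e12 += x1 * x2 * w
    return mass, e1, e2, e12 - e1 * e2
```

For $β_1 = β_2 = 1.5$ the covariance comes out negative, and $E ( X_1 X_2 )$ itself is (numerically) zero, matching $τ ( j , j ′ ) = − E ( X_j ) E ( X_{j ′} )$.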

4.2. Location-Scale Extension

The location-scale version of the MABPN model has joint PDF in the form
$f ( x ; ξ , η , α , β ) = 2 ϕ d ( x − ξ ; Λ ) ∏ j = 1 d α j c α j { Φ ( | x j − ξ j | / η j ) } α j − 1 Φ ( β ′ Λ − 1 ( x − ξ ) ) , x ∈ R d ,$
where the $d × d$ matrix $Λ$ and the d-vector $ξ$ were previously defined. We shall use the notation MABPN$d ( ξ , η , α , β )$ to refer to this location-scale MABPN distribution.

4.3. Inference

Let $θ = ( ξ ′ , η ′ , α ′ , β ′ ) ′$ be the parameter vector of interest. The log-likelihood function for $θ$, given the observed sample $x 1 = ( x 11 , … , x 1 d ) ′ , … , x n = ( x n 1 , … , x n d ) ′$ of size n from $X = ( X 1 , … , X d ) ′ ∼ M A B P N d ( θ )$, is given by
$ℓ ( θ ) = n ∑ j = 1 d [ log ( α j ) + log ( c α j ) − log ( η j ) ] + ∑ i = 1 n ∑ j = 1 d log ( ϕ ( z i j ) ) + ∑ i = 1 n ∑ j = 1 d ( α j − 1 ) log ( Φ ( | z i j | ) ) + ∑ i = 1 n log Φ ∑ j = 1 d β j z i j ,$
where $z i j = ( x i j − ξ j ) / η j$ for $i = 1 , … , n$ and $j = 1 , … , d$. The ML estimate $θ ^ = ( ξ ^ ′ , η ^ ′ , α ^ ′ , β ^ ′ ) ′$ of $θ = ( ξ ′ , η ′ , α ′ , β ′ ) ′$, where $ξ ^ = ( ξ ^ 1 , … , ξ ^ d ) ′$, $η ^ = ( η ^ 1 , … , η ^ d ) ′$, $α ^ = ( α ^ 1 , … , α ^ d ) ′$ and $β ^ = ( β ^ 1 , … , β ^ d ) ′$, is obtained by maximizing the log-likelihood function $ℓ ( θ )$ with respect to $θ$ by using, for example, the R function optim(...). The first-order partial derivatives of (3) become ($j = 1 , … , d$)
$∂ ℓ ( θ ) ∂ ξ j = n z j ¯ − ( α j − 1 ) sgn ( z j ) w j ¯ − β j w 1 j ¯ / η j ,$
$∂ ℓ ( θ ) ∂ η j = n z j 2 ¯ − 1 − ( α j − 1 ) | z j | w j ¯ − β j z j w 1 j ¯ / η j ,$
$∂ ℓ ( θ ) ∂ α j = n u j ¯ + 1 / α j − log 2 / ( 2 α j − 1 ) , ∂ ℓ ( θ ) ∂ β j = n z j w 1 j ¯ ,$
where $u i j = log ( Φ ( | z i j | ) )$, $w i j = ϕ ( | z i j | ) / Φ ( | z i j | )$, and $w 1 i j = ϕ ( z 1 i ) / Φ ( z 1 i )$ with $z 1 i = ∑ k = 1 d β k z i k$, for $i = 1 , … , n$ and $j = 1 , … , d$. We are using the notation $u j ¯ = n − 1 ∑ i = 1 n u i j$, $w 1 j ¯ = n − 1 ∑ i = 1 n w 1 i j$, $| z j | w j ¯ = n − 1 ∑ i = 1 n | z i j | w i j$, and so on. The ML estimates can also be obtained by simultaneously solving the nonlinear system of equations $∂ ℓ ( θ ) / ∂ ξ j = 0$, $∂ ℓ ( θ ) / ∂ η j = 0$, $∂ ℓ ( θ ) / ∂ α j = 0$ and $∂ ℓ ( θ ) / ∂ β j = 0$ for $j = 1 , … , d$; this system has no closed-form solution and, hence, the ML estimates need to be obtained through numerical maximization of the log-likelihood function using nonlinear optimization algorithms.

4.4. Observed Information Matrix

Let $H ( θ ) = − ∂ 2 ℓ ( θ ) / ∂ θ ∂ θ ⊤$ be the $4 d × 4 d$ observed information matrix, whose elements are denoted by $k θ j θ j ′$ for $j , j ′ = 1 , … , d$. After some algebraic manipulations, we have that
$k ξ j ξ j = n { 1 + ( α j − 1 ) ( | z j | w j ¯ + w j 2 ¯ ) + β j 2 ( z 1 j w 1 j ¯ + w 1 j 2 ¯ ) } / η j 2 ,$
$k ξ j ξ j ′ = n β j β j ′ ( z 1 j w 1 j ¯ + w 1 j 2 ¯ ) / η j η j ′ ,$
$k ξ j η j = n { 2 z j ¯ + ( α j − 1 ) ( sgn ( z j ) z j 2 w j ¯ + z j w j 2 ¯ − sgn ( z j ) w j ¯ ) − β j w 1 j ¯ + β j 2 ( z j z 1 j w 1 j ¯ + z j w 1 j 2 ¯ ) } / η j 2 ,$
$k ξ j η j ′ = n β j β j ′ ( z j ′ z 1 j w 1 j ¯ + z j ′ w 1 j 2 ¯ ) / η j η j ′ , k ξ j α j = n sgn ( z j ) w j ¯ / η j ,$
$k ξ j β j = n { w 1 j ¯ − β j ( z j z 1 j w 1 j ¯ + z j w 1 j 2 ¯ ) } / η j , k ξ j β j ′ = − n { β j ( z j ′ z 1 j w 1 j ¯ + z j ′ w 1 j 2 ¯ ) } / η j ,$
$k η j η j = n { 3 z j 2 ¯ − 1 − ( α j − 1 ) ( 2 | z j | w j ¯ − | z j | 3 w j ¯ − z j 2 w j 2 ¯ ) − 2 β j z j w 1 j ¯ + β j 2 ( z j 2 z 1 j w 1 j ¯ + z j 2 w 1 j 2 ¯ ) } / η j 2 ,$
$k η j η j ′ = n β j β j ′ ( z j z j ′ z 1 j w 1 j ¯ + z j z j ′ w 1 j 2 ¯ ) / η j η j ′ , k η j α j = n | z j | w j ¯ / η j ,$
$k η j β j = n { z j w 1 j ¯ − β j ( z j 2 z 1 j w 1 j ¯ + z j 2 w 1 j 2 ¯ ) } / η j , k η j β j ′ = − n β j ( z j z j ′ z 1 j w 1 j ¯ + z j z j ′ w 1 j 2 ¯ ) / η j ,$
$k α j α j = n { α j − 2 ( 2 α j − 1 ) 2 − 2 α j ( log 2 ) 2 } / ( 2 α j − 1 ) 2 , k β j β j = n { z j 2 z 1 j w 1 j ¯ + z j 2 w 1 j 2 ¯ } ,$
$k β j β j ′ = n { z j z j ′ z 1 j w 1 j ¯ + z j z j ′ w 1 j 2 ¯ } , k ξ j α j ′ = k η j α j ′ = k α j α j ′ = k α j β j = k α j β j ′ = 0 .$

4.5. Expected Information Matrix

Let $Σ ( θ ) = E ( H ( θ ) )$ be the $4 d × 4 d$ expected information matrix, and $σ θ j θ j ′ = E ( k θ j θ j ′ )$ be the corresponding elements for $j , j ′ = 1 , … , d$. These elements can be computed by using numerical procedures. When n is large and under some mild regularity conditions, we have that $n ( θ ^ − θ ) ∼ a N 4 d ( 0 4 d , Σ ( θ ) − 1 )$. From this asymptotic normal distribution, approximate CIs for the model parameters are computed in the usual manner. In particular, for $α j = 1$ and $β j = 0$ ($j = 1 , … , d$), the expected Fisher information matrix can be expressed as
$Σ ( θ ) = \begin{pmatrix} Λ_1^{-1} & 0 & δ_0 Λ^{-1} & \sqrt{2/π} \, Λ^{-1} \\ 0 & 2 Λ_1^{-1} & δ_1 Λ^{-1} & 0 \\ δ_0 Λ^{-1} & δ_1 Λ^{-1} & [ 1 − 2 ( \log 2 )^2 ] I_d & 0 \\ \sqrt{2/π} \, Λ^{-1} & 0 & 0 & ( 2 / π ) I_d \end{pmatrix} ,$
where $Λ 1 = diag { η 1 2 , … , η d 2 }$, $δ r = E ( sgn ( Z j ) Z j r ϕ ( | Z j | ) / Φ ( | Z j | ) )$ with $Z j ∼ N ( 0 , 1 )$ and $r = 0 , 1$, and $sgn ( · )$ denotes the signum function. After some algebraic manipulations, it can be shown that
$| Σ ( θ ) | = ( − 1 )^d \, \frac{ 4^d \, δ_0^{2 d} }{ π^d \prod_{j=1}^{d} η_j^4 } ≠ 0 .$
Therefore, the expected Fisher information matrix of the MABPN distribution is nonsingular in the vicinity of symmetry. This is not the case, however, with the multivariate SN distribution (Azzalini and Dalla Valle [15]), whose expected Fisher information matrix is singular in the vicinity of symmetry, i.e., at $λ = 0 d$.

5. Numerical Illustration

In this section, we present an application of the proposed bivariate ABPN distribution to real data for illustrative purposes. For the sake of comparison, we also consider the bivariate SN distribution (Azzalini and Dalla Valle [15]) and the conditional bivariate SN distribution (Arnold et al. [14]). All bivariate distributions were fitted using the location-scale extension. We consider the real data (see, for example, Fisher [30]) corresponding to measurements of the flowers of fifty plants each of the three species found growing together in the same colony, namely Iris-setosa, Iris-versicolor, and Iris-virginica. Two flower measurements are considered: sepal length and sepal width. We pooled the 50 Iris-setosa, the 50 Iris-versicolor, and the 50 Iris-virginica data points to get a total sample size of $n = 150$; that is, 150 sepal lengths and 150 sepal widths. The observed value of the Shapiro-Wilk test for multivariate normality (see Villasenor and González [31]) is 0.9845 (p-value $= 0.04$). We also compute the multivariate skewness of Mardia [32]. The observed value of the multivariate skewness is 0.37 (p-value $= 0.055$), which suggests the presence of skewness and hence of nonnormality. Hence, the bivariate normal distribution is not a tenable model for the data under study, and an alternative model able to incorporate some degree of asymmetry would probably fit the data better. We now consider the bivariate SN distribution, the conditional bivariate SN distribution, and the bivariate ABPN distribution to fit these real data. In order to compare the fits of these competing bivariate distributions, we make use of the Akaike information criterion (AIC). For fitting the bivariate SN model, we use the R function msn.mle, leading to the following ML estimates (standard errors in parentheses): $μ ^ 1 = 4.915 ( 0.196 )$, $μ ^ 2 = 3.051 ( 0.196 )$, $λ ^ 1 = 2.554 ( 0.972 )$ and $λ ^ 2 = 0.220 ( 0.279 )$. The estimated variance-covariance matrix is
$Ω ^ = 1.543 − 0.037 − 0.037 0.189 .$
For the bivariate SN model, we obtain AIC $= 550.56$. The ML estimates of the conditional bivariate SN model parameters (standard errors in parentheses) are: $μ ^ 1 = 5.867 ( 0.065 )$, $μ ^ 2 = 3.055 ( 0.036 )$, $σ ^ 1 = 0.794 ( 0.043 )$, $σ ^ 2 = 0.438 ( 0.026 )$ and $λ ^ = − 0.224 ( 0.110 )$. For the conditional bivariate SN model, we obtain AIC $= 555.10$. Also, the ML estimates of the proposed bivariate ABPN model parameters (standard errors in parentheses) are: $ξ ^ 1 = 5.917 ( 0.051 )$, $ξ ^ 2 = 2.372 ( 0.056 )$, $η ^ 1 = 0.729 ( 0.050 )$, $η ^ 2 = 0.644 ( 0.041 )$, $α ^ 1 = 2.378 ( 0.605 )$, $α ^ 2 = 3.718 ( 0.845 )$, $β ^ 1 = − 0.481 ( 0.300 )$ and $β ^ 2 = 3.414 ( 0.860 )$. For the proposed bivariate ABPN model, we have AIC $= 546.80$, which indicates that the proposed bivariate ABPN model outperforms both the bivariate SN distribution and the conditional bivariate SN distribution in modeling these bivariate data. Finally, Figure 2 displays the scatter plot of the real data and contour levels of the fitted bivariate PDFs. Visual inspection reveals a satisfactory fit of the bivariate ABPN PDF to the real bivariate data.
We would like to point out that there is some uncertainty as to what constitutes a “substantial” difference in AIC values in practical situations. The empirical evidence scale of [33] uses the AIC difference $Y m = AIC m − AIC min$ over all candidate models, where $m = 1 , … , T$, and T denotes the total number of models considered. The models with values of $Y m ∈ [ 0 , 2 ]$ have substantial support to be considered as good as the best approximating model. Two additional measures are then introduced to provide the “strengths” of each model: the evidence ratio (ER$m$) and the weight ($w m$) of model m. These measures are defined as
$ER m = exp ( − Y min / 2 ) exp ( − Y m / 2 ) = exp ( Y m / 2 ) , w m = exp ( − Y m / 2 ) ∑ m = 1 T exp ( − Y m / 2 ) ,$
and $Y min = 0$. The values of ER$m$ can be interpreted as how much more likely the best approximating model is than model m, whereas the values of $w m$ can be interpreted as the probability that a given model is the best approximating model. The values of these measures are given in Table 4. For example, we conclude from this table that the bivariate ABPN distribution is about 7 and 63 times more likely to be the best approximating model than the bivariate SN distribution and the conditional SN distribution, respectively. The weights of these two models relative to that of the bivariate ABPN distribution are therefore small. The bivariate ABPN distribution has a weight of approximately 0.856; that is, there is (approximately) an 86% chance that it really is the best approximating model among the candidate models for these data. Notice that, by definition, the strength of evidence in favor of model i over model j can be obtained simply by considering $w i / w j$.
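Using the three AIC values reported above, the evidence ratios and Akaike weights take only a few lines of code (model labels are ours):

```python
from math import exp

# AIC values from the three fits reported in the text
aic = {"SN": 550.56, "SNC": 555.10, "ABPN": 546.80}
a_min = min(aic.values())
delta = {m: v - a_min for m, v in aic.items()}          # Y_m = AIC_m - AIC_min
er = {m: exp(d / 2.0) for m, d in delta.items()}        # evidence ratios ER_m
den = sum(exp(-d / 2.0) for d in delta.values())
w = {m: exp(-d / 2.0) / den for m, d in delta.items()}  # Akaike weights w_m
```

This reproduces an evidence ratio of about 7 for the SN fit, about 63 for the conditional SN fit, and a weight of about 0.856 for the ABPN fit.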

6. Elliptical Family Extension

In the previous sections, the d-dimensional normal distribution played an important role in deriving the multivariate ABPN distribution. In this section, we extend those results by considering the elliptical family of distributions. In the one-dimensional case, elliptical distributions correspond to all symmetric distributions on $R$. Specifically, a random variable X follows a symmetric distribution if its PDF is given by
$f X ( x ) = c η g ( x − ξ ) 2 η 2 , x ∈ R ,$
where $g ( u )$ (for $u ≥ 0$) is a nonnegative real function and corresponds to the kernel of the PDF, and c is a normalizing constant such that $f X ( x )$ is a PDF. We use the notation EC$( ξ , η ; g )$ according to (4). In general, $ξ ∈ R$ is the location parameter and coincides with the mean if the first moment of the distribution exists, and $η > 0$ is the scale parameter.
First, we shall introduce a multivariate elliptical PN family of distributions. A random vector follows a multivariate elliptical PN distribution if its joint PDF is given by
$\varphi_F(\mathbf{x}; \boldsymbol{\alpha}) = \prod_{j=1}^{d} \alpha_j\, c_{\alpha_j}\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}, \quad \mathbf{x} \in \mathbb{R}^d,$
where $f_j$ is defined in (4) and $F_j$ corresponds to its CDF, for $j = 1, \dots, d$. Clearly, since each $f_j$ belongs to the elliptical family, it can be easily shown that $\varphi_F(-\mathbf{x}; \boldsymbol{\alpha}) = \varphi_F(\mathbf{x}; \boldsymbol{\alpha})$, so that $\varphi_F$ is a symmetric (around zero) PDF. We use the notation MES$_F(\boldsymbol{\alpha})$ to refer to this distribution. From Lemma 1 of Gupta and Chang [24], we have the following generalization.
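A hypothetical instance with standard normal components ($f_j = \phi$, $F_j = \Phi$) can be used to check the claimed symmetry numerically. The normalizer $c_{\alpha} = 2^{\alpha - 1}/(2^{\alpha} - 1)$ used below is the one implied by the $\alpha$-score in Section 6.1 and is an assumption of this sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def c_alpha(a):
    # Normalizer making a * c_a * phi(z) * Phi(|z|)**(a - 1) a density.
    return 2.0 ** (a - 1.0) / (2.0 ** a - 1.0)

def phi_F(x, alpha):
    # Joint PDF of MES_F(alpha) with standard normal components.
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    return float(np.prod(alpha * c_alpha(alpha) * norm.pdf(x)
                         * norm.cdf(np.abs(x)) ** (alpha - 1.0)))

x, a = np.array([0.7, -1.2]), np.array([0.75, 2.5])
sym_gap = phi_F(x, a) - phi_F(-x, a)                   # symmetry around zero
marg, _ = quad(lambda z: phi_F([z], [0.75]), -10, 10)  # each factor is a density
print(sym_gap, marg)
```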
Proposition 3.
Let $\mathbf{Y} = (Y_1, \dots, Y_d)'$ be a random vector with joint PDF $\varphi_F$, and let $G$ be the absolutely continuous CDF of a random variable $Z$ symmetric about zero. Then
$\varphi_{GF}(\mathbf{x}; \boldsymbol{\alpha}, \boldsymbol{\beta}) = 2 \prod_{j=1}^{d} \alpha_j\, c_{\alpha_j}\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, G(\boldsymbol{\beta}' \mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^d,$
is a joint PDF of a random vector $\mathbf{X} = (X_1, \dots, X_d)'$ for any $\boldsymbol{\beta} \in \mathbb{R}^d$.
Proof.
We will use the methodology given in Gupta and Chang [24]. Given that $\varphi_F(\mathbf{x}; \boldsymbol{\alpha}) \ge 0$ and $G(\boldsymbol{\beta}'\mathbf{x}) \ge 0$ for any $\mathbf{x} \in \mathbb{R}^d$, it follows that $\varphi_{GF}(\mathbf{x}; \boldsymbol{\alpha}, \boldsymbol{\beta}) \ge 0$. Let
$h(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} 2 \prod_{j=1}^{d} \alpha_j\, c_{\alpha_j}\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, G(\boldsymbol{\beta}'\mathbf{x}) \prod_{j=1}^{d} dx_j.$
Lebesgue's dominated convergence theorem implies that
$\frac{\partial h(\boldsymbol{\alpha}, \boldsymbol{\beta})}{\partial \beta_k} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} 2 \prod_{j=1}^{d} \alpha_j\, c_{\alpha_j}\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, \frac{\partial G(\boldsymbol{\beta}'\mathbf{x})}{\partial \beta_k} \prod_{j=1}^{d} dx_j = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} 2 x_k \prod_{j=1}^{d} \alpha_j\, c_{\alpha_j}\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, G'(\boldsymbol{\beta}'\mathbf{x}) \prod_{j=1}^{d} dx_j.$
Moreover, given that $2 x_k \prod_{j=1}^{d} \alpha_j\, c_{\alpha_j}\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, G'(\boldsymbol{\beta}'\mathbf{x})$ is an odd function of $\mathbf{x}$, we have
$\frac{\partial h(\boldsymbol{\alpha}, \boldsymbol{\beta})}{\partial \beta_k} = 0, \quad k = 1, \dots, d.$
Therefore, $h(\boldsymbol{\alpha}, \boldsymbol{\beta})$ is constant in $\boldsymbol{\beta}$ and, given that $h(\boldsymbol{\alpha}, \mathbf{0}) = 1$, we have that $\varphi_{GF}(\mathbf{x}; \boldsymbol{\alpha}, \boldsymbol{\beta})$ is a joint PDF. □
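Proposition 3 can also be illustrated numerically in the bivariate case. The sketch below takes standard normal components and the skewing CDF $G = \Phi$ (both illustrative choices, with the assumed normalizer $c_{\alpha} = 2^{\alpha-1}/(2^{\alpha}-1)$) and checks that the integral of $\varphi_{GF}$ equals one whatever $\boldsymbol{\beta}$ is:

```python
from scipy.integrate import dblquad
from scipy.stats import norm

def c_alpha(a):
    return 2.0 ** (a - 1.0) / (2.0 ** a - 1.0)

def phi_GF(x1, x2, a1, a2, b1, b2):
    # Bivariate density of Proposition 3 with normal components and G = Phi.
    core = 1.0
    for x, a in ((x1, a1), (x2, a2)):
        core *= a * c_alpha(a) * norm.pdf(x) * norm.cdf(abs(x)) ** (a - 1.0)
    return 2.0 * core * norm.cdf(b1 * x1 + b2 * x2)

# The normalization does not depend on beta, as the proof shows.
totals = []
for b1, b2 in [(0.0, 0.0), (1.5, -2.0)]:
    val, _ = dblquad(lambda y, x: phi_GF(x, y, 0.75, 2.5, b1, b2),
                     -8.0, 8.0, lambda x: -8.0, lambda x: 8.0)
    totals.append(val)
print(totals)
```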
The new multivariate distribution defined in Proposition 3 will be denoted by MESS$_{GF}(\boldsymbol{\alpha}, \boldsymbol{\beta})$. We have the following proposition.
Proposition 4.
Let $\mathbf{X} = (X_1, \dots, X_d)' \sim$ MESS$_{GF}(\boldsymbol{\alpha}, \boldsymbol{\beta})$. We have that:
1.
If $\boldsymbol{\beta} = \mathbf{0}_d$, then $\mathbf{X} = (X_1, \dots, X_d)' \sim$ MES$_F(\boldsymbol{\alpha})$.
2.
$-\mathbf{X} \sim$ MESS$_{GF}(\boldsymbol{\alpha}, -\boldsymbol{\beta})$.
3.
For $d = 2$, the regression functions are given by
$E(X_j \mid X_{j'} = x_{j'}) = \frac{M_j(x_{j'})}{N_j(x_{j'})}, \quad j \ne j',$
where
$M_j(x_{j'}) = \int_{-\infty}^{\infty} x_j\, f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, G(\beta_1 x_1 + \beta_2 x_2)\, dx_j,$
$N_j(x_{j'}) = \int_{-\infty}^{\infty} f_j(x_j)\, F_j(|x_j|)^{\alpha_j - 1}\, G(\beta_1 x_1 + \beta_2 x_2)\, dx_j.$
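The conditional mean can be computed numerically in a hypothetical bivariate instance (normal components and $G = \Phi$, both illustrative assumptions) as the ratio of two one-dimensional integrals over $x_1$, with $x_2$ held fixed:

```python
from scipy.integrate import quad
from scipy.stats import norm

def kernel(x, a):
    c_a = 2.0 ** (a - 1.0) / (2.0 ** a - 1.0)
    return a * c_a * norm.pdf(x) * norm.cdf(abs(x)) ** (a - 1.0)

def cond_mean_x1(x2, a1=0.75, a2=2.5, b1=1.5, b2=-2.0):
    # E(X1 | X2 = x2) = M(x2) / N(x2): both integrals run over x1 only.
    joint = lambda x1: kernel(x1, a1) * kernel(x2, a2) * norm.cdf(b1 * x1 + b2 * x2)
    M, _ = quad(lambda x1: x1 * joint(x1), -10, 10)
    N, _ = quad(joint, -10, 10)
    return M / N

skewed = cond_mean_x1(0.5)
# With b1 = 0 the skewing factor no longer varies in x1, so the regression
# is flat at the (zero) mean of the symmetric marginal.
flat = cond_mean_x1(0.5, b1=0.0)
print(skewed, flat)
```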
The product moments of $\mathbf{X} = (X_1, \dots, X_d)' \sim$ MESS$_{GF}(\boldsymbol{\alpha}, \boldsymbol{\beta})$ are provided in the next proposition.
Proposition 5.
If $\mathbf{X} = (X_1, \dots, X_d)' \sim$ MESS$_{GF}(\boldsymbol{\alpha}, \boldsymbol{\beta})$ and $r_1 + r_2 + \dots + r_d$ is even, then
$E[X_1^{r_1} X_2^{r_2} \cdots X_d^{r_d}] = E[Y_1^{r_1} Y_2^{r_2} \cdots Y_d^{r_d}],$
where $\mathbf{Y} \sim$ MES$_F(\boldsymbol{\alpha})$.
Proof.
Let $\mathbf{X} = (X_1, \dots, X_d)' \sim$ MESS$_{GF}(\boldsymbol{\alpha}, \boldsymbol{\beta})$ and $\mathbf{t} = (t_1, \dots, t_d)' \in \mathbb{R}^d$. Also, let $\psi_{\mathbf{X}}(\mathbf{t})$ be the characteristic function of $\mathbf{X}$. We have that
$\psi_{\mathbf{X}}(\mathbf{t}) + \psi_{\mathbf{X}}(-\mathbf{t}) = 2 \psi_{\mathbf{Y}}(\mathbf{t}) = 2 \prod_{j=1}^{d} \psi_{Y_j}(t_j),$
where $Y_j$ has PDF $\alpha_j\, c_{\alpha_j}\, f_j(y_j)\, F_j(|y_j|)^{\alpha_j - 1}$, which does not depend on $\boldsymbol{\beta}$. Thus,
$\frac{\partial^m \psi_{\mathbf{Y}}(\mathbf{t})}{\partial t_1^{r_1} \partial t_2^{r_2} \cdots \partial t_d^{r_d}} < \infty,$
for $m = r_1 + r_2 + \dots + r_d$ even, so that
$E[X_1^{r_1} X_2^{r_2} \cdots X_d^{r_d}] = E[Y_1^{r_1} Y_2^{r_2} \cdots Y_d^{r_d}].$
This result implies that the even moments of $\mathbf{X}$ do not depend on $\boldsymbol{\beta}$ and can be written in terms of $E(Y_j^{+ r_j})$, the moments of the positive part of the variable $Y_j$ as defined before. □
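Proposition 5 can be verified numerically in a bivariate sketch (again with normal components and $G = \Phi$, illustrative choices): an even product moment computed under a strongly skewed $\boldsymbol{\beta}$ matches the one computed at $\boldsymbol{\beta} = \mathbf{0}$:

```python
from scipy.integrate import dblquad
from scipy.stats import norm

def kernel(x, a):
    c_a = 2.0 ** (a - 1.0) / (2.0 ** a - 1.0)
    return a * c_a * norm.pdf(x) * norm.cdf(abs(x)) ** (a - 1.0)

def moment_22(b1, b2, a1=0.75, a2=2.5):
    # E[X1^2 X2^2] under the bivariate skewed density with G = Phi.
    f = lambda y, x: (x ** 2) * (y ** 2) * 2.0 * kernel(x, a1) * kernel(y, a2) \
        * norm.cdf(b1 * x + b2 * y)
    val, _ = dblquad(f, -8.0, 8.0, lambda x: -8.0, lambda x: 8.0)
    return val

m_sym = moment_22(0.0, 0.0)    # beta = 0: the symmetric MES_F case
m_skew = moment_22(1.5, -2.0)  # a strongly skewed case, same even moment
print(m_sym, m_skew)
```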

6.1. ML Estimation

Let $\mathbf{x}_1 = (x_{11}, \dots, x_{1d})'$, …, $\mathbf{x}_n = (x_{n1}, \dots, x_{nd})'$ be an observed sample of size $n$ from $\mathbf{X} = (X_1, \dots, X_d)' \sim$ MESS$_{GF}(\boldsymbol{\xi}, \boldsymbol{\eta}, \boldsymbol{\alpha}, \boldsymbol{\beta})$. The log-likelihood function takes the form
$\ell(\boldsymbol{\theta}) = n \sum_{j=1}^{d} \left( \log \alpha_j + \log c_{\alpha_j} - \log \eta_j \right) + \sum_{i=1}^{n} \sum_{j=1}^{d} \log f_j(z_{ij}) + \sum_{i=1}^{n} \sum_{j=1}^{d} (\alpha_j - 1) \log F_j(|z_{ij}|) + \sum_{i=1}^{n} \log G\!\left( \sum_{j=1}^{d} \beta_j z_{ij} \right),$
where $z_{ij} = (x_{ij} - \xi_j)/\eta_j$ for $i = 1, \dots, n$ and $j = 1, \dots, d$, and $\boldsymbol{\theta} = (\boldsymbol{\xi}', \boldsymbol{\eta}', \boldsymbol{\alpha}', \boldsymbol{\beta}')'$. The first-order partial derivatives are given by
$\frac{\partial \ell(\boldsymbol{\theta})}{\partial \xi_j} = -\frac{1}{\eta_j} \left\{ \sum_{i=1}^{n} \frac{f_j'(z_{ij})}{f_j(z_{ij})} + (\alpha_j - 1) \sum_{i=1}^{n} \mathrm{sgn}(z_{ij}) \frac{f_j(|z_{ij}|)}{F_j(|z_{ij}|)} + \beta_j \sum_{i=1}^{n} \frac{G'(z_{1ij})}{G(z_{1ij})} \right\},$
$\frac{\partial \ell(\boldsymbol{\theta})}{\partial \eta_j} = -\frac{1}{\eta_j} \left\{ n + \sum_{i=1}^{n} z_{ij} \frac{f_j'(z_{ij})}{f_j(z_{ij})} + (\alpha_j - 1) \sum_{i=1}^{n} |z_{ij}| \frac{f_j(|z_{ij}|)}{F_j(|z_{ij}|)} + \beta_j \sum_{i=1}^{n} z_{ij} \frac{G'(z_{1ij})}{G(z_{1ij})} \right\},$
$\frac{\partial \ell(\boldsymbol{\theta})}{\partial \alpha_j} = \frac{n}{\alpha_j} - \frac{n \log 2}{2^{\alpha_j} - 1} + \sum_{i=1}^{n} \log F_j(|z_{ij}|), \qquad \frac{\partial \ell(\boldsymbol{\theta})}{\partial \beta_j} = \sum_{i=1}^{n} z_{ij} \frac{G'(z_{1ij})}{G(z_{1ij})},$
where $G'(u) = dG(u)/du$ and $z_{1ij} = \sum_{k=1}^{d} \beta_k z_{ik}$.
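The log-likelihood and scores above can be coded directly. The sketch below (normal components, $G = \Phi$, and synthetic placeholder data; all are illustrative assumptions) evaluates $\ell(\boldsymbol{\theta})$ for $d = 2$ and checks the analytic $\alpha_1$-score against a central finite difference:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
x = rng.normal(size=(50, 2))  # placeholder data, only used to evaluate l(theta)

def c_alpha(a):
    return 2.0 ** (a - 1.0) / (2.0 ** a - 1.0)

def loglik(xi, eta, alpha, beta):
    # l(theta) for the normal-kernel model with skewing CDF G = Phi.
    z = (x - xi) / eta
    n = x.shape[0]
    out = n * np.sum(np.log(alpha) + np.log(c_alpha(alpha)) - np.log(eta))
    out += np.sum(norm.logpdf(z))
    out += np.sum((alpha - 1.0) * np.log(norm.cdf(np.abs(z))))
    out += np.sum(norm.logcdf(z @ beta))
    return out

xi, eta = np.zeros(2), np.ones(2)
alpha, beta = np.array([0.75, 2.5]), np.array([1.0, -0.5])

# Analytic score: n/alpha_1 - n log 2 / (2**alpha_1 - 1) + sum_i log F(|z_i1|).
n, z = x.shape[0], (x - xi) / eta
score = (n / alpha[0] - n * np.log(2.0) / (2.0 ** alpha[0] - 1.0)
         + np.sum(np.log(norm.cdf(np.abs(z[:, 0])))))

h = 1e-6  # central finite difference in alpha_1
numeric = (loglik(xi, eta, alpha + [h, 0.0], beta)
           - loglik(xi, eta, alpha - [h, 0.0], beta)) / (2.0 * h)
print(score, numeric)
```

An actual fit would maximize `loglik` over all components of $\boldsymbol{\theta}$, e.g. with `scipy.optimize.minimize` applied to its negative.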

6.2. Expected Information Matrix

Let $\sigma_{\xi_j \xi_j}$, $\sigma_{\xi_j \xi_{j'}}$, …, $\sigma_{\beta_j \beta_j}$ be the elements of the expected information matrix. Also, define $U_1 = f_j'(Z_j)/f_j(Z_j)$, $U_2 = f_j''(Z_j)/f_j(Z_j)$, $W_1 = f_j(|Z_j|)/F_j(|Z_j|)$, $W_2 = f_j'(|Z_j|)/F_j(|Z_j|)$, $K_1 = G'(Z_{1j})/G(Z_{1j})$ and $K_2 = G''(Z_{1j})/G(Z_{1j})$, where $Z_{1j} = \sum_{k=1}^{d} \beta_k Z_k$, $f'(u) = df(u)/du$ and $f''(u) = d^2 f(u)/du^2$. We have
$\sigma_{\xi_j \xi_j} = \{ E(U_1^2) - E(U_2) + (\alpha_j - 1)[E(W_1^2) - E(W_2)] + \beta_j^2 [E(K_1^2) - E(K_2)] \}/\eta_j^2 = \{ (a_{02} - a_0) + (\alpha_j - 1)(b_{02} - b_0) + \beta_j^2 (c_{02} - c_0) \}/\eta_j^2,$
$\sigma_{\xi_j \xi_{j'}} = \beta_j \beta_{j'} (c_{02} - c_0)/(\eta_j \eta_{j'}),$
$\sigma_{\xi_j \eta_j} = \{ E(Z_j U_1^2) - E(Z_j U_2) - E(U_1) + (\alpha_j - 1)[E(Z_j W_1^2) - E(Z_j W_2) - E(\mathrm{sgn}(Z_j) W_1)] - \beta_j E(K_1) + \beta_j^2 [E(Z_j K_1^2) - E(Z_j K_2)] \}/\eta_j^2 = \{ (a_{12} - a_1 - a_{01}) + (\alpha_j - 1)(b_{12} - b_1 - d_0) - \beta_j c_{01} + \beta_j^2 (c_{12} - c_1) \}/\eta_j^2,$
$\sigma_{\xi_j \eta_{j'}} = \beta_j \beta_{j'} (c_{12} - c_1)/(\eta_j \eta_{j'}),$
$\sigma_{\xi_j \alpha_j} = E(\mathrm{sgn}(Z_j) W_1)/\eta_j = d_0/\eta_j,$
$\sigma_{\xi_j \beta_j} = \{ E(K_1) - \beta_j [E(Z_j K_1^2) - E(Z_j K_2)] \}/\eta_j = \{ c_{01} - \beta_j (c_{12} - c_1) \}/\eta_j,$
$\sigma_{\xi_j \beta_{j'}} = -\beta_j \{ E(Z_{j'} K_1^2) - E(Z_{j'} K_2) \}/\eta_j,$
$\sigma_{\eta_j \eta_j} = \{ -1 - 2E(Z_j U_1) + E(Z_j^2 U_1^2) - E(Z_j^2 U_2) + (\alpha_j - 1)[E(Z_j^2 W_1^2) - E(Z_j^2 W_2) - 2E(|Z_j| W_1)] - 2\beta_j E(Z_j K_1) + \beta_j^2 [E(Z_j^2 K_1^2) - E(Z_j^2 K_2)] \}/\eta_j^2 = \{ (a_{22} - 1 - 2a_{11} - a_2) + (\alpha_j - 1)(b_{22} - b_2 - 2d_1) - 2\beta_j c_{11} + \beta_j^2 (c_{22} - c_2) \}/\eta_j^2,$
$\sigma_{\eta_j \eta_{j'}} = \beta_j \beta_{j'} [E(Z_j Z_{j'} K_1^2) - E(Z_j Z_{j'} K_2)]/(\eta_j \eta_{j'}),$
$\sigma_{\eta_j \alpha_j} = E(|Z_j| W_1)/\eta_j = d_1/\eta_j,$
$\sigma_{\eta_j \beta_j} = \{ E(Z_j K_1) - \beta_j [E(Z_j^2 K_1^2) - E(Z_j^2 K_2)] \}/\eta_j = \{ c_{11} - \beta_j (c_{22} - c_2) \}/\eta_j,$
$\sigma_{\eta_j \beta_{j'}} = -\beta_j [E(Z_j Z_{j'} K_1^2) - E(Z_j Z_{j'} K_2)]/\eta_j,$
$\sigma_{\alpha_j \alpha_j} = \frac{1}{\alpha_j^2} - \frac{(\log 2)^2\, 2^{\alpha_j}}{(2^{\alpha_j} - 1)^2},$
$\sigma_{\beta_j \beta_j} = E(Z_j^2 K_1^2) - E(Z_j^2 K_2) = c_{22} - c_2,$
$\sigma_{\beta_j \beta_{j'}} = E(Z_j Z_{j'} K_1^2) - E(Z_j Z_{j'} K_2),$
$\sigma_{\xi_j \alpha_{j'}} = \sigma_{\eta_j \alpha_{j'}} = \sigma_{\alpha_j \alpha_{j'}} = \sigma_{\alpha_j \beta_j} = \sigma_{\alpha_j \beta_{j'}} = 0,$
where $a_{rs} = E(Z_j^r (f_j'(Z_j)/f_j(Z_j))^s)$, $a_r = E(Z_j^r f_j''(Z_j)/f_j(Z_j))$, $b_{rs} = E(Z_j^r (f_j(|Z_j|)/F_j(|Z_j|))^s)$, $b_r = E(Z_j^r f_j'(|Z_j|)/F_j(|Z_j|))$, $c_{rs} = E(Z_j^r (G'(Z_{1j})/G(Z_{1j}))^s)$, $c_r = E(Z_j^r G''(Z_{1j})/G(Z_{1j}))$, and $d_r = E(\mathrm{sgn}(Z_j) Z_j^r f_j(|Z_j|)/F_j(|Z_j|))$, for $r = 0, 1, 2$ and $s = 1, 2$.
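As a check on one diagonal entry, the sketch below verifies $\sigma_{\alpha_j \alpha_j}$ numerically in an illustrative normal-kernel instance with $d = 1$ (so no skewing term enters): the per-observation $\alpha$-score has mean zero and variance $1/\alpha^2 - (\log 2)^2\, 2^{\alpha}/(2^{\alpha} - 1)^2$ under the component density $\alpha\, c_{\alpha}\, \phi(z)\, \Phi(|z|)^{\alpha - 1}$:

```python
import math
from scipy.integrate import quad
from scipy.stats import norm

alpha = 1.8
c_a = 2.0 ** (alpha - 1.0) / (2.0 ** alpha - 1.0)
pdf = lambda z: alpha * c_a * norm.pdf(z) * norm.cdf(abs(z)) ** (alpha - 1.0)

# Per-observation alpha-score from Section 6.1 (n = 1).
score = lambda z: (1.0 / alpha - math.log(2.0) / (2.0 ** alpha - 1.0)
                   + math.log(norm.cdf(abs(z))))

mean_s, _ = quad(lambda z: score(z) * pdf(z), -10, 10)
var_s, _ = quad(lambda z: score(z) ** 2 * pdf(z), -10, 10)

sigma_aa = 1.0 / alpha ** 2 - (math.log(2.0) ** 2) * 2.0 ** alpha / (2.0 ** alpha - 1.0) ** 2
print(mean_s, var_s, sigma_aa)
```

The same numerical strategy extends to the other $a_{rs}$, $b_{rs}$, $c_{rs}$ and $d_r$ expectations.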

7. Concluding Remarks

The univariate power-normal distribution has been quite useful in the modeling of many types of real data. On the other hand, extensions of the univariate power-normal distribution to the multivariate setup have been little explored in the statistical literature. We have proposed new multivariate power-normal distributions, which are quite simple and can be useful in the modeling of multivariate data. The new multivariate power-normal distributions are absolutely continuous, and their joint probability density functions do not involve any complicated function and, hence, can be computed easily. Under the frequentist approach, the parameters of the multivariate power-normal distributions are estimated by the maximum likelihood method, and we provide closed-form expressions for the observed and expected Fisher information matrices. We illustrate the methodology developed in this paper by means of an application to real data, which shows that the proposed bivariate power-normal distribution is superior to the well-known bivariate skew-normal distribution (Azzalini and Dalla Valle [15]), as well as to the conditional bivariate skew-normal distribution (Arnold et al. [14]). It is worth stressing that the formulas related to the multivariate power-normal distributions (log-likelihood function, score function, observed and expected Fisher information matrices, etc.) are manageable and, with the use of modern computer resources and their numerical capabilities, the proposed multivariate distributions may prove to be a useful addition to the arsenal of applied statisticians. Finally, we have also introduced multivariate PN distributions based on the elliptical family of distributions, and discussed some of their properties.

Author Contributions

Individual contributions to this article: conceptualization, G.M.-F., A.J.L., and H.S.S.; methodology, G.M.-F., A.J.L., and H.S.S.; validation, G.M.-F., A.J.L., and H.S.S.; software, G.M.-F., A.J.L., and H.S.S.; investigation, G.M.-F., A.J.L., and H.S.S.; resources, G.M.-F., A.J.L., and H.S.S.; writing–original draft preparation, G.M.-F., A.J.L., and H.S.S.; writing–review and editing, G.M.-F., A.J.L., and H.S.S.; funding acquisition, A.J.L.

Funding

This research was funded in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq/Brazil) under grant 301808/2016–3.

Acknowledgments

We thank the anonymous referees for helpful suggestions which improved the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
2. Fernandez, C.; Steel, M.F.J. On Bayesian modeling of fat tails and skewness. J. Am. Stat. Assoc. 1998, 93, 359–371. [Google Scholar]
3. Mudholkar, G.S.; Hutson, A.D. The epsilon-skew-normal distribution for analyzing near-normal data. J. Stat. Plan. Inference 2000, 83, 291–309. [Google Scholar] [CrossRef]
4. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res. 1992, 28, 1649–1655. [Google Scholar] [CrossRef]
5. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789. [Google Scholar] [CrossRef]
6. Martínez-Flórez, G.; Arnold, B.C.; Bolfarine, H.; Gómez, H.W. The multivariate alpha-power model. J. Stat. Plan. Inference 2013, 143, 1244–1255. [Google Scholar] [CrossRef]
7. Henze, N. A probabilistic representation of the skew-normal distribution. Scand. J. Stat. 1986, 13, 271–275. [Google Scholar]
8. Chiogna, M. A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution. Stat. Methods Appl. 2005, 14, 331–341. [Google Scholar] [CrossRef]
9. Pewsey, A. Problems of inference for Azzalini’s skew-normal distribution. J. Appl. Stat. 2000, 27, 859–870. [Google Scholar] [CrossRef]
10. Gómez, H.W.; Salinas, H.S.; Bolfarine, H. Generalized skew-normal models: Properties and inference. Statistics 2006, 40, 495–505. [Google Scholar] [CrossRef]
11. Gómez, H.W.; Venegas, O.; Bolfarine, H. Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 2007, 18, 395–407. [Google Scholar]
12. Gupta, R.D.; Gupta, R.C. Analyzing skewed data by power normal model. Test 2008, 17, 197–210. [Google Scholar] [CrossRef]
13. Bolfarine, H.; Martínez-Flórez, G.; Salinas, H.S. Bimodal symmetric-asymmetric power-normal families. Commun. Stat. Theory Methods 2018, 47, 259–276. [Google Scholar] [CrossRef]
14. Arnold, B.C.; Castillo, E.; Sarabia, J.M. Conditionally specified multivariate skewed distributions. Sankhya A 2002, 64, 206–226. [Google Scholar]
15. Azzalini, A.; Dalla Valle, A. The multivariate skew-normal distribution. Biometrika 1996, 83, 715–726. [Google Scholar] [CrossRef]
16. Azzalini, A.; Capitanio, A. Statistical applications of the multivariate skew normal distribution. J. R. Stat. Soc. 1999, 61, 579–602. [Google Scholar] [CrossRef]
17. Gupta, A.K.; Huang, W.J. Quadratic forms in skew normal variates. J. Math. Anal. Appl. 2002, 273, 558–564. [Google Scholar] [CrossRef]
18. Gupta, A.K.; Chang, F.C.; Huang, W.J. Some skew-symmetric models. Random Oper. Stoch. Equ. 2002, 10, 133–140. [Google Scholar] [CrossRef]
19. Genton, M.G. Skew-Elliptical Distributions and Their Applications: A Journey beyond Normality; Chapman and Hall/CRC: New York, NY, USA, 2004. [Google Scholar]
20. Arellano-Valle, R.B.; Azzalini, A. On the unification of families of skew-normal distributions. Scand. J. Stat. 2006, 33, 561–574. [Google Scholar] [CrossRef]
21. Arellano-Valle, R.B.; Azzalini, A. The centred parametrization for the multivariate skew-normal distribution. J. Multivar. Anal. 2008, 99, 1362–1382. [Google Scholar] [CrossRef]
22. Arellano-Valle, R.B.; Genton, M.G. On fundamental skew distributions. J. Multivar. Anal. 2005, 96, 93–116. [Google Scholar] [CrossRef]
23. Gupta, A.K.; Chen, J.T. A class of multivariate skew-normal models. Ann. Inst. Stat. Math. 2004, 56, 305–315. [Google Scholar] [CrossRef]
24. Gupta, A.K.; Chang, F.C. Multivariate skew-symmetric distributions. Appl. Math. Lett. 2003, 16, 643–646. [Google Scholar] [CrossRef]
25. Azzalini, A.; Capitanio, A. The Skew-Normal and Related Families; IMS Monographs Series; Cambridge University Press: Cambridge, UK, 2014. [Google Scholar]
26. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018. [Google Scholar]
27. Cox, D.R.; Hinkley, D.V. Theoretical Statistics; Chapman and Hall: London, UK, 1974. [Google Scholar]
28. Cox, D.R.; Reid, N. Parameter orthogonality and approximate conditional inference. J. R. Stat. Soc. B 1987, 40, 1–39. [Google Scholar] [CrossRef]
29. Pesta, M. Total least squares and bootstrapping with application in calibration. Statistics 2013, 47, 966–991. [Google Scholar] [CrossRef]
30. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
31. Villasenor, J.A.; González, E. A generalization of Shapiro-Wilk's test for multivariate normality. Commun. Stat. Theory Methods 2009, 38, 1870–1883. [Google Scholar] [CrossRef]
32. Mardia, K.V. Measures of multivariate skewness and kurtosis with applications. Biometrika 1970, 57, 519–530. [Google Scholar] [CrossRef]
33. Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach; Springer: New York, NY, USA, 2002. [Google Scholar]
Figure 1. Contour plots for some parameter values $( α 1 , α 2 , β 1 , β 2 )$: $( 0.75 , 0.25 , − 0.85 , 2.0 )$ (left), $( 0.75 , 3.75 , 1.85 , 2.0 )$ (middle), and $( 1.5 , 2.25 , 1.0 , 2.5 )$ (right).
Figure 2. Scatter plot of real data and contour level of the fitted PDFs: (left) bivariate SN distribution, (middle) conditional SN distribution, and (right) bivariate ABPN distribution.
Table 1. Empirical mean.

| $\alpha_1$ | $\alpha_2$ | $\hat{\alpha}_1$, $n=50$ | $n=250$ | $n=500$ | $n=1500$ | $\hat{\alpha}_2$, $n=50$ | $n=250$ | $n=500$ | $n=1500$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.25 | 0.5914 | 0.5416 | 0.5201 | 0.5158 | 0.5286 | 0.3117 | 0.2644 | 0.2462 |
| 0.5 | 2.5 | 0.5776 | 0.5501 | 0.5328 | 0.5180 | 2.5586 | 2.5168 | 2.4898 | 2.4932 |
| 0.5 | 4.75 | 0.5932 | 0.5571 | 0.5386 | 0.5226 | 4.8880 | 4.7870 | 4.7658 | 4.7476 |
| 1.5 | 0.25 | 1.5205 | 1.5172 | 1.5123 | 1.5048 | 0.5386 | 0.3188 | 0.2675 | 0.2465 |
| 1.5 | 2.5 | 1.5551 | 1.5156 | 1.5083 | 1.5058 | 2.5399 | 2.5151 | 2.5067 | 2.5046 |
| 1.5 | 4.75 | 1.5492 | 1.5200 | 1.5131 | 1.5119 | 4.8714 | 4.7635 | 4.7601 | 4.7526 |
| 2.5 | 0.25 | 2.5295 | 2.5212 | 2.5108 | 2.5093 | 0.5439 | 0.2921 | 0.2740 | 0.2497 |
| 2.5 | 2.5 | 2.5234 | 2.5207 | 2.5174 | 2.5078 | 2.5258 | 2.5071 | 2.5063 | 2.5054 |
| 2.5 | 4.75 | 2.5300 | 2.4895 | 2.5027 | 2.5008 | 4.7690 | 4.7672 | 4.7608 | 4.7467 |
Table 2. Square root of the mean squared error ($\sqrt{\mathrm{MSE}}$).

| $\alpha_1$ | $\alpha_2$ | $\hat{\alpha}_1$, $n=50$ | $n=250$ | $n=500$ | $n=1500$ | $\hat{\alpha}_2$, $n=50$ | $n=250$ | $n=500$ | $n=1500$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.25 | 0.5875 | 0.2920 | 0.1951 | 0.1212 | 0.5204 | 0.2621 | 0.1967 | 0.1274 |
| 0.5 | 2.5 | 0.5788 | 0.2910 | 0.2010 | 0.1196 | 0.8049 | 0.3493 | 0.2366 | 0.1381 |
| 0.5 | 4.75 | 0.5689 | 0.2846 | 0.1874 | 0.1091 | 0.8985 | 0.4181 | 0.2827 | 0.1633 |
| 1.5 | 0.25 | 0.7280 | 0.3054 | 0.2272 | 0.1282 | 0.5066 | 0.2632 | 0.1933 | 0.1254 |
| 1.5 | 2.5 | 0.6811 | 0.3112 | 0.2146 | 0.1244 | 0.7990 | 0.3535 | 0.2453 | 0.1363 |
| 1.5 | 4.75 | 0.7376 | 0.3048 | 0.2087 | 0.1220 | 0.9571 | 0.4146 | 0.2867 | 0.1655 |
| 2.5 | 0.25 | 0.7814 | 0.3280 | 0.2308 | 0.1291 | 0.4912 | 0.2560 | 0.1994 | 0.1269 |
| 2.5 | 2.5 | 0.7382 | 0.3387 | 0.2322 | 0.1339 | 0.7812 | 0.3373 | 0.2445 | 0.1437 |
| 2.5 | 4.75 | 0.7911 | 0.3263 | 0.2348 | 0.1340 | 0.9670 | 0.4077 | 0.2900 | 0.1685 |
Table 3. Correlation for the bivariate ABPN distribution; columns are indexed by $(\beta_1, \beta_2)$.

| $\alpha_1$ | $\alpha_2$ | $(-1.5,-2.5)$ | $(-1.5,0)$ | $(-1.5,2.5)$ | $(0,-2.5)$ | $(0,0)$ | $(0,2.5)$ | $(1.5,-2.5)$ | $(1.5,0)$ | $(1.5,2.5)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.75 | -0.348 | 0.000 | 0.350 | -0.001 | 0.0002 | 0.000 | 0.347 | 0.000 | -0.348 |
| 1.75 | 0.75 | -0.349 | 0.001 | 0.349 | 0.000 | 0.0002 | -0.001 | 0.345 | 0.001 | -0.348 |
| 3.5 | 0.75 | -0.331 | 0.000 | 0.333 | -0.002 | 0.0004 | 0.002 | 0.331 | 0.000 | -0.331 |
| 7.0 | 0.75 | -0.284 | 0.001 | 0.285 | 0.000 | 0.0000 | 0.003 | 0.280 | -0.000 | -0.279 |
| 15 | 0.75 | -0.189 | 0.002 | 0.186 | -0.000 | 0.0000 | -0.002 | 0.183 | 0.002 | -0.186 |
| 0.25 | 2.5 | -0.384 | 0.0005 | 0.385 | -0.000 | 0.0002 | -0.001 | 0.382 | 0.001 | -0.382 |
| 1.75 | 2.5 | -0.383 | 0.0001 | 0.383 | 0.000 | 0.0000 | -0.002 | 0.385 | -0.001 | -0.382 |
| 3.5 | 2.5 | -0.369 | 0.0001 | 0.367 | 0.000 | 0.0003 | 0.001 | 0.367 | -0.001 | -0.368 |
| 7.0 | 2.5 | -0.319 | 0.0010 | 0.321 | -0.001 | 0.00004 | 0.002 | 0.316 | -0.001 | -0.316 |
| 15 | 2.5 | -0.223 | 0.0009 | 0.220 | -0.001 | 0.0006 | -0.002 | 0.221 | 0.002 | -0.219 |
| 0.25 | 5.0 | -0.418 | 0.001 | 0.419 | -0.000 | 0.0004 | -0.000 | 0.414 | -0.002 | -0.417 |
| 1.75 | 5.0 | -0.417 | -0.001 | 0.419 | -0.000 | 0.0003 | -0.001 | 0.418 | 0.001 | -0.418 |
| 3.5 | 5.0 | -0.401 | 0.000 | 0.403 | -0.001 | 0.0001 | -0.001 | 0.402 | 0.003 | -0.405 |
| 7.0 | 5.0 | -0.352 | 0.001 | 0.354 | -0.002 | 0.0004 | -0.001 | 0.352 | -0.001 | -0.353 |
| 15 | 5.0 | -0.255 | 0.002 | 0.258 | -0.000 | 0.0006 | -0.004 | 0.257 | 0.001 | -0.253 |
Table 4. Some measures for the fitted models.

| Model | $Y_m$ | ER$_m$ | $w_m$ |
|---|---|---|---|
| SN | 3.76 | 6.55 | 0.131 |
| Conditional SN | 8.30 | 63.43 | 0.013 |
| ABPN | 0.00 | 1.00 | 0.856 |