Bivariate Power-Skew-Elliptical Distribution

: In this article, we introduce a power-skew-elliptical (PSE) distribution in the bivariate setting. The new bivariate model arises in the context of conditionally speciﬁed distributions. The proposed bivariate model is an absolutely continuous distribution whose marginals are univariate PSE distributions. The special case of the bivariate power-skew-normal (BPSN) distribution is studied in details. General properties of the BPSN distribution are derived and the estimation of the unknown parameters by maximum pseudo-likelihood is discussed. Further, a sandwich type matrix, which is a consistent estimator for the asymptotic covariance matrix of the maximum likelihood (ML) estimator is determined. Two applications for real data of the proposed bivariate distribution is provided for illustrative purposes.


Introduction
Family of distributions that unify the main characteristics of other families are not very common in distributions theory. In this sense, an asymmetric family widely used in many situations and different areas of knowledge is the skew-normal (SN) distribution of [1], that is characterized because it has a wide range of asymmetry. Another well-known family in this same area is the fractional order statistical distribution, also known as exponentiated distribution or power-normal distribution introduced by [2], which is characterized by having a wide range of kurtosis. The unification of these two families was studied by [3] and was called power-skew-normal distribution. The resulting family contains, such as special cases the normal, skew-normal and power-normal distributions, and the ranges of asymmetry and kurtosis are greater than any of these three families of distributions. This type of unification of distribution families is important in the literature of distribution theory because it introduces great flexibility to the resulting models.
This model was extended to the case of elliptical distributions and was considered the case of the Birnbaum-Saunders (BS) family by [4], resulting in a large unification of useful distributions for example, to model the lifetime in survival analysis or applications in reliability theory. The model is called Birnbaum-Saunders power skew elliptical (BSPSE), and it contains as special cases as a large number of extensions of the lifetime Birnbaum-Saunders model, for both elliptical distributions and skew-elliptical families. In the univariate case, this type of results suggests that multivariate distributions constructed of unifications of distributions, it will induce multivariate models with great flexibility.
In this paper, we study a new family of bivariate distributions whose conditional densities follow a power-skew-elliptical distribution, which extends the power-skew normal model to the case of bivariate distributions, becoming a new alternative to the existing asymmetric bivariate models in literature such as the bivariate skew-normal distribution by [5] and the conditionally specified bivariate skewed distribution of [6].
A brief description of the some elliptical distributions is presented below.

Elliptical Distributions
A continuous one-dimensional random variable is said to have an elliptical distribution, if its distribution function is symmetric with support in the real number set. Specifically, a random variable X has a symmetric distribution if its probability density function (PDF) is given by for some non-negative function g(u), with u > 0 and corresponds to the kernel of the PDF such that ∞ 0 u − 1 2 g(u)du = 1/c, where z = (x − ξ)/η and c is a normalizing constant. The function g(·) is known as the density-generating function. An elliptically distributed random variable X with location and scale parameters ξ and η, respectively, and density-generating function, say g is denoted by X ∼ EC(ξ, η; g). If ξ = 0 and η = 1, then X has spherical distribution, which is denoted as X ∼ EC(0, 1; g).

Skew-Elliptical Distribution
An extension of the elliptical model to the asymmetric case is the standard elliptical asymmetric (skew-elliptical) model defined as where f (·) is given in Equation (1), F (·) is its respective cumulative distribution function (CDF), and λ is an asymmetry parameter. We use the notation Y ∼ SE(0, 1; g, λ). The CDF for this model is given by Skew-elliptical distributions are discussed in [12][13][14][15][16], among others. To follow we present some distributions belonging to this family.

Skew-Normal Distribution
A particular case of model in Equation (3) is the SN distribution introduced by [1], which is obtained by letting f (·) = φ(·) and F (·) = Φ(·), that is, the PDF and CDF of the standard normal distribution. The PDF and CDF of the SN model are given by and respectively, where T(·, ·) is the Owen's function, see [17] for more details.

Skew-Student-t Distribution
The skew-Student-t (SST) distribution (or skew-Pearson type IV) has the PDF given by where v > 0 represents the degrees of freedom and F 2−1 (·, ·, ·, ·) is the hypergeometric function, see [18].

Skew-Cauchy Distribution
A continuous random variable X following skew-Cauchy (SC) distribution has PDF given by 1.

Skew-Logistic Distribution
The PDF of the skew-logistic (SLOG) distribution is given by

Skew-Laplace Distribution
The skew-Laplace (SL) distribution has PDF given by

Power-Skew-Elliptical Distribution
An alternative to the SN distribution of [1], was studied by [2] by introducing the fractional order statistical model, also known as alpha-power (AP) model, which has PDF given by where H(·) is an absolutely continuous CDF with PDF h(·), and α > 0 is a parameter that controls the distributional shape. The case H(·) = Φ(·) is called the power-normal (PN) distribution and has PDF given by The PN model is denoted by Z ∼ PN(α) and is considered an alternative distribution for modeling data with asymmetry and kurtosis above (or below) the expected for the normal distribution. If PDF h(·) in model (10) has the form as in Equation (2), then the model is called PSE and its PDF is given by We will use the notation Z ∼ PSE(0, 1; g, λ, α). The case of this family of distributions for h(·; λ) = f SN (·, λ) and H(·; λ) = F SN (·, λ) was studied by [3] and it is called power-skew-normal distribution which is denoted by PSN(λ, α). Some contributions to this family have been made by [19][20][21][22], among other.
Some additional works on distributions include those of [23,24], which the possibility of applying the analytical expressions for the calculation of the correct detection probability of the signal time window at synchronization has been proved.
The rest of the paper is organized as follows: Section 2 we introduce the new bivariate power-skew-elliptical family of distributions, several properties are derived and we consider the ML method for estimating the model parameters. In Section 3, we study the particular case of the bivariate BPSN model. ML estimation of the model is discussed and a reparameterization of the BPSN model is presented and we derive the information matrix. In Section 4, two applications is presented illustrating the good performance of the approaches developed in the paper.

Bivariate Power-Skew-Elliptical Distribution
In this section, we extend the PSE model to the bivariate case, this new model is a great extension because it is the bivariate unification of two families of distributions, on the one hand, the skew-elliptical family and on the other hand, the alpha-power family. The unification will generate a distribution with great flexibility in both asymmetry and kurtosis.
For the construction of bivariate power-skew-elliptical (BPSE) family of distributions, we will use the approach discussed in [25] which is based on conditional distributions. According to [25] a two-dimensional random vector (X 1 , X 2 ) is conditionally specified, if for any random variable X 2 , the random variable X 1 | X 2 = x 2 is a member of a parametric family. Suppose that the joint PSE distribution function H BPSE (x 1 , x 2 ), of the random vector (X 1 , X 2 ) is such that, the conditional distribution of X 1 given X 2 = x 2 and the conditional distribution of X 2 given X 1 = x 1 are members of the PSE family of distributions with respect to a Lebesgue measure. We denote this by writing and where ω, τ are positive dependence functions which are to be determined. In such case, we have conditionals in a given exponential family and we can identify the corresponding joint density. We can argue as follows. If h X 1 (x 1 ) and h X 2 (x 2 ) are marginal densities for a joint PSE density h BPSE (x 1 , x 2 ) with conditional densities given by Equations (13) and (14), then it follows that Following [26] the solutions for dependence functions are given by and with α 1 , α 2 , positive real constants and α 12 = α 21 ≥ 0. Then, by using theorems in [27] and ( [28], Chap. 2), we have that where λ 1 , λ 2 ∈ R, the constants α 1 , α 2 > 0 and α 12 ≥ 0 in order to guarantee R R h(x 1 , x 2 )dx 1 dx 2 < ∞, and c(λ, α) is a normalizing constant with λ = (λ 1 , λ 2 ) and α = (α 1 , α 2 , α 12 ) . The independence case, where the joint distribution is the product of two PSE densities, is followed by taking α 12 Different conditional bivariate skew-elliptical distributions can be obtained from the generating function g(·), such as presented above, skew-normal, skew-Cauchy, skew-Student-t, skew-logistic and skew-Laplace, among others. This new family, which we will denote by BPSE g generates a large number of bivariate distributions according to generating function g(·), in addition to those already mentioned, one could also talk about the distributions: special case, type Kotz, Bessel, where H −1 i (·; ·) is the inverse function of the CDF H i (·; ·), i = 1, 2. To study the correlation between X 1 and X 2 , we can compute the measure Moreover, we can consider as an estimator of ρ X 1 ,X 2 .

Statistical Inference for the Bpse Model
The normalization constant c(λ, α) in the joint PDF h(x 1 , x 2 ) makes difficult the parameters estimation by maximizing the likelihood function. As alternative, we will follow the proposal of [29] for the parameters estimation of multivariate distributions, that is, we will maximize the pseudo-likelihood function. The pseudo-likelihood function is defined as the product of conditional functions. In this case, as in the ML method, the logarithm of the product of conditional distributions is maximized and this eliminates the logarithm of the normalization constant c(λ, α) within the estimation process, which, as in this case will contain multiple integrals in its structure. Another important characteristic of this estimation process is that the maximum pseudo-likelihood estimators vector of the model parameters is consistent and converges asymptotically to a multivariate normal distribution.
The log-pseudo-likelihood function is defined as the logarithm of the pseudo-likelihood function and for BPSE model it is expressed by The pseudo-score function is defined as the partial derivatives of the log-pseudo-likelihood function with respect to each of the parameters model, this is denoted by In accordance with [30], the pseudo likelihood estimator β of β is consistent, asymptotically normally distributed with covariance matrix given by Cheng, C. and Riu, J. [31] consider a sandwich type estimator, which is consistent for the asymptotic covariance matrix to estimate Σ( β) which is given by , is the score vector for the pseudo-likelihood function. The structure of the Γ and Ψ matrices are going to depend on the PDF in the BPSE model, specifically of the g(·) generating function. The non-singularity of Σ must be treated in each particular case, as well as its possible solution in case of singularity of this matrix.
For estimating the covariance matrix, we have from [31] that, thte Ψ(·) matrix will be estimated from the pseudo-score function given above, while the Γ(·) component of the Σ(·) matrix, we will write it in the form Γ = (γ β j β j ) = − 1 n K where K = (κ β j β j ) is a matrix of second derivatives of the pseudo-likelihood function, with respect to the parameters of the model, the elements κ β j β j of this matrix are presented in the Appendix A.

Reparameterization for the Bpsn Model
As well known in the literature of distributions theory, the SN model has a singular information matrix for λ = 0, however, it has been proposed to perform a reparameterization of the model parameters, [32], this problem is presented by several extensions of the SN model, including for example, the power-skew-normal model whose information matrix is singular for λ = 0 and α = 1, here, [33] present a reparameterization of the models parameters for this case.
Another solution to the problem of the singularity of the SN model when λ = 0 was presented by [1], which consists of a representation of the form , where µ ∈ R and σ > 0 are parameters of the random variable Y, and Z ∼ SN(λ). This representation is called centered parametrization, since E[Y] = µ and Var[Y] = σ 2 . The new representation has parameters vector θ = (µ, σ, γ 1 ) , where −0.9953 ≤ γ 1 ≤ 0.9953 represents the asymmetry coefficient of the random variable Y. Under this representation, the information matrix of the new parameters vector turns out to be non-singular. Thus, the information matrix is written in the form DI λ D where I λ is the information matrix of the model with parameters vector (ξ, η, λ) and D is a matrix of derivatives of the parameters vector (ξ, η, λ) with respect to the parameters vector θ. Under this reparameterization, we have the relationship with b = 2 π and c = {2/(4 − π)} 1/3 . Also it has that, when λ → 0, the information matrix converges to the diagonal matrix Σ c = diag(σ 2 , σ 2 /2, 6). This guarantees the existence and uniqueness of the MLE of ξ and η, for each fixed value of λ, see [1].
The BPSN model also inherits the singularity problem in the matrix Γ(·) for values close to λ 1 = λ 2 = 0 and α 1 = α 2 = 1, specifically, for values close λ 1 = 0, the columns corresponding to the elements k ξ 1 β j and k λ 1 β j , where β j is related to the rest of the parameters, are linearly dependent. The same way, it happens for values close λ 2 = 0. This situation also occurs in the pseudo-score function, leading to problems to guarantee the existence and uniqueness of the pseudo-estimated values. This leads to the non-existence of the inverse of the Γ(·) matrix and therefore, the covariance matrix Σ(·) can not be calculated. Following for j = 1, 2, where Z j ∼ SN(λ j ), then we arrive to representation of the BPSN with parameters vector θ = (µ 1 , µ 2 , σ 1 , σ 2 , γ j1 , γ j2 , α 1 , α 2 , α 12 ) where, for j = 1, 2, it has −0.9953 ≤ γ j1 ≤ 0.9953. As showed by [1], this representation can be written in the form for j = 1, 2. We denote it by SN c (µ j , σ j , γ j1 ). Here, the new bivariate centered power-skew-normal model (BPSN c ) is defined just as the PDF defined by model in Equation (28), where ξ j , η j and λ j are defined as in Equation (31). Given the relationship in Equation (31), ML estimates for the vector θ = (µ 1 , µ 2 , σ 1 , σ 2 , γ j1 , γ j2 , α 1 , α 2 , α 12 ) can be obtained from the estimates of the original model, that is, the estimates of maximum pseudo-likelihood for α 1 , α 2 and α 12 are the same as the model without reparametrizating, while for the parameters subvector (µ 1 , µ 2 , σ 1 , σ 2 , γ j1 , γ j2 ) , it can be obtained by using the inverse relationships of (31), that is, from: where δ j = λ j / 1 + λ 2 j . From (32), if λ j ≈ 0, then γ j1 ≈ 0, where γ j1 corresponds to the asymmetry coefficient of the random variable Y j . In this way, it is important to analyze the magnitude of the sample asymmetry coefficient of each variable.
The covariance matrix of [31] is obtained in the same way as in BPSN model, however, it can be demonstrated that the covariance matrix in the BPSN c model lets herself be written as where D = ∂β ∂θ .
It can be shown that D is a block-diagonal matrix, given by where I 3 is an identity matrix of dimension 3 × 3, and D j for j = 1, 2 is the derivative matrix of the parameter vector (ξ j , η j , λ j ) with respect to the vector (µ j , σ j , γ 1j ) . The elements of this matrix can be found in their general form in ( [34], p. 68). When γ 1j → 0, then D j Γ(β j )D j tends to the diagonal matrix diag(σ 2 j , σ 2 j /2, 6) which is non-singular, see [1,34]. According to [1], rewriting β = (β 1 , β 2 , α) , where β j = (ξ j , η j , λ j ) , it can be shown that the D j Γ(β j )D j sub matrices for j = 1, 2, of the DΓ(β)D matrix are non-singular and therefore, the rows of the DΓ(β)D matrix are linearly independent, that is, this matrix is invertible, which guarantees the existence of the Σ c ( θ) matrix. The estimator of the covariance matrix Σ c ( θ) is obtained by replacing β and θ by their respective estimators.

Illustration 1
We consider an illustration where we study the fit of the BPSN model for a data set studied in [35]. We pooled together, the 50 Iris-setosa data points, the 50 Iris-versicolor data points and the 50 Iris-virginica data points, to get a total sample size of n = 150. Descriptive statistics for the data set are presented in Table 2. Quantities √ b 1 and b 2 correspond to sample asymmetry and kurtosis coefficients.  Table 2 shows that variables x 1 and x 2 present high asymmetry, so a BPSN model could fit this data set. Under normality assumption, hypothesis tests for the asymmetry coefficient of x 1 and x 2 , (H 0 : β j1 = 0, j = 1, 2.) show test statistics 7.716 and 7.815, respectively, values well above the percentile of χ 2 (1) distribution, at level 5% whose value is 3.84, which indicates that the asymmetry in each variable is very important. Likewise, the univariate normality tests of [36] show p-values of 0.0225 and 0.0202, concluding that the distributions are asymmetric.
The bivariate normality tests of [37][38][39][40] yielded test statistics (with p-values in parentheses), of 2.8909 (0.0000), 9.3258 (0.0094) and 62.2958 (0.0000), respectively, hence, it is concluded that the bivariate observations vector does not follow a bivariate normal distribution. Thus, an asymmetric bivariate distribution, such as the BPSN model may be useful to fit this data set. Hence, the bivariate normal distribution is not a tenable model for the data under study, and an alternative model that is able to incorporate some degree of asymmetry would probably fit the data better. We fitted the conditional bivariate skew-normal model (see, [6]), the bivariate power-normal (BPN) model and the BPSN model. To compare fitted model, we make use of the Akaike Information Criterion (AIC), by [41] and the corrected AIC (CAIC) [42]. These measures are defined as follows We used the optim fuction of statistical package [43] for fitting the bivariate model. To choose the initial values in the iterative estimation process, for α 1 , α 2 , α 12 , in the BPSN model we use the transformation Y j = − ln F SN (z 1i ; λ j ) for j = 1, 2 which yields the bivariate exponential conditionals model discussed in detail by [28].
f Y 1 Y 2 (y 1 , y 2 ) = k(α 1 , α 2 , α 12 ) exp (−α 1 y 1 − α 2 y 2 − α 12 y 1 y 2 ) This transformation leads to obtaining estimates by the method of the moments of α 1 , α 2 and α 12 , which are consistent and asymptotically normal, see [28]. These estimators are given by: , α 2 = γ y 2 ( γ + I( γ − 1)) and α 12 = γ( γ − 1) y 2 ) and I = cv(y 1 )cv(y 2 ), where cor is the usual Pearson correlation between Y 1 and Y 2 , and cv(y) = S 2 y /y. For the initial points of λ 1 , λ 2 , and the location-scale parameters, we fit the univariate SN distributions by using the selm function of [43], the moment estimators of these parameters given in [44], and the values of the means and standard deviations of the bivariate normal model. Finally with the obtained values for (( ξ 1 , η 1 ), ( ξ 2 , η 2 ), λ 1 , λ 2 , α 1 , α 2 , α 12 ), the iterative estimation process is started. Table 3 presents ML estimates, AIC and CAIC values for BCSN, BPN and BPSN models, which is the one corresponding to the best (smallest AIC or CAIC) model fitting, which clearly indicates a better fit for BPSN model. Contour plots for the BPN, bivariate conditional skew-normal (BCSN) and BPSN distributions are presented in Figure 2. Initially, the BPN model is compared to the BPSN model by the hypothesis tests H 0 : (λ 1 , λ 2 ) = (0, 0) versus H 1 : (λ 1 , λ 2 ) = (0, 0). by using the likelihood ratio statistic, where L BPN (·) and L BPSN (·) are the pseudo-likelihood function of the BPN ans BPSN model, respectively. We obtain which is greater than the value of the χ 2 2,95% = 5.99. Then the BPSN model is a good alternative for fitting the data set. This result suggests the importance of the parameters λ 1 and λ 2 in the good fit of the BPSN model, the effect of these two parameters can be seen in the graph (c) of the Figure 2. The contour graphs in the Figure show that the BPSN model manages to better capture the distribution of the data set under consideration, since more points are contained within the contours of the BPSN distribution compared to the BCSN and BPN models.
Another hypothesis of special interest is the significance of the parameter vector (α 1 , α 2 , α 12 ), in this particular case there is interest in the hypothesis set H 0 : (α 1 , α 2 , α 12 ) = (1, 1, 0) versus H 1 : (α 1 , α 2 , α 12 ) = (1, 1, 0) So, under H 0 , we have the independent bivariate SN distribution for (X 1 , X 2 ). An appropriate test follows by using a statistic of type Wald which follows from the asymptotic normality of the maximum pseudo-likelihood estimator β . This statistic can be defined as where Σ 3 ( β) is a submatrix of Σ( β) corresponding to the vector (α 1 , α 2 , α 12 ), A = (0 3×6 I 3 ) and m = (1, 1, 0) , which, under the null hypothesis follows a χ 2 distribution with 3 degrees of freedom. Thus, we obtained that W n = 1229.17 with p value = 0.0000, that is, the null hypothesis is rejected, indicating that the exponentiated component is significant in the model, thus, both components of the unification, skew and power, are significant in the good fit of the BPSN model. The multivariate Kolmogorov-Smirnov test of goodness of fit proposed by [45], in special for the case of a bivariate distribution, which we denote by BKS (bivariate Kolmogorov-Smirnov), the statistic is given by where F n is the empirical distribution function of the sample, and F is some specified distribution function. When F distribution is unknown, the Kolmogorov-Smirnov statistic is defined by G n (y 1 , y 2 ) − y 1 × y 2 by using the transformation y 1 = F X 1 (x 1 ) and y 2 = F X 2 |X 1 (x 2 | x 1 ), and G n (y 2 , y 1 ) − y 2 × y 1 by using the transformation y 2 = F X 2 (x 2 ) and y 1 = F X 1 |X 2 (x 1 | x 2 ), where G n is the empirical distribution function of the sample. For the special case of the BPSN model, d n (BPSN) = max 0.05907079 , 0.07485913 = 0.07485913, which is less than 0.1464, which is the critical value of Table 1 given by [45], at level of 5%. Therefore, it is concluded that, the BPSN model fits well with the iris data set. It should be noted that, the BPSN model is compared to other models, in particular a model of the skew-normal family of [1] proposed by [6], and a model of the power-normal family of [2] studied by [26]. Our proposal better fitted the data set studied in [35]. This allows us to conclude that the BPSN model is a viable alternative to those existing in the literature when the data set presents degrees of asymmetry that are not captured by the multivariate normal model.

Illustration 2
In the second application we use data on measurements on air-pollution variables recorded at 12:00 noon in the Los Angeles area on different days, available in [46]. For this application, we use the variables x = Wind and y = NO 2 . For air pollutant concentrations, it is usually assumed that the data are uncorrelated and independent and thus do not require the diurnal or cyclic trend analysis [47]. The concentration of average air pollutants has been used in epidemiological surveillance as an indicator of the atmospheric contamination and its associated adverse effects in humans, causing diseases such as bronchitis.
The bivariate normality test of Royston returns a test statistic value of 13.55147 with p-value = 0.001141065, whereas the generalized Shapiro-Wilk test for multivariate normality returns a test statistic value of MVW = 0.94391, with p-value = 0.01166 rejecting the hypothesis of normality of the observations vector. Thus, a model like the BPSN is an alternative to fit the vector of observations.
For the hypothesis

Concluding Remarks
In this article, on the basis of conditionally specified distributions, we have introduced a new bivariate PSE distribution which is very general, quite flexible and widely applicable. The new bivariate model is an absolutely continuous bivariate distribution whose marginals are univariate PSE distributions. We have derived several properties of the bivariate PSE distribution and special attention is centered in the particular case of de bivariate PSN distribution. The estimation of the unknown parameters of the new bivariate model is approached by using the proposal of [29] by maximization of the pseudo-likelihood function and the observed information matrix is determined. LR tests for some hypotheses of interest are also considered. As remarked, the new bivariate PSN model proposed in this article can be skewed and correlated, and therefore is much more flexible than other bivariate skew models available in the literature for analysing bivariate data. This is supported in the application to real data which is verified that the new bivariate PSN model provides consistently a better fit than the bivariate model proposed by [6].