Likelihood-Based Inference for the Asymmetric Beta-Skew Alpha-Power Distribution

: This paper introduces a new family of asymmetric distributions that allows to ﬁt unimodal as well as bimodal and trimodal data sets. The model extends the normal model by introducing two parameters that control the shape and the asymmetry of the distribution. Basic properties of this new distribution are studied in detail. The problem of estimating parameters is addressed by considering the maximum likelihood method and Fisher information matrix is derived. A small Monte Carlo simulation study is conducted to examine the performance of the obtained estimators. Finally, two data set are considered to illustrate the developed methodology.


Introduction
Probability distributions for modeling data with high asymmetry and bimodality have been proposed by several authors. They stand out those distributions that have been supported under the general structure of the skew-symmetric probability density function (pdf) given by f SS (z; λ) = 2 f (z)F (λz), z ∈ R, (1) where f (·) is a symmetric pdf, F (·) is an absolutely continuous symmetric cumulative distribution function (cdf) and λ ∈ R is a shape parameter. A particular unimodal case of density function given in (1) is the skew-normal (SN) distribution [1], which has pdf given by where φ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, respectively. Among the densities of bimodal type supported under general structure (1), it can be stood out the proposals of Arnold et al. [2], Gómez et al. [3] and Kim [4]. Based on the alpha-power family of distributions by Durrans [5] with pdf given by where F (·) is a cdf absolutely continuous, f = dF and α ∈ R + is a shape parameter, Bolfarine et al. [6] studied a new bimodal distribution. Other proposals for fitting data with bimodal behavior have been considered by Elal-Olivero [7], Elal-Olivero et al. [8] and Ma and Genton [9]. For example, Ma and Genton [9] introduced the family of distributions with pdf f (z) = 2φ(z)Φ w(z) , z ∈ R, (4) where φ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, respectively; and w(z) is a polynomial of odd order. Elal-Olivero et al. [8] defined the bimodal elliptical skew-normal (BESN) model by multiplying the pdf of the SN distribution by a polynomial of even order resulting the pdf given by where λ ∈ R, is an asymmetry parameter, and α ≥ 0 is a shape parameter. They showed that the model fits data of bimodal type for α > 0. The case α = 0 is reduced to the SN distribution and for α = λ = 0, the standard normal distribution is obtained. An asymmetric extension of the bimodal-normal model which has pdf given by f BN (z) = z 2 φ(z), z ∈ R, (6) was proposed by Elal-Olivero [7]. This extension is called the alpha-skew-normal (ASN) distribution and its pdf is for α ∈ R. This model fits data with bimodal shape for α = 0, and for α = 0 the model is reduced to the normal distribution. Properties of the model and the statistical inference for the parameters can be seen in Elal-Olivero [7]. Recently Shafiei et al. [10] presented a generalization of the ASN model by adding a parameter β resulting a more general and flexible model. This addition allows to modeling data sets with the possibility of until four modes and model is denominated alpha-beta skew-normal (ABSN). The pdf of the ABSN model is given by where α, β ∈ R. Note that, for β = 0 the ASN model of Elal-Olivero [7] is obtained, and for α = β = 0 the normal distribution is followed. One of the great disadvantages of the widely known SN model by Azzalini [1], is that the information matrix is singular when λ = 0. This same characteristic is presented by the BESN model for the case α = λ = 0. The bimodal models by Arnold et al. [2] and Gómez et al. [3] also present the problem of singularity of the information matrix. Elal-Olivero [7] shows that ASN distribution has singular information matrix when the shape parameter α is equal to zero, and given that ABSN model of Shafiei et al. [10] contains the ASN model as a particular case, it can be shown that for α = β = 0, the ABSN model also has singular information matrix, a situation also presented by the proposal of Ma and Genton [9]. In this way, models of bimodal type that result from including polynomials of even order in the density of normal or skew-normal model; or polynomials of odd order in the argument of skew-normal model, acquire the same characteristic of the skew-normal model in its information matrix; situation that also happens with those models of bimodal type that contain this model as a special case, as it happens with the proposals of Arnold et al. [2] and Gómez et al. [3].
On the other hand, Pewsey et al. [11] studied the alpha-power (AP) model given in (3) for the special case F (·) = Φ(·), which is denominated the power-normal (PN) model and they found that α values near to one (greater and smaller than one), the information matrix is non-singular, being this an advantage in the inferential process and the asymptotic properties of the maximum likelihood estimator (MLE). Further, Martínez-Flórez et al. [12] extended the SN distribution to the alpha-power family of distributions, and they obtained a more flexible model in terms of asymmetry and kurtosis than SN and PN models. The bimodal model based on the PN distribution of Bolfarine et al. [6] also presents the same property of information matrix non-singular. Hence, given the characteristics of the information matrix of the alpha-power family of distributions, in this paper we propose an extension alpha-power of a multimodal model by using a polynomial of even degree. The introduced model is a more flexible distribution in terms of asymmetry and kurtosis than those existing in the literature and with information matrix non-singular.
The rest of this paper is organized as follows-Section 2 presents the beta-skew alpha-power model and discusses its main properties. In particular, we show how bimodality and trimodality shape are obtained. We consider a location-scale family and the inference process is carried out by using maximum likelihood method. In addition, some properties of the special case of the beta-skew-normal model are studied in details. In Section 3 a small Monte Carlo simulation is presented. In Section 4 two real data application are reported and compares it with several rival models.

The Asymmetric Beta-Skew Alpha-Power Distribution
In this section, we introduce a new multimodal asymmetric distribution by considering the asymmetric beta-skew-normal (BSN) model and by incorporating an additional parameter. The new family of distributions extend the usual normal model and others distributions are also particular cases from this model.

Definition 1.
The random variable Z is said to have a beta-skew alpha-power distribution, which we will denote as Z ∼ BSAP(β, α), if Z has the following pdf for α > 0 and β ∈ R. The functions f BSN (z; β) and F BSN (z; β) are the pdf and cdf of the BSN distribution given by and respectively. Here, φ(·) and Φ(·) are de pdf and cdf of the standard normal distribution, respectively. Figure 1 depicts the shape of the beta-skew alpha-power (BSAP) distribution for some selected values of α and β parameters. It can see from the graph that BSAP distribution has unimodal, bimodal and multimodal (three modes) behavior. Notice that, when β increases, the density function takes a bimodal shape. The graphs in the figure also show a bimodal shape for great values of α.

Proposition 1.
The density function (9) has at most three modes.
Proof. Given that α is an asymmetry parameter, without loss of generality we take α = 1 (the case of the BSN distribution) and by differentiating Equation (10) we obtain thus, f BSN (z; β) has at most seven zeros. It can be seen that for 0 < |β| < +∞, the value z = 0 is a zero of the f BSN (z; β) function and it can be shown that f BSN (0; β) = − √ 2/π/(2 + 15β 2 ) < 0, therefore, at z = 0 occurs a maximum when 0 < |β| < +∞. By analyzing the polynomial of the BSN model, we making g(z) = (1 − βz 3 ) 2 + 1 = β 2 z 6 − 2βz 3 + 2, hence g(−z) = β 2 z 6 + 2βz 3 + 2. For β > 0 the number of positive roots of the polynomial g(z) is two, while g(−z) does not change of sign, hence, the polynomial does not have negative roots. It is concluded that the polynomial has at most four complex roots. For β < 0, the polynomial would have two or no negative roots and four complex roots. Using computational methods, it can be shown that for values of β ∈ [0, 1000] the resulting polynomial from the derivative β 2 z 6 − 6β 2 z 4 − 2βz 3 + 6βz + 2 has at least two non-real complex roots, therefore, it would have at most four real roots, and therefore it is concluded that at most they would have three maximums in the model.
The cdf of Z, which we denote by F BSAP (t, β, α), is given by Figure 2 shows the shape of the survival function for some selected values of α and β. In these graphs, it can be seen that the curve becomes increasingly horizontal in the interval (0, 1) as β increases, and the probability of survival is greater for larger α values when β is constant. It is also important to note that, regardless of the values of β and α, the survival approaches to zero when z tends to infinite. Proposition 3. Let Z ∼ BSAP(β, α), then the k-th moment of Z is given by Proof. The proof is direct from definition of expected value.
The expected value, the variance and the indices of skewness and kurtosis of the BSAP model, which are denoted by µ, σ 2 , γ 1 and γ 2 respectively, can be obtained from (16) by using the following expressions Here
Notice that, the length of the admissible intervals for the skewness and the kurtosis parameters of the BSAP distribution are larger than the corresponding intervals of the ASN, the SN, the PN and the skew-  [1,7,11,12] for more details.
Some additional properties for the special case when α = 1 are presented to follow.
we obtain

Location and Scale Extension for BSAP Model
We can also consider a generalization of the BSAP(β, α) distribution by adding location and scale parameters. Next definition gives the generalization of the BSAP model. α). The beta-skew alpha-power density of location and scale is defined as the distribution of X = ξ + ηZ, for ξ ∈ R and η > 0. The corresponding density function is given by where z = (x − ξ)/η and θ = (ξ, η, β, α) . We denote this as X ∼ BSAP(θ).
The distribution function associated to the density (20) is given by and the k-th moment of the random variable X is where Z ∼ BSAP(β, α).

Maximum Likelihood Estimation for BSAP Model
We consider a random sample X = (X 1 , . . . , X n ) of size n, from the BSAP(θ) distribution, where θ = (ξ, η, β, α) . The log-likelihood function is given by which is a continuos function in each parameter. Thus, by differentiating the log-likelihood function we obtain the following likelihood equations where The solutions of likelihood Equations (21)-(24) provide the maximum likelihood estimators (MLEs) of ξ, η, α and β, which can be obtained by numerical method such as Newton-Rapshon type procedure. Under certain regularity conditions, the elements of the Fisher information may be calculated as The Cramér-Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator. Thus, we can find a lower bound for the standard errors (SE) of the MLEs as the square root of the diagonal elements of the observed Fisher information matrix. The observed information matrix can be obtained by taking the second partial derivatives of the log-likelihoiod function and multiplying by −1, that is, where θ 1 = ξ, θ 2 = η, θ 3 = β, and θ 4 = α. The elements of the expected and observed Fisher information |I(θ)| and |J(θ)|, respectively, are given in the Appendix A. It can be shown that, for β = 0 and α = 1 (the normal distribution case), we have that |I(θ)| = 0, therefore, I(θ) is non-singular. Hence, covariance matrix of the parameter vector is the inverse of the Fisher information matrix, that is, Var(θ) = I −1 (θ). Thus, the MLEθ converges to a normal distribution, that iŝ θ d → N 4 (θ, I −1 (θ)).

Simulation Study
In order to study the performance of the MLEs of the parameters in BSAP model, we conducted a Monte Carlo simulation study with samples sizes n = 40, 80, 120, 160 and 320. The true values of the parameters were taken as ξ = 0, η = 1, α = 0.5, 2.0 and 5.0; and β =1.0, 1.75 and 3.0. For each combination of parameters and sample sizes, we generated 5000 samples from the BSAP model. To evaluate estimators performance were considered the absolute value of the bias (B), and the squared root of the mean squared error (RMSE). They are given by respectively, whereθ i is the estimate of θ i for the j-th sample, for θ i ∈ θ = (ξ, η, β, α) . MLEs were computed by using optim function in R Development Core Team [13].
From Table 1 we can see that, as the sample sizes increase, the bias (in absolute value) and the squared root of the mean squared error decrease, indicating a good behavior of the MLEs of the parameters in BSAP model. Then, it follows that for large sample sizes, MLEs are asymptotically consistent.

Real Data Applications
In this section, we illustrate the proposed model by considering two real data sets. In the first application we consider the data on the otis IQ scores of 52 non-white males hired by a large insurance company in 1971. In the second application we use the geyser data set, available in R Development Core Team [13].

Application 1: The Otis IQ Scores Data
These data set have been analyzed previously by Gupta and Gupta [14] and Sharafi and Behboodian [15]. See, Roberts [16] for more details. Table 2 shows the statistic summary for the otis scores data. The results shows that the data present positives skewness and kurtosis lower than the normal model, likewise, the histogram in Figure 3 shows that the data have more than one mode. We fit the BSN and BSAP models to analyze this data set. To compare the proposed model, we also fit the multimodal alpha-beta skew-normal (ABSN) model by Shafiei et al. [10], and the asymmetric bimodal model (ETN) of Arnold et al. [2]. The fit of these models is carried out by using the maximum likelihood method and optim function of [13]. Table 3 shows the parameter estimates, together with their corresponding standard errors (SE). To compare fitted models, we use the AIC Akaike [17], corrected CAIC, BIC by Hastie and Tibshirani [18] and the HQIC or information criterion of Hannan and Quinn [19], namely and where p is the number of parameters in the considered model. According to any of these criteria, the BSAP and BSN seem to provide better fit to the otis IQ scores data than the ABSN and ETN models. We can use the likelihood ratio (LR) test statistic to confirm the use the BSAP model instead of the BSN model, so we consider the following hypotheses, (BSN(ξ, η, β)) vs H 1 : α = 1 (BSAP(ξ, η, β, α)), with LR test statistic, where L BSN (·) and L BSPN (·) denote the likelihood functions of the BSN and BSAP models, respectively. We obtain the value −2 log(Λ) = −2( BSN (θ; X) − BSAP (θ; X)) = 4.0166, and comparing this quantity with χ 2 1 = 3.841, the null hypotheses is rejected, that is, the BSAP model is more flexible than the asymmetric BSN model, taking into account the test results of Hartigan and Hartigann [20] and Hartigan [21].  Figure 3a,b show the behavior of the fitted models and the empirical cdf for the ETN, BSN and BSAP models. It can be seen from the figure that the BSAP model has the best fit against the ABSN, BSN and ETN models, while the BSN model has a better fit than the model ETN. Also, the graphs in Figure 3c,d show the QQplots for the BSAP and BSN models.

Application 2: Old Faithful Geyser Data
For the second application, we consider a data set consisting of 272 observations about the wait times between the eruptions (in minutes) of the old faithful geyser in Yellowstone National Park, Wyoming, U.S. Data set are available in the libraries stats and MASS of R Development Core Team [13]. More information of these data can be seen in Azzalini and Bowman [22] who takes a look at this set of data. Table 4 shows the summary statistic for data set. The results shows that the data set present negative asymmetry and kurtosis below the normal model.
In addition to the BSAP and BSN models, we also fit the ETN and the alpha-skew-normal (ASN) models, see Elal-Olivero [7]. The results of the fit of these models can be seen in Table 5. The standard errors of the estimators were obtained by using the observed information matrix, and again, to compare the fited models, we use the AIC, CAIC, BIC and CAIC criteria. According to any of these criteria, BSAP model seems to provide better fit to the old faithful gayser data than BSN, ASN and ETN models. We tested the hypotheses with LR test statistic, where L BSN (·) and L BSPN (·) denote the likelihood functions of the BSN and BSAP models, respectively. We obtain the value −2 log(Λ) = −2( BSN (θ; X) − BSAP (θ; X)) = 8.884, and comparing this quantity with χ 2 1 = 3.841, the null hypotheses is rejected, that is, the BSAP model is more flexible than the asymmetric BSN model. Figure 4 shows the behavior of the fitted models and the empirical cdf for the ETN, BSN and BSAP models. It can be seen that the BSAP model has the best fit against the BSN and ETN models, while the BSN model has a better fit than the ETN model. The graphs in Figure 4 also show the QQplot for the BSAP model.

Conclusions
In this paper, a new class of unimodal, as well as bimodal and trimodal, skew distribution was proposed. The main statistical properties of the model and the problem of the parameters estimation are studied in details by using maximum likelihood method. The model extends the usual normal distribution to trimodal asymmetric case and the BSN model is also a special case. Furthermore, we have shown that such distribution is more flexible than certain rival models and it fits better to some real data sets. where The elements of the expected Fisher information matrix can be obtained by using i θ p θ q = −E ∂ 2 (θ; X) ∂θ p ∂θ q , for p, q = 1, 2, 3, 4. and letting where Z ∼ BSAP(β, α); we have i ξξ = (α − 1)