On Properties of the Bimodal Skew-Normal Distribution and an Application

The main object of this paper is to develop an alternative construction for the bimodal skew-normal distribution. The construction is based upon a study of the mixture of skew-normal distributions. We study some basic properties of this family, its stochastic representations and expressions for its moments. Parameters are estimated using the maximum likelihood estimation method. A simulation study is carried out to observe the performance of the maximum likelihood estimators. Finally, we compare the efficiency of the new distribution with other distributions in the literature using a real data set. The study shows that the proposed approach presents satisfactory results.

Henze [2] developed the stochastic representation for the skew-normal density and computed its odd moments. The stochastic representation and some more general representations for skew models are also discussed in Azzalini [3]. Arnold et al. [4] make use of the above results to develop truncations of the normal model. Pewsey [5] studied inference problems faced by the skew-normal model with some general results, especially consequences of the singularity of the Fisher information matrix (FIM) in the vicinity of symmetry. Gupta and Chen [6] developed a goodness of fit test. Reliability studies for the skew-normal model were developed by Gupta and Brown [7]. Univariate extensions to the skew-normal model are studied in Arellano-Valle et al. [8], Azzalini [9], Gómez et al. ( [10,11]), Venegas et al. [12], etc.
When it is necessary to model data with more than one mode, mixtures of distributions are always used. The importance of such studies rests on the fact that these models present computational difficulties due to identifiability problems. One major motivation of this paper is to develop models than can be seen as alternative parametric models to replace the use of mixtures of distributions, as these present estimation problems from either classical or Bayesian points of view ( [13,14]).
Bimodal distributions generated from skew distributions can be found in Azzalini and Capitanio [15], Ma and Genton [16], Arellano-Valle et al. [17], Kim [18], Lin et al. ([19,20]), Elal-Olivero et al. [21], Arnold et al. [22], Arellano-Valle et al. [23], Elal-Olivero [24], Gómez et al. [25], Arnold et al. [26], Braga et al. [27], Venegas et al. [28], Shah et al. [29], Gómez-Déniz et al. [30], Esmaeili et al. [31], Imani and Ghoreishi [32], Maleki et al. [33], etc. The main object of this paper is to study the properties of the bimodal skew-normal model introduced by Elal-Olivero et al. [21]. In particular, we derive results related to stochastic representation of the distribution and density function; this makes it simple to derive distributional moments and inferences by maximum likelihood (ML) estimation, among other quantities. The paper is organized as follows. Section 2 develops a bimodal normal distribution, its basic properties, representation, moments and moment generating function. Section 3 develops a skew-normal bimodal distribution, its basic properties, stochastic representation, moments and moment generating function. In Section 4, we perform a small scale simulation study of the ML estimators for parameters. A real data application is discussed in Section 5, which illustrates the usefulness of the proposed model. Conclusions and future work are presented in Section 6.

Bimodal Normal Distribution
Definition 1. If the random variable X has density function where φ is the N (0, 1) density, we say that X follows a bimodal normal distribution(see Elal-Olivero [24]) which is denoted by X ∼ BN.
, and it follows that it reaches its minimum value at x = 0 and maximum value at x = − √ 2 and x = √ 2. Notice that the maximum value is the same; this fact will play en important role in defining a more flexible model by adding an extra parameter which will control the height at the modes.
For the sake of completeness, some important results derived in Elal-Olivero [24] are presented below.
1. If X ∼ BN and F X (t) and M X (t) are the corresponding distribution and the moment generating functions then 2. Let X and U be independent random variables with X ∼ χ 2 (3) and U such that

Bimodal Normal Distribution with Shape Parameter
Definition 2. If random variable X has density given by where φ is the density of the N (0, 1) distribution, we say that X is distributed according to the bimodal normal distribution with parameter α which we denote by X ∼ BN(α).

Remark 2.
Some basic properties are shown in the following. Under the assumption that X ∼ BN(α) and that f (x|α) is the density function of X, then Proposition 1. If W ∼ BN(α) then the cumulative distribution and the moment generating function (MGF), denoted by F W (t) and M W (t) respectively, are given by: Proof. Follows directly from the corresponding definitions.
The next result presents a stochastic representation for the BN(α) distribution.

Proposition 2.
Let X and Y be independent random variables such that X ∼ BN and Y ∼ N(0, 1).
Proof. Using moments definition, which agrees with the MGF of BS(α) derived above.
Hence, the kurtosis coefficient is given by Recall that the kurtosis coefficient is given by given that the even moments are zero and µ 2 = 1+3α 1+α and µ 4 = 3+15α 1+α , which proves the result.

Remark 3.
Considering that the BN(α) model is unimodal for α ∈ [0, 0.5], Table 1 shows the variation for the kurtosis coefficient as parameter α ranges the interval.

Bimodal Asymmetric Distribution
We start by dealing first with the extension of the ordinary normal bimodal distribution.

One-Parameter Bimodal Skew-Normal Distribution
Definition 3. If the random variable X is distributed according to the density function then we say that it is distributed according to the bimodal skew normal distribution with parameter λ which we denote by X ∼ BSN(λ).
Proof. Let Y = X 2 , then, for y ≥ 0, it follows that Differentiating this last expression we conclude the proof by showing that . We note that the heights at the modes for a bimodal symmetric distribution are the same. However, for an asymmetric distribution the heights at the modes are not the same as shown in the next proposition.
Moreover, let x 0 and x 1 (x 0 < x 1 ) be the points at which the function f (x|λ) reaches its the maximum value. Then, . On the other hand, for the symmetric case, it is known that The case λ ≤ 0 is proved similarly.
Proposition 6. Let X ∼ BSN(λ) , Y ∼ SN(λ) and F X , F Y be the distribution functions of the random variables X, Y, respectively, and f Y the density function of Y. Therefore, the distribution function of the random variable X ∼ BSN(λ). Integrating by parts, with u = xΦ(λx) and dv = xφ(x), it follows that du = (λxφ(λx) + Φ(λx))dx and v = −φ(x), so that

Two-Parameter Bimodal Skew-Normal Distribution
Definition 4. If the density function of a random variable X is such that then we say that X is distributed as the bimodal skew-normal distribution(see Elal-Oliviero et al. [21]) with parameters λ and α, which we denote by X ∼ BSN(λ, α). Figure 2 depicts examples of the BSN distribution given in Equation (9) for different values of parameters λ and α.  φ(x)Φ(λx), corresponding to the BSN(λ, α) model, as α ranges in the interval [0, ∞] the density changes from unimodal to bimodal, and vice versa, with great flexibility.
As we mentioned in the introduction, the BSN model was introduced by Elal-Olivero et al. [21] and used for the Bayesian inference. Now we observe that we have constructed it based on a mixture of two distributions; below we study some of its properties, carry out a simulation study to see the behavior of the ML estimators and present an application to a real data set. Proposition 8. Let X ∼ BSN(λ, α), so that the following properties hold: Proof. For w ≥ 0, it follows that which upon differentiation leads to: We note that the density for the BSN(λ, α) model can be seen as a mixture between the SN(λ) and BSN(λ) models.

Proposition 9.
Let W ∼ BSN(λ, α), Y ∼ SN(λ) and F W and F Y be the distribution function for the random variables W and Y respectively and f Y (·) the density function Y. Then, Proof. Let F X be the distribution function of the random variable X, where X ∼ BSN(λ). Then, the result follows by noticing that Proposition 10. Let W ∼ BSN(λ, α) and Y ∼ SN(λ), and let, M W and M Y be the moment generating functions for the random variables W and Y, respectively; then, Proof. The result follows by noticing that Remark 6. The properties for the BSN(λ, α) model presented next follow from general properties of the f (z) = 2 f 0 (z)G(w(z)) model presented in Azzalini [3] where f 0 is a symmetric (around zero) density function, G is a unidimensional (symmetric) distribution function such that G exists and w(·) is an even function.

Remark 7.
The moments of the BSN(λ, α) model can be computed easily, separating even from odd moments.

2.
For the odd moments, considering that Y ∼ SN(λ), it follows that: where the odd moments E[Y 2k+1 ] can be computed using derivations in Henze [2], leading to: If we denote by β 1 and β 2 the asymmetry and kurtosis coefficients, respectively, then: where µ k = E(X k ), with k = 1, 2, 3, 4 are as given by Representing the asymmetry for specific values of α and λ by β 1 = A(α, λ), then the following relation holds: A(α, λ) = −A(α, −λ). Tables 2 and 3 show asymmetry and kurtosis values for different values of α and λ.   The parameter λ produces asymmetry in the symmetric BSN(α) model and, in particular when α > 1 2 , the asymmetry is reflected in the change of height of the modes of the symmetric bimodal model. The following proposition shows the identifiability of the model.
To prove that f is an injective function with respect to parameter λ, we use an analogous procedure, which concludes the proof.
Below we show some properties involving conditional distributions and their relations with the model introduced in this paper.

Simulation Study
In this section we report results of a small scale Monte Carlo simulation study conducted to evaluate the performance of parameter estimation by ML. Results are based on 1000 samples generated from the BSN model for several sample sizes. Use was made of the stochastic representation presented in Remark 6(1) implemented in R software [34]. For each sample generated, the likelihood function was maximized using the optim function of the R software. Given that the Fisher information matrix is singular when in the vicinity of symmetry (λ = 0), the use of algorithms such as Fisher-Scoring is not advisable. We therefore used the Nelder and Mead [35] algorithm, a direct search method that works satisfactorily for non-differentiable functions.

Nelder-Mead Method
For each sample size and parameter configuration, the ML estimators were computed leading to an empirical mean and empirical standard deviation (SD). Results are depicted in Tables 4 and 5. The main conclusion is that estimation was satisfactorily stable for moderate and large sample sizes. The simulation study also indicated certain difficulties in the estimation approach for situations near singularity, see Table 6 below. Table 6 below reports the results. With n ≥ 100, convergence problems were less frequent (≈ 5%), and for n ≥ 200, very few cases presented problems. Table 7 shows simulation results using the Fisher-Scoring algorithm. It was obtained that for α = 0 and λ > 5 (and λ < −5), for n = 300, non-convergence was down to ≈ 25%. For n = 600 some improvement was noted in the convergence rate although the standard error remained large as λ increased. Parameter recovery was satisfactory even for moderate sample sizes.

Application
We next present an application of the BN(µ, σ, α) model to a real data situation in which comparisons are made with the mixture of normals model and a flexible asymmetric model. We fitted these models to the variable ultrasound weight, i.e., fetus weight before birth, in 500 observations of the variable z = b.weight, which is the ultrasound weight (fetal weight in grams). These data are available at http://www.mat.uda.cl/hsalinas/cursos/2011/R/weight.rar. Given the difference in weight of males and females in the gestation stage, it is clear that these data present bimodality.
The descriptive statistics are given in Table 8, where √ b 1 and b 2 denote the asymmetry and kurtosis coefficients respectively.
The fit of this data set with the BSN model is compared with the fit given by the mixture of two normals (MN) model, and the flexible skew-normal (FSN) model introduced by Ma and Genton [16], for which the respective densities are given by: where φ(·) and Φ(·) denote the density and distribution functions of the standard normal distribution, µ, µ 1 , µ 2 , α, β ∈ R, σ, σ 1 , σ 2 ∈ R + and 0 ≤ p ≤ 1. ML estimation, Akaike's information criterion (AIC) (see Akaike [36]) and Bayesian information criterion (BIC) (see Schwarz [37]) are reported in Table 9. Notice that the BSN model presents the best fit (smallest AIC and BIC values). This is also corroborated by Figure 3, which shows the curves corresponding to the fitted models (parameters estimated using the ML approach), overlaid by the data set histogram.

Concluding Remarks
In this paper we have studied further properties of the so called BSN model. Stochastic representations were studied for the model itself and for some sub-models. Derivation of the FIM and verification of its non-singularity is in preparation and will be the subject of a future paper. Satisfactory results are obtained for the BSN model in a real data application. Some additional features of the BSN model are: • The BSN model presents quite good flexibility in the modes, as shown in Figures 1 and 2. • The BSN models present no identifiability problems, as shown by Proposition 11.
• The moments, moment generating function, asymmetry and kurtosis coefficients have closed form expressions. • The simulation study shows that, although there is a little irregularity in consistency for values α = λ = 0, the ML estimators are consistent. This point of irregularity can be explained by the singularity that exists in this point of the information matrix; we will address this situation in a future work.
• We use a model which is a mixture of non-identifiable normal distributions for comparison with the BSN model. • In the application, two statistical criteria for comparing models were used. The two criteria indicate that the BSN model provides the best fit for these data.