Abstract
This article introduces a bimodal model based on the epsilon-skew-normal distribution. This extension generates bimodal distributions similar to those produced by mixtures of normal distributions. We study the basic properties of this new family. We estimate the parameters by maximum likelihood, calculate the information matrix and present a simulation study to assess parameter recovery. Finally, we illustrate the results with three real data sets, suggesting this new distribution as a plausible alternative for modelling bimodal data.
MSC:
62E15; 62E20
1. Introduction
In the last two decades of the twentieth century, inferential processes assumed the normality of the data. This assumption is not realistic in many cases, and the inferential processes were, therefore, inappropriate. In such situations, many authors decided to transform the variables in order to achieve symmetry or normality of the data, but these transformations led to unsatisfactory results because their interpretation became very complicated. Azzalini [1] introduced the skew-normal (SN) distribution, which allows asymmetric data to be modelled without the need for transformation. The probability density function (pdf) of the SN distribution is given by
$$f(x;\lambda) = 2\,\phi(x)\,\Phi(\lambda x),$$
where $x \in \mathbb{R}$, $\lambda \in \mathbb{R}$, and $\phi$ and $\Phi$ represent the pdf and cumulative distribution function (cdf) of the standard normal distribution, respectively. This is usually denoted by SN($\lambda$), where $\lambda$ is a shape parameter. Many works have since been published on the SN distribution, including Azzalini [2], Henze [3], Chiogna [4], Pewsey [5], Arellano-Valle et al. [6], DiCiccio and Monti [7], Salinas et al. [8], Rosco et al. [9], Shafiei and Doostparast [10] and Adcock and Azzalini [11].
Mudholkar and Hutson [12] studied the epsilon-skew-normal (ESN) distribution with an asymmetry parameter $\varepsilon$, such that the standard normal distribution is recovered when $\varepsilon = 0$. Specifically, X has an ESN distribution if its pdf can be written as
$$f(x;\varepsilon) = \phi\left(\frac{x}{1 - \operatorname{sgn}(x)\,\varepsilon}\right),$$
where sgn is the sign function and $|\varepsilon| < 1$. We denote this by $X \sim \mathrm{ESN}(\varepsilon)$. The properties of this distribution were studied extensively by Mudholkar and Hutson [12]. Arellano-Valle et al. [13] introduced a general family of epsilon-skew-symmetric (ESS) distributions, of which the ESN distribution is a particular case. Some works using distributions of this family are as follows: Hansen [14] applied the epsilon-skew-t distribution to economic data; Gómez et al. [15] applied it to mining data; Celis et al. [16] recently introduced an epsilon-positive family of distributions based on the ESS family and applied it to data with or without censoring; and Bevilacqua et al. [17] used the ESS family for modelling atypical data in spatial statistics.
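As a quick numerical illustration of the ESN density described above, the following minimal sketch evaluates $\phi(x/(1-\operatorname{sgn}(x)\varepsilon))$ and checks that it integrates to one. The function name `esn_pdf` and the test value $\varepsilon = 0.3$ are illustrative choices, not taken from the article.

```python
import numpy as np
from scipy.integrate import quad


def esn_pdf(x, eps):
    """Standard ESN density phi(x / (1 - sgn(x) * eps)), with |eps| < 1."""
    if not -1.0 < eps < 1.0:
        raise ValueError("eps must lie in (-1, 1)")
    x = np.asarray(x, dtype=float)
    z = x / (1.0 - np.sign(x) * eps)
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)


# Integrate each half-line separately because the density has a kink at zero.
left, _ = quad(esn_pdf, -np.inf, 0.0, args=(0.3,))
right, _ = quad(esn_pdf, 0.0, np.inf, args=(0.3,))
total = left + right
```

The mass to the left of zero is $(1+\varepsilon)/2$ and the mass to the right is $(1-\varepsilon)/2$, so a positive $\varepsilon$ shifts probability to the negative half-line.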
The SN and ESN distributions are unimodal, so they are not appropriate in fields such as economics, health, engineering and many others where the data are often bimodal. One of the classic models for bimodal data is the mixture of normal (MN) distributions, but it has been criticised for identifiability problems; see, for example, McLachlan and Peel [18] and Marin et al. [19]. Despite the many innovations introduced in this area, the problem persists in many cases, which is a disadvantage when working with models of this type. Hence, researchers continue to develop new symmetric and asymmetric bimodal distributions.
Bimodal distributions have also been obtained from skew-symmetric models; see, for example, Azzalini and Capitanio [20], Ma and Genton [21], Arellano-Valle et al. [13], Kim [22], Elal-Olivero et al. [23], Arnold [24], Gómez et al. [25], Hassan and El-Bassiouni [26], da Silva et al. [27], Cordeiro et al. [28], da Braga et al. [29], Altun et al. [30] and Alizadeh et al. [31], among others. For further results on the SN distribution and related families, see Azzalini’s book [32].
Models of this type were studied by Kim [22], who introduced a bimodal extension of the SN model, named “two-pieces skew-normal model (TN)”, denoted by , whose pdf can be written as
where and . For , Kim [22] discusses that (2) defines a bimodal and symmetric around zero pdf.
An asymmetric extension of Kim’s model was presented by Arnold [24], who developed an asymmetric bimodal model named “the extended two-pieces skew-normal model (ETN)”, with pdf given by
where and is a normalizing constant. The distribution is denoted by ETN().
Another model widely applied in these situations is the MN distribution, which is given by:
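$$f(x) = p\,\phi(x;\mu_1,\sigma_1) + (1-p)\,\phi(x;\mu_2,\sigma_2),$$
where $\phi(\cdot;\mu,\sigma)$ denotes the pdf of the normal distribution with mean $\mu$ and standard deviation $\sigma$, and $0 \le p \le 1$.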
This article is organised as follows: in Section 2, we give the pdf of the new distribution, its basic properties and moments; in Section 3, we carry out inference by the maximum likelihood (ML) method, calculate the information matrix and conduct a simulation study to assess the properties of the ML estimators in finite samples; in Section 4, we fit the model to three real data sets and compare it with other distributions; and in Section 5, we present some conclusions.
2. The New Density
In this section, we introduce the bimodal ESN (BESN) distribution, an extension of the ESN distribution, and study its basic properties.
Definition 1.
A random variable X has a BESN distribution if its pdf is given by
where $|\varepsilon| < 1$ and $\gamma \ge 0$. We denote this by $X \sim \mathrm{BESN}(\varepsilon, \gamma)$.
Remark 1.
Note that the BESN model contains the standard normal model when $\varepsilon = \gamma = 0$, while for $\gamma = 0$, we obtain the ESN model. When $\varepsilon = 0$, we obtain the symmetric bimodal model considered in Elal-Olivero et al. [23]. Thus, this model is a new alternative for modelling both symmetric and asymmetric bimodal data.
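The special cases in Remark 1 can be checked numerically. The displayed density of Definition 1 is not reproduced in this text, so the sketch below assumes the form $f(x;\varepsilon,\gamma) = \frac{1+\gamma x^{2}}{1+\gamma(1+3\varepsilon^{2})}\,\phi\big(x/(1-\operatorname{sgn}(x)\varepsilon)\big)$, which is inferred from Remark 1 and from the ESN second moment $1+3\varepsilon^{2}$. This form, the parameter order BESN($\varepsilon$, $\gamma$) and the function names are assumptions of the example.

```python
import numpy as np
from scipy.integrate import quad


def besn_pdf(x, eps, gamma):
    """Assumed BESN(eps, gamma) density (inferred form, not quoted from the article)."""
    if not (-1.0 < eps < 1.0) or gamma < 0.0:
        raise ValueError("need |eps| < 1 and gamma >= 0")
    x = np.asarray(x, dtype=float)
    z = x / (1.0 - np.sign(x) * eps)
    esn = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # 1 + 3 * eps**2 is the second moment of the standard ESN distribution.
    return (1.0 + gamma * x**2) / (1.0 + gamma * (1.0 + 3.0 * eps**2)) * esn


def besn_total_mass(eps, gamma):
    """Numerically integrate the assumed density over the real line."""
    left, _ = quad(besn_pdf, -np.inf, 0.0, args=(eps, gamma))
    right, _ = quad(besn_pdf, 0.0, np.inf, args=(eps, gamma))
    return left + right
```

With $\gamma = 0$ the function reduces to the ESN density, and with $\varepsilon = \gamma = 0$ to the standard normal density, in line with Remark 1.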
Proposition 1.
If , and , then its cdf is given by:
Remark 2.
As expected, if , , , then
As can be seen from Figure 1, whether the pdf is unimodal or bimodal depends on the parameters $\varepsilon$ and $\gamma$.

Figure 1.
Distributions (a) BESN(0.8,0) (solid line), BESN(0.2,0) (dashed line); (b) BESN(0,10) (solid line), BESN(0,2) (dashed line); (c) BESN(0.05,10) (solid line), BESN(0.05,2) (dashed line); and (d) BESN(0.2,4) (solid line), BESN(−0.2,4) (dashed line).
2.1. Properties
Proposition 2.
Let . Then
- (i)
- geometrically, is a reflection of the ordinates related to the change in the sign of parameter ε. This is, ;
- (ii)
- if then , which is symmetric bimodal when and symmetric unimodal if .
Proof.
Using the density given in (4), the results are obtained immediately. □
Proposition 3.
Let . Then, the pdf for X is
- 1.
- bimodal when and its modes are located in
- 2.
- unimodal when and
- (a)
- , and its mode is located in
- (b)
- , and its mode is located in .
- 3.
- unimodal when and
- (a)
- , and its mode is located in
- (b)
- , and its mode is located in .
Proof.
We calculate the first derivative of the pdf given in (4) and set it equal to zero to find its critical points. That is,
Consequently, or . Without loss of generality, let us assume in the latter equation that , hence we get the equation
then on solving it, we obtain that whenever . Now, assuming that , we obtain the equation
then, as before, it follows that when , then , and so considering that the proof of part 1 is concluded.
Considering the range of values of and in 2(a), we can deduce that Equation (5) has no solution and that Equation (6) has a solution equal to . Then, using the criterion of the second derivative for h, we have
Therefore, h attains a maximum at this point, which is consequently the mode, and 2(a) follows. On the other hand, in case 2(b), the hypotheses imply that Equations (5) and (6) have no solutions. Hence, the unique critical point of h is zero, and the mode is located at $x = 0$.
Proceeding analogously to the proof of 2, the proof of part 3 follows, and thus the proof is completed. □
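The case analysis in this proof can be traced in a few lines. Because the displayed density (4) is not reproduced in this text, the derivation below assumes that the BESN density is proportional to $g(x) = (1+\gamma x^{2})\,\phi\big(x/(1-\operatorname{sgn}(x)\varepsilon)\big)$ with $\gamma > 0$; this form and the symbol $g$ are assumptions of the sketch, chosen to agree with Remark 1.

```latex
\begin{align*}
g'(x) &= x\,\phi\!\left(\frac{x}{1-\operatorname{sgn}(x)\varepsilon}\right)
        \left[\,2\gamma - \frac{1+\gamma x^{2}}{(1-\operatorname{sgn}(x)\varepsilon)^{2}}\right],
        \qquad x \neq 0,\\
g'(x) = 0,\ x>0 &\iff x = \sqrt{\frac{2\gamma(1-\varepsilon)^{2}-1}{\gamma}},
        \quad\text{which requires } \gamma > \frac{1}{2(1-\varepsilon)^{2}},\\
g'(x) = 0,\ x<0 &\iff x = -\sqrt{\frac{2\gamma(1+\varepsilon)^{2}-1}{\gamma}},
        \quad\text{which requires } \gamma > \frac{1}{2(1+\varepsilon)^{2}}.
\end{align*}
```

Under this assumption, both nonzero critical points exist, and the density is bimodal, exactly when $\gamma > 1/\big(2(1-|\varepsilon|)^{2}\big)$. If only one of the two conditions holds, there is a single mode on the corresponding side of zero; if neither holds, the mode is at zero.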
2.2. Moments
This subsection is devoted to presenting the r-th moment of the model and its moment-generating function (mgf).
Proposition 4.
If . Then for the r-th distributional moment is given by
where , and .
Proof.
Let , then
Hence, the result is obtained using the moments of the ESN model (see [12]). □
Corollary 1.
Let . Then
The skewness () and kurtosis () coefficients can be computed as
Figure 2 shows these coefficients for the BESN model in terms of and .
Figure 2.
Plots for (a) skewness and (b) kurtosis coefficients for the BESN model.
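The skewness and kurtosis coefficients can also be evaluated directly from the raw moments. The closed-form expressions of Proposition 4 and Corollary 1 are not reproduced in this text, so the sketch below assumes the density form $\frac{1+\gamma x^{2}}{1+\gamma(1+3\varepsilon^{2})}\,\phi\big(x/(1-\operatorname{sgn}(x)\varepsilon)\big)$, under which $E(X^{r}) = (m_r + \gamma m_{r+2})/(1+\gamma m_2)$ with $m_r$ the r-th ESN moment. The function names are hypothetical.

```python
import math


def abs_normal_moment(r):
    """E|Z|^r for Z ~ N(0, 1)."""
    return 2.0 ** (r / 2.0) * math.gamma((r + 1.0) / 2.0) / math.sqrt(math.pi)


def esn_moment(r, eps):
    """r-th raw moment of the standard ESN(eps) distribution."""
    pieces = (1.0 - eps) ** (r + 1) + (-1.0) ** r * (1.0 + eps) ** (r + 1)
    return 0.5 * abs_normal_moment(r) * pieces


def besn_moment(r, eps, gamma):
    """r-th raw moment under the assumed BESN density."""
    num = esn_moment(r, eps) + gamma * esn_moment(r + 2, eps)
    return num / (1.0 + gamma * esn_moment(2, eps))


def skewness_kurtosis(eps, gamma):
    """Standardised third and fourth central moments."""
    m1, m2, m3, m4 = (besn_moment(r, eps, gamma) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4
    return mu3 / var**1.5, mu4 / var**2
```

For $\varepsilon = \gamma = 0$ the function returns skewness 0 and kurtosis 3, the standard normal values, and for $\varepsilon = 0$ the skewness is zero for any $\gamma$.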
Finally, in the following proposition, we present the mgf for the BESN distribution.
Proposition 5.
The mgf for the BESN distribution is given by
Proof.
The result follows from the definition of the mgf after straightforward algebra. □
3. Inference
In this section, we apply the ML method to estimate the parameters. We calculate Fisher’s information matrix, and we carry out a simulation study.
3.1. ML Estimation
For a random sample of size n from the BESN distribution, the log-likelihood function for is given by
The score function is given by , where
The ML estimators of $\varepsilon$ and $\gamma$ are the solution of the system of equations obtained by setting the score function equal to $\mathbf{0}_2$, where $\mathbf{0}_p$ denotes a vector of zeros of dimension p. The elements of the Hessian matrix, defined as the derivative of the score function with respect to each parameter, can be expressed as
3.2. Fisher’s Information Matrix
The elements of the Fisher’s information matrix are given by
where and For more details of these integral formulas, we refer the reader to Gradshteyn and Ryzhik [33].
3.3. Location-Scale Case
For , the extension to the location-scale case of the model follows from the transformation , where and . The pdf for X is given by
where . We denote this by . Therefore, for a random sample of the random variable , the log-likelihood function for is given by
where and the score function can be expressed as
The maximum likelihood estimator for , say , satisfies the asymptotic distribution
where denotes the Fisher’s information matrix. See Appendix A for details about this matrix. In other words, is asymptotically normally distributed and consistent.
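In practice, the ML estimates for the location-scale case are obtained numerically. The following is a minimal sketch that minimises the negative log-likelihood with `scipy.optimize.minimize`; it is not the authors' implementation (Section 4.3 states that they used bbmle in R). It assumes the standardised density $\frac{1+\gamma z^{2}}{1+\gamma(1+3\varepsilon^{2})}\,\phi\big(z/(1-\operatorname{sgn}(z)\varepsilon)\big)$ with $z = (x-\mu)/\sigma$; the function names, starting values and bounds are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize


def besn_negloglik(theta, x):
    """Negative log-likelihood of the assumed location-scale BESN model."""
    mu, sigma, eps, gamma = theta
    x = np.asarray(x, dtype=float)
    z = (x - mu) / sigma
    w = z / (1.0 - np.sign(z) * eps)
    log_kernel = np.log1p(gamma * z**2) - 0.5 * w**2 - 0.5 * np.log(2.0 * np.pi)
    # Per-observation normalising term: log(sigma) + log(1 + gamma * (1 + 3 eps^2)).
    log_const = np.log(sigma) + np.log1p(gamma * (1.0 + 3.0 * eps**2))
    return -(np.sum(log_kernel) - x.size * log_const)


def fit_besn(x, start=None):
    """Box-constrained numerical ML fit; returns the scipy OptimizeResult."""
    x = np.asarray(x, dtype=float)
    if start is None:
        # Hypothetical default start: sample mean and sd, symmetric, mild bimodality.
        start = np.array([np.mean(x), np.std(x), 0.0, 0.5])
    bounds = [(None, None), (1e-6, None), (-0.999, 0.999), (0.0, None)]
    return minimize(besn_negloglik, start, args=(x,),
                    method="L-BFGS-B", bounds=bounds)
```

Because the likelihood can have several local maxima for bimodal data, trying a few different starting values is a reasonable precaution. Standard errors can then be approximated from the inverse of the observed information matrix evaluated at the estimates.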
3.4. Simulation Study
In this subsection, we present a simulation study to assess the performance of the ML estimators in finite samples. Table 1, Table 2 and Table 3 consider three values for $\varepsilon$: −0.90, −0.25 and 0.75; three values for $\gamma$: 0.5, 1.5 and 3; two values for $\mu$: −3 and 0; and two values for $\sigma$: 1 and 5. Previous studies (not presented here) suggest that with smaller sample sizes the estimators do not give good results in terms of bias; we therefore considered the sample sizes n = 100, 200 and 300. Values from the BESN distribution can be drawn using the Metropolis–Hastings algorithm. We use a burn-in period of 1000 and a thinning interval of 20 in order to reduce the correlation between successive values. For each combination of $\varepsilon$, $\gamma$, $\mu$, $\sigma$ and n (108 combinations in total), we draw 1000 samples of size n from the BESN model and, for each, compute the corresponding ML estimates. We report the estimated bias over the 1000 replicates, the mean of the estimated standard errors (SE) and the root mean squared error (RMSE). In general terms, the bias and the RMSE of the estimators decrease towards zero as the sample size increases, suggesting that the estimators are consistent. In addition, the SE and RMSE become closer as the sample size increases, suggesting that the variance of the estimators is well estimated. However, as mentioned above, we suggest a sample size of at least approximately 100 observations to ensure good properties of the ML estimators.
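The sampling step described above can be sketched as a random-walk Metropolis–Hastings sampler with a burn-in of 1000 and a thinning interval of 20. The article does not state the proposal distribution, so the normal random-walk proposal and its step size are assumptions of this sketch, as is the target kernel $(1+\gamma x^{2})\,\phi\big(x/(1-\operatorname{sgn}(x)\varepsilon)\big)$ and the function names.

```python
import numpy as np


def besn_log_kernel(x, eps, gamma):
    """Log of the assumed unnormalised standard BESN density."""
    z = x / (1.0 - np.sign(x) * eps)
    return np.log1p(gamma * x * x) - 0.5 * z * z


def rbesn_mh(n, eps, gamma, mu=0.0, sigma=1.0,
             burn_in=1000, thin=20, step=2.5, seed=None):
    """Draw n values by random-walk Metropolis-Hastings, then shift and scale."""
    rng = np.random.default_rng(seed)
    current = 0.0
    current_lp = besn_log_kernel(current, eps, gamma)
    draws = np.empty(n)
    kept = 0
    for it in range(burn_in + n * thin):
        proposal = current + step * rng.normal()
        proposal_lp = besn_log_kernel(proposal, eps, gamma)
        # Symmetric proposal, so the acceptance ratio is the ratio of kernels.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        # After burn-in, keep every `thin`-th state of the chain.
        if it >= burn_in and (it - burn_in + 1) % thin == 0:
            draws[kept] = current
            kept += 1
    return mu + sigma * draws
```

For example, `rbesn_mh(100, -0.25, 1.5, mu=-3.0, sigma=5.0, seed=1)` would produce one simulated sample of size 100 for one of the parameter combinations listed above. Only the unnormalised kernel is needed, since the normalising constant cancels in the acceptance ratio.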
Table 1.
Recovery parameters for the ML estimators in the BESN model: case .
Table 2.
Recovery parameters for the ML estimators in the BESN: case .
Table 3.
Recovery parameters for the ML estimators in the BESN: case .
4. Applications
In this section, we fit the BESN distribution to three real data sets that are widely used in the literature, namely the roller, birthweight and nickel data sets. The first application is to a unimodal data set and is compared with the fit of the normal (N) distribution; the second application is to a symmetric bimodal data set and is compared with the fits of the N and TN distributions; the third application is to an asymmetric bimodal data set and is compared with the fits of the SN, ETN and MN distributions. To compare the models, we use the Akaike information criterion AIC (see Akaike [34]) and the Bayesian information criterion BIC (see Schwarz [35]). Traditionally the preferred model is the one with the smallest AIC and/or BIC.
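For reference, the two criteria can be computed from the maximised log-likelihood as AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ, where k is the number of estimated parameters and n the sample size. The helper names below are illustrative.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik
```

Both criteria penalise the number of parameters, so a four-parameter location-scale BESN fit is preferred over a two-parameter normal fit only if the gain in log-likelihood outweighs the penalty.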
4.1. First Application: Roller Data
In this first application, we use the data set related to 1150 heights measured at 1-micron intervals along the drum of a roller (i.e., parallel to the axis of the roller). This was part of an extensive study of the surface roughness of the roller. It is available for downloading at http://lib.stat.cmu.edu/jasadata/laslett (accessed on 5 November 2022). Summary statistics for the data set are presented in Table 4.
Table 4.
Summary statistics for the variable roller.
Given the values of sample asymmetry, , and sample kurtosis, , there is strong evidence that an asymmetric model could provide a better fit to the data under study. Therefore, the N and BESN distributions are fitted to the data set.
The ML estimates for each model (N and BESN) and standard errors (SE) in parentheses are: and with AIC and BIC for the N distribution and and with AIC and BIC for the BESN distribution.
According to AIC and BIC, the BESN distribution provides a better fit for the roller data set than the N distribution. In other words, the BESN distribution accommodates the skewness and kurtosis of the data, which the N distribution cannot capture. QQ-plots for the variable roller, using the N and BESN distributions, are shown in Figure 3a,b.
Figure 3.
(a) QQ-plot for the N model; (b) QQ-plot for the BESN model; (c) cdf of the BESN model (dotted line) and empirical cdf (solid line).
Figure 3c shows the empirical cdf for the variable roller (solid line), while the dotted line corresponds to the cdf for the BESN model. The results suggest a better fit for the BESN model.
In addition, we also compare the N and BESN distributions with a hypothesis test. Specifically, we propose the hypothesis
which can be tested using the statistic
After numerical evaluation, the observed value of the statistic is greater than the critical 5% chi-squared value with two degrees of freedom, namely 5.99, so the null hypothesis is rejected at the 5% level.
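The comparison with the chi-squared critical value can be sketched as follows. The displayed hypothesis and test statistic are not reproduced in this text, so the example assumes the usual likelihood ratio form, twice the difference between the maximised log-likelihoods of the BESN and N models, with two degrees of freedom for the two additional parameters. The function name and arguments are hypothetical.

```python
from scipy.stats import chi2


def likelihood_ratio_test(loglik_null, loglik_alt, df, level=0.05):
    """Return the LR statistic, the chi-squared critical value and the p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    critical = chi2.ppf(1.0 - level, df)
    p_value = chi2.sf(stat, df)
    return stat, critical, p_value
```

With `df=2` and `level=0.05`, the critical value is approximately 5.99, which is the threshold quoted above.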
4.2. Second Application: Birthweight Data
In the second application, we study the fit of the BESN model to 500 units observed for the variable Z=b.weight, which is the ultrasound weight (birthweight in grams). These data are available as supplementary material. The summary statistics for the data set are presented in Table 5.
Table 5.
Summary statistics for the variable birthweight.
Given the symmetry of these data, we propose to fit a BESN model taking $\varepsilon = 0$ and then compare this with the fit of the N and TN models.
We used AIC and BIC to compare the fits of the N, TN and BESN models. According to these criteria (see Table 6 and Figure 4a), the BESN model is seen to present a better fit than the N and TN models.
Table 6.
Parameter estimates and SE for the N, TN and BESN models.
Figure 4.
(a) Histogram for the variable birthweight. Estimated pdf: N (dotted red line), TN (dashed blue line) and BESN (solid black line). (b) QQ-plot for the BESN distribution. (c) Empirical cdf for the variable birthweight (solid black line) and cdf of the BESN distribution (dotted red line).
Figure 4b,c show the qq-plot for the BESN model and the empirical cdf. The plots suggest a better fit for the BESN model than its competitors.
4.3. Third Application: Nickel Data
In this third application, we use a data set related to the logarithm of the nickel content in soil samples analysed at the Mines Department of Universidad de Atacama, Chile, which are available as supplementary material.
In this section, we compare the fits of the SN, BESN, ETN and MN models to the above data set.
The pdf for the MN model can be written as
with parameters , and , we denote as MN.
The comparisons are made using the AIC and BIC for the variable Y, the logarithm of the nickel concentration. In all cases, the parameters are estimated by ML using the bbmle package in R [36]. The SE of the ML estimates are calculated using the observed information matrix corresponding to each model.
Table 7 gives the estimated parameters and AIC and BIC for the SN, ETN, BESN and MN models. The respective SE are in parentheses. The graph in Figure 5 shows that the BESN model presents quite a good fit.
Table 7.
Estimated parameters and AIC and BIC for the SN, ETN, BESN and MN distributions. The respective SE are in parentheses.
Figure 5.
Fitted densities: (a) SN (dotted red line), MN (dashed blue line) and BESN (solid black line), (b) Empirical cdf (solid black line) and BESN distribution (dashed red line).
In all cases, the models are augmented by the inclusion of location ($\mu$) and scale ($\sigma$) parameters. Since the models are not nested, the AIC and BIC are used to compare the distributions. According to these criteria, the BESN model provides the best fit to these data. Hence, the BESN model seems to be a useful alternative for modelling the logarithm of the nickel concentration.
The conclusion of the study is that the BESN model appears to be more appropriate for the particular data sets analysed here. These points are illustrated in more detail in Figure 5, where the histograms and the fitted curves for the data sets are displayed.
5. Final Comments
In this paper, we propose the BESN distribution, which can be unimodal or bimodal and symmetric or asymmetric, depending on its parameters. We study its properties and implement ML estimation. Some other characteristics of the BESN distribution are:
- The BESN distribution contains, as special cases, the N and ESN distributions.
- The BESN distribution has a closed expression for its cdf.
- The moments of the BESN distribution have a closed expression.
- The three applications show that the BESN distribution provides a better fit than the other models tested.
Given these promising results, further research could apply the model in other areas. For instance, given the simplicity of the mean and the cdf of the BESN distribution, a reparametrization of the model in terms of the mean or a quantile could be considered in order to propose a new mean or quantile regression model based on this distribution.
Author Contributions
Conceptualization, J.D. and H.W.G.; methodology, G.M.-F., D.I.G. and H.W.G.; software, D.I.G.; validation, D.I.G. and H.W.G.; formal analysis, G.M.-F., D.I.G., O.V. and H.W.G.; investigation, J.D. and O.V.; writing—original draft preparation, J.D., D.I.G. and H.W.G.; writing—review and editing, D.I.G., O.V. and H.W.G.; funding acquisition, O.V. and H.W.G. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially performed by Héctor W. Gómez during a visit to the Universidad Católica de Temuco, supported by MINEDUC-UA project, code ANT 1999.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data set of the first application is available at http://lib.stat.cmu.edu/jasadata/laslett (accessed on 5 November 2022), and those of the second and third applications are available as supplementary material.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
In this Appendix, we present the elements of the Hessian matrix and the Fisher’s Information matrix for the location-scale case of the BESN distribution.
Appendix A.1. Elements of the Hessian Matrix
Appendix A.2. Elements of the Fisher’s Information Matrix
After lengthy calculations, we have that the elements of the Fisher’s information matrix, say , are given by
where , , and is the exponential integral defined by ; these integrals can be computed numerically with standard mathematical software.
References
- Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
- Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica 1986, 46, 199–208.
- Henze, N. A probabilistic representation of the Skew-Normal distribution. Scand. J. Stat. 1986, 4, 271–275.
- Chiogna, M. Some results on the scalar skew-normal distribution. J. Ital. Statist. Soc. 1998, 1, 1–13.
- Pewsey, A. Problems of inference for Azzalini’s skew-normal distribution. J. Appl. Stat. 2000, 27, 859–870.
- Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. A New Class of Skew-Normal Distributions. Commun. Stat. Theory Methods 2004, 33, 1465–1480.
- DiCiccio, T.J.; Monti, A.C. Inferential aspects of the skew exponential power distribution. J. Am. Stat. Assoc. 2004, 99, 439–450.
- Salinas, H.S.; Arellano-Valle, R.B.; Gómez, H.W. The extended skew-exponential power distribution and its derivation. Commun. Stat. Theory Methods 2007, 36, 1673–1689.
- Rosco, J.F.; Jones, M.C.; Pewsey, A. Skew t distributions via the sinh-arcsinh transformation. Test 2011, 20, 630–652.
- Shafiei, S.; Doostparast, M. Balakrishnan skew-t distribution and associated statistical characteristics. Commun. Stat. Theory Methods 2014, 43, 4109–4122.
- Adcock, C.; Azzalini, A. A selective overview of skew-elliptical and related distributions and of their applications. Symmetry 2020, 12, 118.
- Mudholkar, G.S.; Hutson, A.D. The epsilon-skew-normal distribution for analyzing near-normal data. J. Statist. Plann. Inference 2000, 83, 291–309.
- Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. Statistical inference for a general class of asymmetric distributions. J. Statist. Plann. Inference 2005, 128, 427–443.
- Hansen, B.E. Autoregressive conditional density estimation. Int. Econ. Rev. 1994, 35, 705–730.
- Gómez, H.W.; Torres, F.J.; Bolfarine, H. Large-Sample Inference for the Epsilon-Skew-t Distribution. Commun. Stat. Theory Methods 2007, 36, 73–81.
- Celis, P.; de la Cruz, R.; Fuentes, C.; Gómez, H.W. Survival and Reliability Analysis with an Epsilon-Positive Family of Distributions with Applications. Symmetry 2021, 13, 908.
- Bevilacqua, M.; Caamaño-Carrillo, C.; Arellano-Valle, R.B.; Gómez, C. A class of random fields with two-piece marginal distributions for modeling point-referenced data with spatial outliers. Test 2022, 31, 644–674.
- McLachlan, G.; Peel, D. Mixture Models: Inference and Applications to Clustering; Marcel Dekker: New York, NY, USA, 2000.
- Marin, J.M.; Mengersen, K.; Robert, C. Bayesian modeling and inference on mixtures of distributions. Handb. Stat. 2005, 25, 459–503.
- Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution. J. R. Stat. Soc. Ser. B 2003, 65, 367–389.
- Ma, Y.; Genton, M.G. Flexible class of skew-symmetric distributions. Scand. J. Stat. 2004, 31, 459–468.
- Kim, H.J. On a class of two-piece skew-normal distributions. Statistics 2005, 39, 537–553.
- Elal-Olivero, D.; Gómez, H.W.; Quintana, F.A. Bayesian Modeling using a class of Bimodal skew-Elliptical distributions. J. Statist. Plan. Inference 2009, 139, 1484–1492.
- Arnold, B.C.; Gómez, H.W.; Salinas, H.S. On multiple constraint skewed models. Statistics 2009, 43, 279–293.
- Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics 2011, 22, 50–62.
- Hassan, M.Y.; El-Bassiouni, M.Y. Bimodal Skew-Symmetric Normal Distribution. Commun. Stat. Theory Methods 2016, 45, 1527–1541.
- Da Silva Braga, A.; Cordeiro, G.M.; Ortega, E.M.M. A new skew-bimodal distribution with applications. Commun. Stat. Theory Methods 2018, 47, 2950–2968.
- Cordeiro, G.M.; Alizadeh, M.; Ozel, G.; Hosseini, B.; Ortega, E.M.M.; Altun, E. The generalized odd log-logistic family of distributions: Properties, regression models and applications. J. Stat. Comput. Simul. 2017, 87, 908–932.
- Da Braga, A.S.; Cordeiro, G.M.; Ortega, E.M.; da Cruz, J.N. The odd log-logistic normal distribution: Theory and applications in analysis of experiments. J. Stat. Theory Pract. 2016, 10, 311–335.
- Altun, E.; Alizadeh, M.; Ozel, G.; Tatlidil, H.; Maksayi, N. Forecasting Value-At-Risk with Two-Step Method: Garch-Exponentiated Odd Log-Logistic Normal Model. Rom. J. Econ. Forecast. 2017, 20, 97–115.
- Alizadeh, M.; Afshari, M.; Hosseini, B.; Ramires, T.G. Extended exp-G family of distributions: Properties, applications and simulation. Commun. Stat. Simul. Comput. 2020, 49, 1730–1745.
- Azzalini, A. The Skew Normal and Related Families; Cambridge University Press: Cambridge, UK, 2014.
- Gradshteyn, I.S.; Ryzhik, I.M. Tables of Integrals, Series, and Products, 7th ed.; Academic Press: San Diego, CA, USA, 2007.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723.
- Schwarz, G. Estimating the dimension of a model. Ann. Statist. 1978, 6, 461–464.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 25 December 2022).
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).