Abstract
In this work, we propose a new bimodal distribution with support on the real line. We obtain some properties of the model, such as its moments, quantiles and mode, among others. The computational implementation of the model is available in the tpn package for the R software. We perform a simulation study to assess the properties of the maximum likelihood estimators in finite samples. Finally, we present an application to a bimodal data set, in which our proposal is compared with other models in the literature.
1. Introduction
Describing a phenomenon by a probability distribution is very useful because of the properties associated with it: expectation, shape, range, etc. However, this description can be difficult when the phenomenon (in practice, an observed dataset) is bimodal, which occurs commonly in areas such as astrophysics, ecology and genetics; see [,,], respectively. A first approach to fitting bimodal data is to use a mixture of two unimodal distributions, for instance, a mixture of Gaussian distributions; see []. The main disadvantage of this procedure is the non-identifiability of the resulting mixture model. A second, and more practical, approach is to use distributions that are bimodal by construction. For this reason, there is an increasing interest in deriving bimodal distributions in the literature: refs. [,] presented extensions of the skew-normal distribution, ref. [] proposed a generalization of the Burr type X distribution and [] derived an extension of the sinh Cauchy distribution. In this paper, we build on the truncated positive normal (tpn) model proposed by [], which extends the half-normal distribution. The probability density function (pdf) of the tpn model is given by
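$$f(z;\sigma,\lambda)=\frac{1}{\sigma\,\Phi(\lambda)}\,\phi\!\left(\frac{z}{\sigma}-\lambda\right),\qquad z>0,$$
where $\sigma>0$ and $\lambda\in\mathbb{R}$ are the scale and shape parameters, respectively, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and cumulative distribution function (cdf), respectively, of the standard normal distribution. The corresponding cdf of the tpn model is
$$F(z;\sigma,\lambda)=\frac{\Phi\!\left(\frac{z}{\sigma}-\lambda\right)+\Phi(\lambda)-1}{\Phi(\lambda)},\qquad z>0.$$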
Note that the cdf above has a closed-form expression in terms of the standard normal cdf. This is useful both for generating random data and for obtaining quantiles. For more properties of the tpn model, see []. The restriction to positive values is a limitation of the tpn model. To overcome it, the main goal of this paper is to derive an extension of the tpn model with support on the real line. We describe the model in detail, studying its main properties and related functions. Moreover, we show analytically the regions in which the model is unimodal and bimodal, and these regions depend on only one parameter.
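Read the tpn pdf as φ(z/σ − λ)/{σΦ(λ)} for z > 0, that is, a N(λσ, σ²) law truncated to the positive half-line. Under that reading, the closed-form cdf inverts to the quantile function z = σ[λ + Φ^{-1}{Φ(−λ) + uΦ(λ)}], and this quantile function gives an inverse-transform sampler. The following Python sketch, which uses only the standard library, illustrates the idea. The function names are hypothetical and are not the interface of the tpn R package.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf, cdf and inv_cdf


def tpn_pdf(z, sigma, lam):
    """pdf of tpn(sigma, lam): a N(lam * sigma, sigma^2) law truncated to z > 0."""
    if z <= 0:
        return 0.0
    return STD_NORMAL.pdf(z / sigma - lam) / (sigma * STD_NORMAL.cdf(lam))


def tpn_cdf(z, sigma, lam):
    """Closed-form cdf: [Phi(z/sigma - lam) + Phi(lam) - 1] / Phi(lam)."""
    if z <= 0:
        return 0.0
    return (STD_NORMAL.cdf(z / sigma - lam) + STD_NORMAL.cdf(lam) - 1.0) / STD_NORMAL.cdf(lam)


def tpn_quantile(u, sigma, lam):
    """Solves tpn_cdf(z) = u for 0 <= u < 1 by inverting the cdf analytically."""
    p = STD_NORMAL.cdf(-lam) + u * STD_NORMAL.cdf(lam)
    return sigma * (lam + STD_NORMAL.inv_cdf(p))


def tpn_sample(n, sigma, lam, seed=None):
    """Inverse-transform sampling: uniform draws pushed through the quantile function."""
    rng = random.Random(seed)
    return [tpn_quantile(rng.random(), sigma, lam) for _ in range(n)]
```

With λ = 0 the model reduces to the half-normal. In that case `tpn_quantile(0.5, 1.0, 0.0)` returns the half-normal median Φ^{-1}(0.75) ≈ 0.6745.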
The paper is organized as follows. In Section 2, we derive an extension of the tpn with support in the real line and study some properties of the distribution. The inference for parameter estimation in the proposed model and computational aspects are presented in Section 3. In Section 4, we perform a simulation study to evaluate the parameter estimation in finite samples. An application to real data is discussed in Section 5. Finally, conclusions are given in Section 6.
2. A Bimodal Truncation Positive Normal Distribution
In this section, we present the stochastic representation for the bimodal truncation positive normal (btpn) distribution and some properties, such as its pdf and its cdf. We also discuss some particular cases of the model.
2.1. Stochastic Representation, pdf and cdf
Let T be a discrete random variable such that
where . If , independent of T, then we define a new random variable given by . We say that X follows a btpn distribution.
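The stochastic representation translates directly into a sampler: draw Z from the tpn model, draw T independently, and return X = TZ. The law of T is not preserved in this text, so the sketch below ASSUMES the epsilon-skew weights P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2, with −1 < ε < 1. This choice reproduces the epsilon skew-normal of Section 2.4 when λ = 0, but it may differ from the exact convention of the paper. All names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tpn_draw(sigma, lam, rng):
    """One tpn(sigma, lam) draw, obtained by inverting its closed-form cdf."""
    u = rng.random()
    return sigma * (lam + STD_NORMAL.inv_cdf(STD_NORMAL.cdf(-lam) + u * STD_NORMAL.cdf(lam)))


def btpn_draw(sigma, lam, eps, rng):
    """One draw of X = T * Z under the ASSUMED law of T:
    P(T = -(1 + eps)) = (1 + eps) / 2 and P(T = 1 - eps) = (1 - eps) / 2."""
    z = tpn_draw(sigma, lam, rng)
    t = -(1.0 + eps) if rng.random() < (1.0 + eps) / 2.0 else 1.0 - eps
    return t * z


def btpn_sample(n, sigma, lam, eps, seed=None):
    """n independent draws, reproducible through the seed."""
    rng = random.Random(seed)
    return [btpn_draw(sigma, lam, eps, rng) for _ in range(n)]
```

Under this assumption, P(X < 0) = (1 + ε)/2 and E(X) = E(T)E(Z) = −2εσ{λ + φ(λ)/Φ(λ)}. Both quantities give a quick Monte Carlo check on the output.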
Proposition 1.
The pdf for the btpn distribution is given by
where and .
Proof.
If , then the cdf for X is
Differentiating the last expression with respect to x, we have
A similar routine calculation shows that for , we have that
completing the proof. □
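Assume the law of T satisfies P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2. Under that assumption, the change of variables in the proof yields a density whose two branches are rescaled tpn kernels sharing the constant 1/{2σΦ(λ)}. The hypothetical `btpn_pdf` below codes that density. A composite Simpson rule then confirms numerically that the density integrates to one and places mass (1 + ε)/2 on the negative half-line. The sketch illustrates the structure of Proposition 1; it is not a transcription of the paper's display.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def btpn_pdf(x, sigma, lam, eps):
    """pdf implied by X = T * Z with the ASSUMED weights (1 + eps)/2 (left), (1 - eps)/2 (right).
    Both branches share the constant 1 / (2 sigma Phi(lam))."""
    c = 1.0 / (2.0 * sigma * STD_NORMAL.cdf(lam))
    if x < 0:
        return c * STD_NORMAL.pdf(x / (sigma * (1.0 + eps)) + lam)
    return c * STD_NORMAL.pdf(x / (sigma * (1.0 - eps)) - lam)


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0
```

The two branches meet at x = 0 with the common value φ(λ)/{2σΦ(λ)}. The density is therefore continuous, although its derivative generally jumps there.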
Figure 1 shows the pdf of the btpn model for different combinations of parameters. Note that the model can assume different shapes, including unimodal, bimodal, symmetric and asymmetric ones.
Figure 1.
Pdf for btpn with different fixed values for and varying : (a) ; (b) ; (c) and; (d) .
Proposition 2.
The cdf of is given by
Proof.
It is immediate from the last proof. □
Proposition 3.
Let X follow a btpn distribution. Its quantile function is given by
Proof.
It is immediate from inverting the cdf for the btpn distribution given in Proposition 2. □
Corollary 1.
The median for is given by
Corollary 2.
The median for is , and if ϵ is , and , respectively.
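Inverting the cdf branch by branch gives the quantile function, and the median is its value at 1/2. The sketch below ASSUMES P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2 in the representation X = TZ, so the negative half-line carries mass (1 + ε)/2. Under that convention, the median is negative, zero or positive according as ε is positive, zero or negative. The paper's own sign convention may be the reverse. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def btpn_cdf(x, sigma, lam, eps):
    """cdf under the ASSUMED convention; the left branch carries mass (1 + eps) / 2."""
    phi_lam = STD_NORMAL.cdf(lam)
    if x < 0:
        return (1.0 + eps) / (2.0 * phi_lam) * STD_NORMAL.cdf(x / (sigma * (1.0 + eps)) + lam)
    right = STD_NORMAL.cdf(x / (sigma * (1.0 - eps)) - lam) - STD_NORMAL.cdf(-lam)
    return (1.0 + eps) / 2.0 + (1.0 - eps) / (2.0 * phi_lam) * right


def btpn_quantile(u, sigma, lam, eps):
    """Inverts btpn_cdf on the branch that contains u, for 0 < u < 1."""
    phi_lam = STD_NORMAL.cdf(lam)
    p0 = (1.0 + eps) / 2.0  # probability mass to the left of zero
    if u < p0:
        return sigma * (1.0 + eps) * (STD_NORMAL.inv_cdf(2.0 * u * phi_lam / (1.0 + eps)) - lam)
    p = STD_NORMAL.cdf(-lam) + 2.0 * phi_lam * (u - p0) / (1.0 - eps)
    return sigma * (1.0 - eps) * (lam + STD_NORMAL.inv_cdf(p))


def btpn_median(sigma, lam, eps):
    """The median is the quantile at 1/2."""
    return btpn_quantile(0.5, sigma, lam, eps)
```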
2.2. Moments and Moment-Generating Function
The following proposition presents the central moments of the btpn distribution.
Proposition 4.
Let . The r-th moment of X is given by
where is the upper incomplete gamma function.
Proof.
Note that , where and . For the first term, we perform the change of variable . With this,
Using the binomial theorem and the change of variable in the last expression, we obtain
Note that the last integral corresponds to .
On the other hand, for , we perform the change of variable , obtaining
Using the same routine calculation, we obtain
where again the last integral corresponds to . The final result is obtained by summing and . □
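Because X = TZ with T independent of Z, E(X^r) = E(T^r)E(Z^r). Write Z = σ(U + λ), with U a standard normal truncated to U > −λ. Then E(Z^r) reduces to a binomial sum of the partial moments m_k = ∫_{−λ}^∞ u^k φ(u) du. The sketch below evaluates m_k through the integration-by-parts recursion m_k = (−λ)^{k−1}φ(λ) + (k − 1)m_{k−2}. This recursion is an alternative to the incomplete gamma form used in the proposition. The sketch ASSUMES P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2, so that E(T^r) = {(1 − ε)^{r+1} + (−1)^r(1 + ε)^{r+1}}/2. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def partial_moments(lam, kmax):
    """m[k] = integral over (-lam, inf) of u^k phi(u) du, for k = 0..kmax, via
    integration by parts: m[k] = (-lam)^(k-1) phi(lam) + (k-1) m[k-2]."""
    phi_lam = STD_NORMAL.pdf(lam)
    m = [STD_NORMAL.cdf(lam)]
    for k in range(1, kmax + 1):
        prev = m[k - 2] if k >= 2 else 0.0
        m.append((-lam) ** (k - 1) * phi_lam + (k - 1) * prev)
    return m


def tpn_raw_moment(r, sigma, lam):
    """E(Z^r) for Z = sigma (U + lam), expanded with the binomial theorem."""
    m = partial_moments(lam, r)
    total = sum(math.comb(r, k) * lam ** (r - k) * m[k] for k in range(r + 1))
    return sigma ** r * total / STD_NORMAL.cdf(lam)


def btpn_raw_moment(r, sigma, lam, eps):
    """E(X^r) = E(T^r) E(Z^r) under the ASSUMED two-point law of T."""
    et = ((1.0 - eps) ** (r + 1) + (-1) ** r * (1.0 + eps) ** (r + 1)) / 2.0
    return et * tpn_raw_moment(r, sigma, lam)
```

As a sanity check, λ = ε = 0 gives the moments of N(0, σ²). For example, `btpn_raw_moment(4, 2.0, 0.0, 0.0)` equals 3σ⁴ = 48.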
Proposition 5.
Let . The moment-generating function (mgf) for X is given by
Proof.
Note that , where and . For the first integral and using the change of variable , we obtain
Completing the square in the exponent and using the change of variable , we have
For , and similarly to the previous development, we use the change of variable , obtaining that
Again, completing the square and using , we obtain
Finally, the result is obtained by summing and . □
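Completing the square gives the tpn mgf M_Z(s) = exp(λσs + σ²s²/2)Φ(λ + σs)/Φ(λ). Conditioning on T then gives M_X(t) = E{M_Z(Tt)}. The sketch ASSUMES P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2, because the paper's exact convention for T is not recoverable from this text. It checks the result with a central finite difference at t = 0, which should reproduce E(X). The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tpn_mgf(s, sigma, lam):
    """E[exp(s Z)] for Z ~ tpn(sigma, lam), obtained by completing the square."""
    return (math.exp(lam * sigma * s + 0.5 * (sigma * s) ** 2)
            * STD_NORMAL.cdf(lam + sigma * s) / STD_NORMAL.cdf(lam))


def btpn_mgf(t, sigma, lam, eps):
    """E[exp(t X)] = E_T[M_Z(T t)] under the ASSUMED two-point law of T."""
    right = (1.0 - eps) / 2.0 * tpn_mgf((1.0 - eps) * t, sigma, lam)
    left = (1.0 + eps) / 2.0 * tpn_mgf(-(1.0 + eps) * t, sigma, lam)
    return right + left
```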
Corollary 3.
Using properties of the mgf, the first four moments of can be obtained from the expression .
where is the reciprocal of the Mills ratio for the standard normal distribution.
Corollary 4.
The variance and the coefficients of skewness and kurtosis for X are given by
respectively.
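Corollary 4 follows from the raw moments E(X^r), r = 1, 2, 3, 4, through the usual central-moment identities. The sketch below implements only that conversion, so it does not depend on how the btpn is parameterized. It reports kurtosis on the non-excess scale (3 for a normal law); that this is the paper's convention is an assumption. The function name is hypothetical.

```python
def shape_coefficients(m1, m2, m3, m4):
    """Variance, skewness and (non-excess) kurtosis from the raw moments E(X), ..., E(X^4)."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3                        # third central moment
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4   # fourth central moment
    return var, mu3 / var ** 1.5, mu4 / var ** 2
```

For example, the raw moments 1, 2, 6 and 24 of a unit exponential give variance 1, skewness 2 and kurtosis 9.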
Figure 2 shows the plots for asymmetry and kurtosis coefficients. Note that a more right-skewed distribution is obtained when and , whereas a more left-skewed model is obtained when and . On the other hand, a greater kurtosis is obtained when and , whereas a lower kurtosis is obtained when and . Note that this pattern is consistent with the pdf for different parameters presented in Figure 1.
Figure 2.
(a) Asymmetry coefficient and (b) kurtosis coefficient for btpn distribution.
2.3. Mode and Unimodality and Bimodality Regions
The next proposition presents the unimodality and bimodality property of the btpn distribution.
Proposition 6.
Let . For , the model is unimodal, and for , the model is bimodal. Moreover, for the unimodal case, the mode of the model is 0, and for the bimodal case, the two modes are and , respectively.
Proof.
By definition, the mode is the value that maximizes the pdf or, equivalently, the logarithm of the pdf. For , it is straightforward to show that
Therefore, solving the equation , we obtain and as the candidate modes of the respective branches, because the second derivative is negative in each case. However, each candidate is valid if and only if and , respectively. In other words, if , then is a mode, and if , then is also a mode. This is equivalent to
where and , . On the other hand, and . For this reason, it is immediate that for , the btpn distribution has two modes, namely and . Finally, for , it is immediate that , for and for . In other words, the pdf of the btpn distribution is increasing in and decreasing in , from which we deduce that the model is unimodal with its mode located at zero. □
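Assume P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2. Then each branch of the log-density is a concave quadratic, with stationary points x = −λσ(1 + ε) on the left and x = λσ(1 − ε) on the right. Both points lie inside their half-lines only when λ > 0. The sketch compares these closed-form modes with a brute-force grid search. The threshold λ = 0 is what this assumed parameterization gives; it is not a transcription of the proposition. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def btpn_pdf(x, sigma, lam, eps):
    """pdf under the ASSUMED law of T (left weight (1 + eps)/2, right weight (1 - eps)/2)."""
    c = 1.0 / (2.0 * sigma * STD_NORMAL.cdf(lam))
    if x < 0:
        return c * STD_NORMAL.pdf(x / (sigma * (1.0 + eps)) + lam)
    return c * STD_NORMAL.pdf(x / (sigma * (1.0 - eps)) - lam)


def btpn_modes(sigma, lam, eps):
    """Each branch peaks where its normal kernel is centred. That point lies on the
    branch's own half-line only when lam > 0; otherwise the density peaks at 0."""
    if lam > 0:
        return (-lam * sigma * (1.0 + eps), lam * sigma * (1.0 - eps))
    return (0.0,)


def grid_local_maxima(f, a, b, n=40001):
    """Brute-force check: grid points above their left neighbour and not below their right one."""
    h = (b - a) / (n - 1)
    xs = [a + i * h for i in range(n)]
    ys = [f(x) for x in xs]
    return [xs[i] for i in range(1, n - 1) if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]]
```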
Figure 3 shows the regions of unimodality and bimodality for the btpn depending on the parameters and .
Figure 3.
Regions of unimodality and bimodality for the btpn model in terms of and .
2.4. Particular Cases
By construction, the following models are particular cases for the btpn distribution:
- btpn tpn;
- btpn N, i.e., the normal distribution with mean 0 and variance ;
- btpn esn, i.e., the epsilon skew-normal distribution (Mudholkar and Hutson []).
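The particular cases can be checked numerically. The check ASSUMES P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2 in X = TZ. Under that assumption:

- λ = 0 gives the esn, written here in the location-zero form commonly attributed to Mudholkar and Hutson;
- λ = ε = 0 gives N(0, σ²);
- the boundary value ε = −1, which puts all the mass of T at 2, gives tpn(2σ, λ).

Whether these parameter values coincide with those in the list above depends on the paper's convention, which this text does not preserve. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def btpn_pdf(x, sigma, lam, eps):
    """pdf under the ASSUMED law of T (left weight (1 + eps)/2, right weight (1 - eps)/2)."""
    c = 1.0 / (2.0 * sigma * STD_NORMAL.cdf(lam))
    if x < 0:
        return c * STD_NORMAL.pdf(x / (sigma * (1.0 + eps)) + lam)
    return c * STD_NORMAL.pdf(x / (sigma * (1.0 - eps)) - lam)


def tpn_pdf(z, sigma, lam):
    """tpn(sigma, lam) density, zero on the non-positive half-line."""
    return STD_NORMAL.pdf(z / sigma - lam) / (sigma * STD_NORMAL.cdf(lam)) if z > 0 else 0.0


def esn_pdf(x, sigma, eps):
    """Epsilon skew-normal with location 0: scale sigma(1 + eps) left of 0, sigma(1 - eps) right."""
    s = sigma * (1.0 + eps) if x < 0 else sigma * (1.0 - eps)
    return STD_NORMAL.pdf(x / s) / sigma
```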
Figure 4 summarizes the relationships among the btpn and its particular cases.
Figure 4.
Particular cases for the btpn distribution.
3. Inference
In this section, we discuss the maximum likelihood (ML) method for parameter estimation for the btpn model. We also provide details about the computational aspects.
3.1. Maximum Likelihood Function
Hereafter, to simplify the estimation procedure, we consider the reparameterization . Henceforth, we write , with , and the scale, shape and asymmetry parameters, respectively, if its pdf is given by
Given , a random sample from the btpn distribution, the log-likelihood function for is given by
where
To find the ML estimator of , say , we need to maximize in (1) with respect to . However, no closed-form expressions for the ML estimates are available, so an iterative method for nonlinear optimization must be used. Here, we solve this problem using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton method; see [] (p. 199).
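The log-likelihood can be coded directly from the two-branch density. Two choices in the sketch are assumptions. First, it works in the (σ, λ, ε) parameterization under the assumed law P(T = −(1 + ε)) = (1 + ε)/2, P(T = 1 − ε) = (1 − ε)/2, rather than the reparameterization of this section, which is not preserved here. Second, it maps the parameters to an unconstrained scale (log σ, λ, atanh ε), so that a general-purpose optimizer never leaves the parameter space. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def btpn_loglik(theta, data):
    """Log-likelihood on an unconstrained scale (a HYPOTHETICAL choice for this sketch):
    theta = (log sigma, lam, atanh eps), so sigma > 0 and -1 < eps < 1 automatically."""
    log_sigma, lam, a = theta
    sigma, eps = math.exp(log_sigma), math.tanh(a)
    ll = -len(data) * (math.log(2.0 * sigma) + math.log(STD_NORMAL.cdf(lam)))
    for x in data:
        if x < 0:
            u = x / (sigma * (1.0 + eps)) + lam   # left branch
        else:
            u = x / (sigma * (1.0 - eps)) - lam   # right branch
        ll -= 0.5 * u * u + LOG_SQRT_2PI          # add log phi(u)
    return ll
```

Its negative could be passed to a quasi-Newton routine, mirroring the BFGS approach in the text; in Python, one option is `scipy.optimize.minimize` with `method="BFGS"`. Estimates of σ and ε are then recovered by applying exp and tanh, respectively, to the optimum.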
3.2. Computational Aspects
The ML estimates for the btpn model and their standard errors can be obtained with the tpn package [] for the R [] software, using the following function:
- est.btpn(y)
where y is the sample. The function returns a list containing the estimates, the number of iterations used by the maximization algorithm, the log-likelihood function evaluated at the parameter estimates, and the corresponding Akaike information criterion (AIC, see []) and Bayesian information criterion (BIC, see []). Models with lower AIC and/or BIC are preferable. The package also includes the functions rbtpn, dbtpn and pbtpn to generate random values and to evaluate the pdf and the cdf of the btpn model, respectively.
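Both criteria are simple functions of the maximized log-likelihood ℓ, the number of parameters k and the sample size n: AIC = −2ℓ + 2k and BIC = −2ℓ + k log n. The sketch below computes them and picks the model with the lowest value. The names and the numbers in the usage are purely illustrative; they are not the results in Table 4. The btpn, esig and asn models of Section 5 each have three parameters, so for them both criteria order the models exactly as the maximized log-likelihood does.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC = -2 l + 2k and BIC = -2 l + k log n; lower values are preferred."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


def best_model(fits, n_obs, criterion=0):
    """fits maps a model name to (maximized log-likelihood, number of parameters);
    criterion 0 selects by AIC, 1 by BIC."""
    scores = {name: information_criteria(ll, k, n_obs)[criterion] for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get)
```

For instance, `best_model({"A": (-50.0, 3), "B": (-49.5, 4)}, 100)` returns `"A"`. The extra parameter of model B costs more in AIC than its gain of 0.5 in log-likelihood.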
4. Simulation Study
In this section, we present a simulation study to evaluate the behaviour of the ML estimators in finite samples. The study was conducted using the tpn package []. Specifically, random samples were generated using the rbtpn function, and the estimation was performed using the est.btpn function. We considered 5000 Monte Carlo replicates for 3 sample sizes: 50, 100 and 200. We also considered 2 values for the scale parameter : 2 and 10; 3 values for : and 3; and 2 values for : and . This setting provides 36 combinations of the parameters and and the sample size. Table 1 and Table 2 summarize the empirical bias, the standard errors of the MLE (SE), the root-mean-squared error (RMSE) and the 95% coverage probability (CP) based on the asymptotic distribution of the MLE. In general terms, the bias and RMSE decrease as the sample size increases, suggesting the consistency of the MLE. Note also that the SE and RMSE become closer as the sample size increases, suggesting that the standard errors of the estimators are also well estimated. Additionally, the CPs approach the nominal 95% level, suggesting that the normal distribution is a reasonable approximation to the distribution of the ML estimators in the btpn model, even for moderate sample sizes.
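The summaries in Tables 1 and 2 can be computed per parameter from the R replicates. The text does not spell out the formulas, so the sketch makes two assumptions. It reads SE as the average of the estimated standard errors. It reads CP as the proportion of Wald intervals θ̂ ± z_{0.975}·SE that contain the true value. The function name is hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def mc_summary(estimates, std_errors, true_value):
    """Empirical bias, mean estimated SE, RMSE and Wald 95% coverage over the replicates."""
    r = len(estimates)
    bias = sum(e - true_value for e in estimates) / r
    mean_se = sum(std_errors) / r
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    covered = sum(abs(e - true_value) <= Z975 * s for e, s in zip(estimates, std_errors))
    return {"bias": bias, "se": mean_se, "rmse": rmse, "cp": covered / r}
```

The design described above crosses 3 sample sizes, 2 values of σ, 3 values of λ and 2 values of ε. Since 3 × 2 × 3 × 2 = 36, this routine would be applied 36 times for each parameter.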
Table 1.
Empirical bias, SE, RMSE and 95% CP for the ML estimators of and in the btpn distribution with different combinations of parameters (case true ).
Table 2.
Empirical bias, SE, RMSE and 95% CP for the ML estimators of and in the btpn distribution with different combinations of parameters (case true ).
5. Application
In this section, we present an application to a real data set in order to illustrate the btpn model. We consider the height data set, which consists of the height of 126 students from the University of Pennsylvania (Cruz-Medina []). We compare our proposal with other bimodal proposals, such as the epsilon skew inverted gamma (esig, see Abdulah et al. []) and the alpha skew-normal (asn, Elal-Olivero []). The pdf for the esig model is given by:
where , and are the scale, shape and skewness parameters, respectively.
The pdf for the asn model is given by:
where and are the shape, location and scale parameters, respectively.
Table 3 summarizes some descriptive statistics for the sample, where we highlight the symmetrical behaviour of the data ().
Table 3.
Descriptive statistics for the data set.
Table 4 presents the estimates, standard errors, and AIC and BIC criteria for the mentioned models. Based on both criteria, the btpn provides a better fit than the other distributions. Figure 5 shows the histogram for the height data and the pdf of the three considered distributions, illustrating the better fit of the btpn for this data set. Moreover, as discussed in Proposition 6, implies a bimodal model, and such modes are equal to and . In addition, the fitted distribution of height is very close to symmetric.
Table 4.
Estimated parameters and their standard errors (in parentheses) for the btpn, esig and asn models for the data set. The AIC and BIC criteria are also presented.
Figure 5.
Histogram for the data set and the estimated pdf for the btpn, esig and asn models.
We also compute the randomized quantile residuals [] for the three fitted models. If the model is correctly specified, these residuals should behave as a random sample from the standard normal distribution. Figure 6 shows the QQ plots for these residuals, which also suggest that the btpn is a more appropriate model for this data set.
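For a continuous response, the randomized quantile residual of Dunn and Smyth needs no randomization and reduces to r_i = Φ^{-1}{F(y_i; θ̂)}. The sketch computes these residuals for a fitted btpn model, again ASSUMING P(T = −(1 + ε)) = (1 + ε)/2 and P(T = 1 − ε) = (1 − ε)/2 in X = TZ. The clipping constant is an arbitrary numerical safeguard, not part of the method. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def btpn_cdf(x, sigma, lam, eps):
    """cdf under the ASSUMED convention; the left branch carries mass (1 + eps) / 2."""
    phi_lam = STD_NORMAL.cdf(lam)
    if x < 0:
        return (1.0 + eps) / (2.0 * phi_lam) * STD_NORMAL.cdf(x / (sigma * (1.0 + eps)) + lam)
    right = STD_NORMAL.cdf(x / (sigma * (1.0 - eps)) - lam) - STD_NORMAL.cdf(-lam)
    return (1.0 + eps) / 2.0 + (1.0 - eps) / (2.0 * phi_lam) * right


def quantile_residuals(data, sigma, lam, eps, tiny=1e-12):
    """r_i = Phi^{-1}(F(y_i)) at the fitted parameters. F is clipped away from 0 and 1
    so that inv_cdf stays finite for extreme observations."""
    res = []
    for y in data:
        u = min(max(btpn_cdf(y, sigma, lam, eps), tiny), 1.0 - tiny)
        res.append(STD_NORMAL.inv_cdf(u))
    return res
```

When λ = ε = 0 the fitted model is N(0, σ²), and the residuals reduce to y_i/σ.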
Figure 6.
Quantile residuals for fitted models: (a) asn, (b) esig, and (c) btpn.
6. Conclusions
The importance of fitting a probability distribution to an observed dataset is well known, since the fitted model brings convenient properties with it. Difficulty arises when the data are bimodal, because traditional distributions lack this property. This gap is being filled by a growing body of statistical literature that develops probability distributions with a built-in bimodality feature. In this paper, we contribute to this literature with the bimodal truncation positive normal (btpn) distribution. The btpn distribution has the following advantages: support on the real line, closed-form cdf and moments, the zero-mean normal distribution as a particular case, and tractable maximum likelihood estimators. The ML procedure performs well: as the sample size increases, the bias and the SE decrease. This is a relevant strength, since for some models the estimation procedure does not work well even for large samples. Finally, in an application to real data, the btpn provided the best fit among the models compared. As suggestions for future work, we mention two possibilities. The first entails improving the asymptotic properties of the ML estimators through bias and variance corrections (see [,], respectively). The second involves adding a regression structure. A closed-form cdf even allows a quantile regression structure (see [,], for instance), as [] did for the gamma–sinh Cauchy distribution. Given its applicability and the possibilities for future work, we believe the btpn distribution is useful for practitioners and researchers in many different areas.
Author Contributions
Conceptualization, H.J.G., Y.M.G. and D.I.G.; Data curation, M.C.; Formal analysis, W.E.C., Y.M.G. and M.C.; Investigation, W.E.C. and T.M.M.; Methodology, H.J.G., T.M.M. and D.I.G.; Software, H.J.G., Y.M.G. and D.I.G.; Supervision, Y.M.G. and D.I.G.; Validation, M.C.; Visualization, W.E.C.; Writing—review editing, T.M.M. and D.I.G. All authors have read and agreed to the published version of the manuscript.
Funding
The research of H.J.G. was supported by the Proyecto de Investigación de Facultad de Ingeniería. Universidad Católica de Temuco. UCT-FDI032020. W.E.C., Y.M.G., M.C. and D.I.G. also acknowledge the support of Proyecto Gidi: “La estadística como respuesta a problemas de otras áreas” supported by the University of Atacama.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Ashman, K.M.; Bird, C.M.; Zepf, S.E. Detecting Bimodality in Astronomical Datasets. Astron. J. 1994, 108, 2348–2361.
- Michele, C.; Accatino, F. Tree cover bimodality in savannas and forests emerging from the switching between two fire dynamics. PLoS ONE 2014, 9, e91195.
- Wang, J.; Wen, S.; Symmans, W.F.; Pusztai, L.; Coombes, K.R. The bimodality index: A criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data. Cancer Inform. 2009, 7, 199–216.
- Robertson, C.A.; Fryer, J.G. Some descriptive properties of normal mixtures. Scand. Actuar. J. 1969, 3–4, 137–146.
- Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics 2011, 22, 50–62.
- Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul. 2018, 88, 156–181.
- Butt, N.S.; Khalil, M.G. A New Bimodal Distribution for Modeling Asymmetric Bimodal Heavy-Tail Real Lifetime Data. Symmetry 2020, 12, 2058.
- Gómez, Y.M.; Gómez-Déniz, E.; Venegas, O.; Gallardo, D.I.; Gómez, H.W. An Asymmetric Bimodal Distribution with Application to Quantile Regression. Symmetry 2019, 11, 899.
- Gómez, H.J.; Olmos, N.M.; Varela, H.; Bolfarine, H. Inference for a truncated positive normal distribution. Appl. Math. J. Chin. Univ. 2018, 33, 163–176.
- Mudholkar, G.S.; Hutson, A.D. The epsilon–skew–normal distribution for analyzing near-normal data. J. Stat. Plan. Inference 2000, 83, 291–309.
- Mittelhammer, R.C.; Judge, G.G.; Miller, D.J. Econometric Foundations; Cambridge University Press: New York, NY, USA, 2000.
- Gallardo, D.I.; Gómez, H.J.; Gómez, Y.M. tpn: Truncated Positive Normal Model and Extensions. R Package Version 1.1. 2021. Available online: https://cran.r-project.org/web/packages/tpn/index.html (accessed on 25 January 2022).
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 25 January 2022).
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
- Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
- Cruz-Medina, I.R.; Olmos, N.M. Almost Nonparametric and Nonparametric Estimation in Mixture Model. Ph.D. Thesis, Pennsylvania State University, State College, PA, USA, 2001.
- Abdulah, E.K.; Elsalloukh, H. Bimodal Class based on the Inverted Symmetrized Gamma Distribution with Applications. J. Stat. Appl. Probab. 2014, 3, 1–7.
- Elal-Olivero, D. Alpha-skew-normal distribution. Proyecciones 2010, 29, 224–240.
- Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
- Magalhães, T.M.; Gómez, Y.M.; Gallardo, D.I.; Venegas, O. Bias reduction for the Marshall-Olkin extended family of distributions with application to an airplane’s air conditioning system and precipitation data. Symmetry 2020, 12, 851.
- Magalhães, T.M.; Botter, D.A.; Sandoval, M.C. A general expression for second-order covariance matrices—An application to dispersion models. Braz. J. Probab. Stat. 2021, 35, 37–49.
- Cade, B.S.; Noon, B.R. A gentle introduction to quantile regression for ecologists. Front. Ecol. Environ. 2003, 1, 412–420.
- Alencar, A.P.; Santos, B.R. Association of pollution with quantiles and expectations of the hospitalization rate of elderly people by respiratory diseases in the city of São Paulo, Brazil. Environmetrics 2014, 25, 165–171.
- Gómez, Y.M.; Gallardo, D.I.; Venegas, O.; Magalhães, T.M. An asymmetric bimodal double regression model. Symmetry 2021, 13, 2279.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).