Abstract
In this article, we introduce a new continuous distribution based on the unit interval. This distribution is generated from a transformation of a random variable with half-normal distribution. We study its basic properties, percentiles, moments and order statistics. Maximum likelihood estimation is applied, and we present a simulation study to observe the behavior of the maximum likelihood estimators. We examine two applications to real proportions datasets, where the new distribution is shown to provide a better fit than other distributions defined in the unit interval.
MSC:
62E15; 62E20
1. Introduction
In real life, it is quite common to find continuous data sets in the interval $(0,1)$. These data are the product of measurements that interpret different indices and rates. An example is insurance data, where a probability distribution can be used as a distortion function to define a premium principle (see Denuit et al. [1]). There are many studies involving measurements between 0 and 1; see, for example, Cook et al. [2] and Gupta and Nadarajah [3]. Continuous distributions with support in $(0,1)$ are fundamental for modeling these data; for example, the two-parameter Beta distribution is the model most frequently used for data of this kind due to its great flexibility (see Johnson et al. [4]). A random variable (r.v.) X is said to follow a Beta distribution with parameters $\alpha$ and $\beta$ if its probability density function (pdf) is given by
$$f_X(x;\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},$$
where $0<x<1$, $\alpha,\beta>0$, and $B(\cdot,\cdot)$ is the Beta function. Another distribution with support in $(0,1)$ is the Kumaraswamy distribution (see Kumaraswamy [5]). A r.v. Z has a Kumaraswamy (KM) distribution with parameters $\alpha$ and $\beta$ if its pdf is given by
$$f_Z(z;\alpha,\beta)=\alpha\beta\,z^{\alpha-1}\left(1-z^{\alpha}\right)^{\beta-1},$$
where $0<z<1$ and $\alpha,\beta>0$.
In recent years, several distributions with positive support have been transformed into distributions with unit support, for example Grassia [6], based on the Gamma distribution; Jones [7], based on the Kumaraswamy distribution; Mazucheli et al. [8], based on the Birnbaum-Saunders distribution; Ghitany et al. [9], based on the inverse Gaussian distribution; Modi et al. [10], based on the Burr III distribution; Korkmaz and Chesneau [11], based on the Burr XII distribution; Haq et al. [12], based on the modified Burr-III distribution; Gómez-Déniz et al. [13], Mazucheli et al. [14], Mazucheli et al. [15], based on the Lindley distribution, and more recently Bakouch et al. [16] based on the half-normal (HN) distribution. For example, a distribution with support in $(0,1)$ and only one parameter is the unit-Lindley distribution (see Mazucheli et al. [14]). A r.v. Z is said to follow a unit-Lindley (UL) distribution with parameter $\theta$ if its pdf is given by
$$f_Z(z;\theta)=\frac{\theta^{2}}{1+\theta}\,(1-z)^{-3}\exp\left(-\frac{\theta z}{1-z}\right),$$
where $0<z<1$ and $\theta>0$.
In this article, we introduce a new probability distribution with a restricted domain. Its distribution is derived by modifying the representation of the unit-half-normal (UHN) distribution introduced by Bakouch et al. [16]. One of the motivations of distribution theory is to provide new alternatives to known distributions in order to improve the statistical modeling of certain datasets. Our work is based on the HN distribution. We say that an r.v. X follows an HN distribution with scale parameter $\sigma$ if its pdf is given by
$$f_X(x;\sigma)=\frac{2}{\sigma}\,\phi\left(\frac{x}{\sigma}\right),$$
with $x>0$, $\sigma>0$, and $\phi$ the pdf of the standard normal distribution. We denote this by $X\sim HN(\sigma)$, and some of its properties are:
- The cumulative distribution function (cdf) of X is $F_X(x;\sigma)=2\Phi(x/\sigma)-1$, $x>0$.
- The $r$-th moments are expressed by $E(X^{r})=\sqrt{2^{r}/\pi}\;\sigma^{r}\,\Gamma\left(\frac{r+1}{2}\right)$, $r=1,2,\ldots$,
where $\Phi$ is the cdf of the standard normal distribution, and $\Gamma(\cdot)$ is the gamma function. Hogg and Tanis [17] discuss some properties of the HN distribution.
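Bakouch et al. [16] introduce the UHN distribution, which is obtained by transforming an r.v. $X\sim HN(\sigma)$. Using the transformation $Z=X/(1+X)$, they obtain the UHN distribution, the pdf of which is given by
$$f_Z(z;\sigma)=\frac{2}{\sigma(1-z)^{2}}\,\phi\left(\frac{z}{\sigma(1-z)}\right),$$
where $0<z<1$, $\sigma>0$, and we denote it by $Z\sim UHN(\sigma)$.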
In Figure 1, we show the pdf of the UHN distribution for several values of $\sigma$. In Figure 2, we show a histogram of a proportions dataset. The shapes that the UHN density can adopt close to zero do not represent this dataset; we therefore sought a different transformation whose density can capture this behavior. The main object of the present article is to study a new distribution that is a modification of the UHN distribution and offers an alternative to it for modeling proportion data with positive asymmetry, as shown by the data in Figure 2.
Figure 1.
Densities UHN (0.5) (black), UHN (0.9) (red), UHN (1.5) (blue) and UHN (2) (green).
Figure 2.
Histogram of a data set of proportions.
The rest of the paper is organized as follows. In Section 2, we give the representation of this distribution and derive the new density, its properties, moments and order statistics. In Section 3, we carry out inference by maximum likelihood (ML) and present a simulation study. Section 4 shows two applications to real datasets. In Section 5 we provide some final conclusions.
2. Density Function and Properties
In this section, we introduce the representation, density and properties of the new distribution.
2.1. Stochastic Representation
The representation of this new distribution is
$$Z=\frac{1}{1+X}, \qquad (2)$$
where $X\sim HN(\sigma)$, $\sigma>0$, and we call the distribution of Z the modified unit-half-normal (MUHN). This is denoted by $Z\sim MUHN(\sigma)$. Mazucheli et al. [15] use this representation in the Lindley distribution, obtaining a distribution called New Unit-Lindley (NUL). Applications of the NUL distribution are given in Ferreira and Mazucheli [18] and Alrumayh et al. [19], among others.
2.2. Density Function
The following result shows the pdf of the MUHN distribution, which is generated using the representation given in (2).
Proposition 1.
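Let $Z\sim MUHN(\sigma)$. Then, the pdf of Z is given by
$$f_Z(z;\sigma)=\frac{2}{\sigma z^{2}}\,\phi\left(\frac{1-z}{\sigma z}\right), \qquad (3)$$
where $0<z<1$ and $\sigma>0$.
Proof.
Let $X\sim HN(\sigma)$. Using the representation given in (2), we have $X=(1-Z)/Z$ with Jacobian $|dx/dz|=1/z^{2}$, and the result is obtained by the random variable transformation method. □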
Proposition 2.
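Let $Z\sim MUHN(\sigma)$. Then, the MUHN distribution is unimodal, with mode at
$$z_{0}=\frac{\sqrt{1+8\sigma^{2}}-1}{4\sigma^{2}}.$$
Proof.
Differentiating the density given in (3) with respect to z and setting the derivative equal to zero gives $2\sigma^{2}z^{2}+z-1=0$, whose unique root in $(0,1)$ is $z_{0}$. □
In Figure 3, we show the pdf of the MUHN distribution for several values of $\sigma$.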
Figure 3.
Densities MUHN (0.5) (black), MUHN (0.9) (red), MUHN (1.5) (blue) and MUHN (2) (green).
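As a numerical sanity check on Propositions 1 and 2, the following Python sketch evaluates a candidate MUHN density and compares a closed-form mode with a grid search. It assumes the representation $Z=1/(1+X)$ with $X\sim HN(\sigma)$, which yields the density $2\phi((1-z)/(\sigma z))/(\sigma z^{2})$ and the mode as the root in $(0,1)$ of $2\sigma^{2}z^{2}+z-1=0$. The function names are illustrative and are not taken from the authors' repository.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides pdf, cdf and inv_cdf


def muhn_pdf(z, sigma):
    """Density of Z = 1/(1+X), X ~ HN(sigma), for 0 < z < 1 (assumed form)."""
    u = (1.0 - z) / (sigma * z)
    return 2.0 * STD_NORMAL.pdf(u) / (sigma * z * z)


def muhn_mode(sigma):
    """Root in (0, 1) of 2*sigma^2*z^2 + z - 1 = 0."""
    return (math.sqrt(1.0 + 8.0 * sigma**2) - 1.0) / (4.0 * sigma**2)


# Compare the closed-form mode with the argmax of the density on a fine grid.
grid = [i / 10000 for i in range(1, 10000)]
for sigma in (0.5, 0.9, 1.5, 2.0):
    z_grid = max(grid, key=lambda z: muhn_pdf(z, sigma))
    print(f"sigma={sigma}: closed-form mode={muhn_mode(sigma):.4f}, grid argmax={z_grid:.4f}")
```

The values of $\sigma$ are those used in the caption of Figure 3; the two columns should agree up to the grid resolution.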
2.3. Cumulative Distribution Function
The following proposition shows the cdf of the MUHN distribution.
Proposition 3.
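Let $Z\sim MUHN(\sigma)$. Then, the cdf of Z is given by
$$F_Z(z;\sigma)=2\,\Phi\left(\frac{z-1}{\sigma z}\right), \qquad (4)$$
where $0<z<1$ and $\sigma>0$.
Proof.
Calculating the cdf of Z directly, we have
$$F_Z(z;\sigma)=\int_{0}^{z}\frac{2}{\sigma t^{2}}\,\phi\left(\frac{1-t}{\sigma t}\right)dt.$$
Making the change of variable $u=(1-t)/(\sigma t)$, the result is obtained. □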
2.4. Reliability Analysis
The reliability function and the hazard function of the MUHN distribution are given in the following corollary.
Corollary 1.
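Let $T\sim MUHN(\sigma)$. Then, the reliability and hazard functions of T are given by
$$R(t)=2\,\Phi\left(\frac{1-t}{\sigma t}\right)-1,\qquad h(t)=\frac{2\,\phi\left(\frac{1-t}{\sigma t}\right)}{\sigma t^{2}\left[2\,\Phi\left(\frac{1-t}{\sigma t}\right)-1\right]},$$
where $0<t<1$ and $\sigma>0$ is the shape parameter.
In Figure 4, we show the hazard function of the MUHN distribution for several values of $\sigma$.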
Figure 4.
Hazard function of the MUHN distribution for three selected values of $\sigma$ (black, red and blue curves).
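The reliability and hazard functions of Corollary 1 can be evaluated directly, as in the short Python sketch below. It assumes the forms $R(t)=2\Phi((1-t)/(\sigma t))-1$ and $h(t)=f(t)/R(t)$ that follow from the representation $Z=1/(1+X)$, $X\sim HN(\sigma)$; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def muhn_pdf(t, sigma):
    """Assumed MUHN density on (0, 1)."""
    u = (1.0 - t) / (sigma * t)
    return 2.0 * STD_NORMAL.pdf(u) / (sigma * t * t)


def muhn_reliability(t, sigma):
    """R(t) = P(T > t) = 2*Phi((1 - t)/(sigma*t)) - 1."""
    return 2.0 * STD_NORMAL.cdf((1.0 - t) / (sigma * t)) - 1.0


def muhn_hazard(t, sigma):
    """h(t) = f(t) / R(t); grows without bound as t approaches 1 because R(t) -> 0."""
    return muhn_pdf(t, sigma) / muhn_reliability(t, sigma)


for t in (0.2, 0.5, 0.8):
    print(f"t={t}: R={muhn_reliability(t, 1.0):.4f}, h={muhn_hazard(t, 1.0):.4f}")
```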
Proposition 4.
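Let $Z\sim MUHN(\sigma)$. Then, the quantile function (Q) of the MUHN distribution is given by
$$Q(p)=\frac{1}{1+\sigma\,\Phi^{-1}\left(1-\frac{p}{2}\right)},\qquad 0<p<1,$$
where $\Phi^{-1}$ is the inverse cdf of a standard normal distribution.
Proof.
Using the cdf given in (4) and the symmetry of $\Phi$, the equation $F_Z(z;\sigma)=p$ becomes
$$\Phi\left(\frac{1-z}{\sigma z}\right)=1-\frac{p}{2}.$$
Applying the inverse function of the cdf of a standard normal distribution and solving for z, the result is obtained. □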
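The following Python sketch checks that a candidate quantile function inverts the cdf. It assumes $F_Z(z;\sigma)=2\Phi((z-1)/(\sigma z))$ and $Q(p)=1/(1+\sigma\Phi^{-1}(1-p/2))$, both derived from the representation $Z=1/(1+X)$, $X\sim HN(\sigma)$. The code uses the equivalent identity $\Phi^{-1}(1-p/2)=-\Phi^{-1}(p/2)$, which is a numerical convenience and not part of the original text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def muhn_cdf(z, sigma):
    """F(z) = 2*Phi((z - 1)/(sigma*z)) for 0 < z < 1 (assumed form)."""
    return 2.0 * STD_NORMAL.cdf((z - 1.0) / (sigma * z))


def muhn_quantile(p, sigma):
    """Q(p) = 1/(1 + sigma*Phi^{-1}(1 - p/2)), written with -Phi^{-1}(p/2)."""
    return 1.0 / (1.0 - sigma * STD_NORMAL.inv_cdf(p / 2.0))


sigma = 0.9
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    z = muhn_quantile(p, sigma)
    # The round trip F(Q(p)) should return p up to floating-point error.
    print(f"p={p:.2f}: Q(p)={z:.4f}, F(Q(p))={muhn_cdf(z, sigma):.4f}")
```

Setting `p = 0.5` gives the median, so percentiles of the fitted model are available in closed form once $\sigma$ is estimated.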
2.5. Order Statistics
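Let $Z_{1},\ldots,Z_{n}$ be a random sample of the r.v. $Z\sim MUHN(\sigma)$. We denote by $Z_{(j)}$ the $j$-th order statistic, $j\in\{1,\ldots,n\}$.
Proposition 5.
The pdf of $Z_{(j)}$ is
$$f_{Z_{(j)}}(z)=\frac{n!}{(j-1)!\,(n-j)!}\,\frac{2}{\sigma z^{2}}\,\phi\left(\frac{1-z}{\sigma z}\right)\left[2\Phi\left(\frac{z-1}{\sigma z}\right)\right]^{j-1}\left[2\Phi\left(\frac{1-z}{\sigma z}\right)-1\right]^{n-j},\qquad 0<z<1.$$
In particular, the pdf of the minimum, $Z_{(1)}$, is
$$f_{Z_{(1)}}(z)=\frac{2n}{\sigma z^{2}}\,\phi\left(\frac{1-z}{\sigma z}\right)\left[2\Phi\left(\frac{1-z}{\sigma z}\right)-1\right]^{n-1},$$
and the pdf of the maximum, $Z_{(n)}$, is
$$f_{Z_{(n)}}(z)=\frac{2n}{\sigma z^{2}}\,\phi\left(\frac{1-z}{\sigma z}\right)\left[2\Phi\left(\frac{z-1}{\sigma z}\right)\right]^{n-1}.$$
Proof.
Since the model is absolutely continuous, the pdf of the order statistics is obtained by applying
$$f_{Z_{(j)}}(z)=\frac{n!}{(j-1)!\,(n-j)!}\,f(z)\,[F(z)]^{j-1}\,[1-F(z)]^{n-j},$$
where F and f denote the cdf and pdf of the parent distribution, $MUHN(\sigma)$ in this case. □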
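The general order-statistic formula can be coded in a few lines. The Python sketch below plugs an assumed MUHN pdf and cdf (derived from $Z=1/(1+X)$, $X\sim HN(\sigma)$) into that formula and checks numerically that the resulting densities integrate to one; names and the midpoint-rule integrator are illustrative choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def muhn_pdf(z, sigma):
    u = (1.0 - z) / (sigma * z)
    return 2.0 * STD_NORMAL.pdf(u) / (sigma * z * z)


def muhn_cdf(z, sigma):
    return 2.0 * STD_NORMAL.cdf((z - 1.0) / (sigma * z))


def muhn_order_pdf(z, sigma, j, n):
    """pdf of the j-th order statistic in a sample of size n (1 <= j <= n)."""
    coef = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    big_f = muhn_cdf(z, sigma)
    return coef * muhn_pdf(z, sigma) * big_f ** (j - 1) * (1.0 - big_f) ** (n - j)


def integrate_unit_interval(func, steps=20000):
    """Midpoint rule on (0, 1); avoids evaluating the density at the endpoints."""
    return sum(func((i + 0.5) / steps) for i in range(steps)) / steps


n, sigma = 5, 1.0
for j in (1, 3, 5):  # minimum, median and maximum of a sample of size 5
    total = integrate_unit_interval(lambda z: muhn_order_pdf(z, sigma, j, n))
    print(f"j={j}: integral of the pdf = {total:.5f}")
```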
2.6. Moments
An important numerical function for calculating the r-th moments of the random variable is defined as
More details of this function can be found in Appendix A.
Proposition 6.
Let . Then, for the r-th moment of Z is given by
Proof.
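Using the representation given in (2) and calculating the r-th moments directly, we have
$$E(Z^{r})=E\left[(1+X)^{-r}\right]=\int_{0}^{\infty}\frac{1}{(1+x)^{r}}\,\frac{2}{\sigma}\,\phi\left(\frac{x}{\sigma}\right)dx.$$
Making the change of variable $u=x/\sigma$, the result is obtained. □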
Corollary 2.
Let . Then, the mean and variance of the r.v. Z are given respectively by
and the asymmetry and kurtosis coefficients are given respectively by
Figure 5 depicts plots for the asymmetry and kurtosis coefficients in the MUHN distribution.
Figure 5.
Plots of the asymmetry and kurtosis coefficients for the MUHN model.
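The moments can also be approximated by direct numerical integration, which gives an independent check on any closed-form expression. The Python sketch below assumes $E(Z^{r})=\int_{0}^{\infty}(1+\sigma u)^{-r}\,2\phi(u)\,du$, which follows from the representation $Z=1/(1+X)$ with $X\sim HN(\sigma)$, and uses a composite Simpson rule truncated at `upper = 10`. It does not reproduce the authors' special-function expressions, and all names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def muhn_moment(r, sigma, upper=10.0, steps=2000):
    """Approximate E[Z^r] by composite Simpson on [0, upper]; steps must be even."""
    h = upper / steps

    def g(u):
        return 2.0 * STD_NORMAL.pdf(u) / (1.0 + sigma * u) ** r

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


def muhn_summary(sigma):
    """Mean, variance, asymmetry and kurtosis coefficients from the first four moments."""
    m1, m2, m3, m4 = (muhn_moment(r, sigma) for r in range(1, 5))
    var = m2 - m1**2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / var**1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / var**2
    return m1, var, skew, kurt


for sigma in (0.5, 0.9, 1.5, 2.0):
    mean, var, skew, kurt = muhn_summary(sigma)
    print(f"sigma={sigma}: mean={mean:.4f}, var={var:.4f}, skew={skew:.4f}, kurt={kurt:.4f}")
```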
Proposition 7.
Proof.
The following proposition shows a closed expression for negative moments.
Proposition 8.
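Let $Z\sim MUHN(\sigma)$. Then, for $r=1,2,\ldots$ the negative r-th moment of Z is given by
$$E(Z^{-r})=\sum_{k=0}^{r}\binom{r}{k}\sqrt{\frac{2^{k}}{\pi}}\;\sigma^{k}\,\Gamma\left(\frac{k+1}{2}\right).$$
Proof.
Since $Z^{-1}=1+X$ with $X\sim HN(\sigma)$, calculating the negative moments directly using the binomial theorem and the moments of the HN distribution, the result is obtained. □
From this we have that:
$$E(Z^{-1})=1+\sigma\sqrt{\frac{2}{\pi}},\qquad E(Z^{-2})=1+2\sigma\sqrt{\frac{2}{\pi}}+\sigma^{2}.$$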
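The binomial expansion behind the negative moments is easy to verify numerically. The Python sketch below assumes $Z^{-1}=1+X$ with $X\sim HN(\sigma)$ and the HN moments $E(X^{k})=\sqrt{2^{k}/\pi}\,\sigma^{k}\,\Gamma((k+1)/2)$; the function names are illustrative.

```python
import math


def hn_moment(k, sigma):
    """k-th raw moment of X ~ HN(sigma): sigma^k * 2^(k/2) * Gamma((k+1)/2) / sqrt(pi)."""
    return sigma**k * 2.0 ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)


def muhn_negative_moment(r, sigma):
    """E[Z^(-r)] = E[(1 + X)^r], expanded with the binomial theorem."""
    return sum(math.comb(r, k) * hn_moment(k, sigma) for k in range(r + 1))


sigma = 1.5
print("E[1/Z]   =", muhn_negative_moment(1, sigma))
print("E[1/Z^2] =", muhn_negative_moment(2, sigma))
```

Because these moments are available in closed form, they are natural candidates for moment-type estimators of $\sigma$.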
3. Inference
In this section, we estimate the parameter of the MUHN model using a modified moments (MM) method and the ML method, we present a simulation study, and we discuss the asymptotic distribution of the ML estimator.
3.1. MM Estimation
For a random sample $z_{1},\ldots,z_{n}$ derived from the MUHN($\sigma$) distribution, the MM estimator of $\sigma$ is:
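3.2. ML Estimation
For a random sample $z_{1},\ldots,z_{n}$ derived from the MUHN($\sigma$) distribution, the log-likelihood function can be written as
$$\ell(\sigma)=\frac{n}{2}\log\frac{2}{\pi}-n\log\sigma-2\sum_{i=1}^{n}\log z_{i}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(\frac{1-z_{i}}{z_{i}}\right)^{2}.$$
The score equation is given by
$$-\frac{n}{\sigma}+\frac{1}{\sigma^{3}}\sum_{i=1}^{n}\left(\frac{1-z_{i}}{z_{i}}\right)^{2}=0, \qquad (13)$$
and the ML estimator of $\sigma$ is obtained by solving Equation (13); its solution is
$$\hat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1-z_{i}}{z_{i}}\right)^{2}}.$$
Hence, for large samples, the ML estimator, $\hat{\sigma}$, is asymptotically normal, that is,
$$\sqrt{n}\,(\hat{\sigma}-\sigma)\xrightarrow{d}N\left(0,\,I(\sigma)^{-1}\right).$$
It results from this that the asymptotic variance of $\sqrt{n}\,(\hat{\sigma}-\sigma)$ is the inverse of Fisher’s information $I(\sigma)=2/\sigma^{2}$, i.e., $\mathrm{Var}(\hat{\sigma})\approx\sigma^{2}/(2n)$.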
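A minimal Python sketch of the ML fit follows. It assumes the representation $Z=1/(1+X)$ with $X\sim HN(\sigma)$, under which the score equation has the closed-form root $\hat{\sigma}=\sqrt{n^{-1}\sum((1-z_i)/z_i)^{2}}$ and the asymptotic standard error is $\hat{\sigma}/\sqrt{2n}$. The data are simulated with an arbitrary seed and true value; nothing here is taken from the authors' code.

```python
import math
import random


def muhn_mle(z):
    """Closed-form root of the score equation (assumed MUHN parametrization)."""
    return math.sqrt(sum(((1.0 - zi) / zi) ** 2 for zi in z) / len(z))


def muhn_loglik(sigma, z):
    """Log-likelihood of an MUHN(sigma) sample."""
    n = len(z)
    s2 = sum(((1.0 - zi) / zi) ** 2 for zi in z)
    return (0.5 * n * math.log(2.0 / math.pi) - n * math.log(sigma)
            - 2.0 * sum(math.log(zi) for zi in z) - s2 / (2.0 * sigma**2))


def muhn_score(sigma, z):
    """Derivative of the log-likelihood with respect to sigma."""
    s2 = sum(((1.0 - zi) / zi) ** 2 for zi in z)
    return -len(z) / sigma + s2 / sigma**3


rng = random.Random(2023)          # arbitrary seed, for reproducibility only
sigma_true = 1.2                   # illustrative value
z = [1.0 / (1.0 + abs(rng.gauss(0.0, sigma_true))) for _ in range(200)]

sigma_hat = muhn_mle(z)
se = sigma_hat / math.sqrt(2.0 * len(z))   # from I(sigma) = 2 / sigma^2
print(f"sigma_hat = {sigma_hat:.4f}, SE = {se:.4f}")
print(f"95% Wald interval: ({sigma_hat - 1.96 * se:.4f}, {sigma_hat + 1.96 * se:.4f})")
print(f"score at sigma_hat = {muhn_score(sigma_hat, z):.2e}")
```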
Proposition 9.
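For a random sample $z_{1},\ldots,z_{n}$ derived from the MUHN($\sigma$) distribution, the ML estimator $\hat{\sigma}$ satisfies
$$\frac{\sqrt{n}\,\hat{\sigma}}{\sigma}\sim\chi_{n}, \qquad (15)$$
where $\chi_{n}$ denotes the chi-distribution with n degrees of freedom.
Proof.
As $Z_{i}\sim MUHN(\sigma)$, then $X_{i}=(1-Z_{i})/Z_{i}\sim HN(\sigma)$ and $X_{i}^{2}/\sigma^{2}\sim\chi^{2}_{1}$, where $\chi^{2}_{1}$ denotes the chi-squared distribution with 1 degree of freedom. From the properties of the chi-squared distribution and the independence of the $X_{i}$ we have $n\hat{\sigma}^{2}/\sigma^{2}=\sum_{i=1}^{n}X_{i}^{2}/\sigma^{2}\sim\chi^{2}_{n}$; hence $\sqrt{n}\,\hat{\sigma}/\sigma\sim\chi_{n}$. □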
Corollary 3.
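Some direct consequences of the result given in (15) are
$$E(\hat{\sigma})=\sigma\sqrt{\frac{2}{n}}\;\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)},\qquad \mathrm{Var}(\hat{\sigma})=\sigma^{2}\left[1-\frac{2}{n}\left(\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\right)^{2}\right].$$
The estimator $\hat{\sigma}$ is asymptotically unbiased for $\sigma$. With these results the bias and the mean squared error can be calculated.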
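The exact bias and mean squared error implied by the chi distribution can be tabulated as in the Python sketch below. It assumes $\sqrt{n}\,\hat{\sigma}/\sigma\sim\chi_{n}$, so that $E(\hat{\sigma})/\sigma=\sqrt{2/n}\,\Gamma((n+1)/2)/\Gamma(n/2)$ and $E(\hat{\sigma}^{2})=\sigma^{2}$. The function names are hypothetical, and `math.lgamma` is used only to avoid overflow for large n.

```python
import math


def chi_mean_factor(n):
    """E[sigma_hat] / sigma = sqrt(2/n) * Gamma((n+1)/2) / Gamma(n/2)."""
    log_ratio = math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
    return math.sqrt(2.0 / n) * math.exp(log_ratio)


def mle_bias(sigma, n):
    return sigma * (chi_mean_factor(n) - 1.0)


def mle_variance(sigma, n):
    # E[sigma_hat^2] = sigma^2 because n * sigma_hat^2 / sigma^2 ~ chi-squared(n).
    return sigma**2 * (1.0 - chi_mean_factor(n) ** 2)


def mle_mse(sigma, n):
    return mle_variance(sigma, n) + mle_bias(sigma, n) ** 2


sigma = 1.0
for n in (30, 38, 40, 50, 80, 100):   # sample sizes used in the simulation study
    print(f"n={n:3d}: bias={mle_bias(sigma, n):+.5f}, "
          f"var={mle_variance(sigma, n):.5f}, mse={mle_mse(sigma, n):.5f}")
```

The bias is negative and of order $\sigma/(4n)$, which is consistent with the estimator being asymptotically unbiased.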
3.3. Simulation Study
To examine the behavior of the ML estimation approach, we carried out a simulation study to assess the performance of the estimator of the parameter $\sigma$ of the MUHN distribution. Two algorithms, Algorithms 1 and 2, are proposed to generate random numbers from the MUHN distribution. The simulation analysis was carried out by generating 1000 samples of size n = 30, 38, 40, 50, 80 and 100 from the MUHN distribution.
Algorithm 1 to simulate values from the MUHN($\sigma$) distribution.
Algorithm 2 to simulate values from the MUHN($\sigma$) distribution.
The code for both algorithms can be found in the following repository https://github.com/isaaccortes1989/MUHN-Codes (accessed on 2 November 2023). Since the results of the two algorithms are similar, we only present those of Algorithm 1. Table 1 displays the empirical bias (B), standard deviation (SD), mean of the standard errors (SEs), root of the empirical mean squared error (RMSE), and the coverage probability (CP). The CP terms converge reasonably to the nominal value used for their construction (95%), suggesting that the normality is reasonable for the asymptotic distribution of the ML estimators in the MUHN model. As shown in Table 1, the performance of the estimations improves as n increases.
Table 1.
ML estimates, B, SD, SE, RMSE, and CP for the MUHN model with sample size 30, 38, 40, 50, 80 and 100, respectively.
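A simulation of this kind can be sketched in Python as follows. The two samplers below are plausible candidates (the stochastic representation and inversion of the cdf) but are not claimed to be the authors' Algorithms 1 and 2, whose steps are not reproduced here. The sketch assumes $Z=1/(1+X)$ with $X\sim HN(\sigma)$, a Wald interval with standard error $\hat{\sigma}/\sqrt{2n}$, and an arbitrary true value $\sigma=1$ and seed.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_by_representation(sigma, rng):
    """Draw Z = 1/(1+X) with X = |N(0, sigma^2)| ~ HN(sigma)."""
    return 1.0 / (1.0 + abs(rng.gauss(0.0, sigma)))


def sample_by_inversion(sigma, rng):
    """Draw Z = Q(U), U uniform on (0, 1], using Phi^{-1}(1 - u/2) = -Phi^{-1}(u/2)."""
    u = 1.0 - rng.random()
    return 1.0 / (1.0 - sigma * STD_NORMAL.inv_cdf(u / 2.0))


def muhn_mle(z):
    return math.sqrt(sum(((1.0 - zi) / zi) ** 2 for zi in z) / len(z))


def simulate(sigma, n, reps, sampler, rng):
    """Empirical bias, SD, mean SE, RMSE and coverage of the 95% Wald interval."""
    estimates, std_errors, covered = [], [], 0
    for _ in range(reps):
        z = [sampler(sigma, rng) for _ in range(n)]
        est = muhn_mle(z)
        se = est / math.sqrt(2.0 * n)
        estimates.append(est)
        std_errors.append(se)
        covered += (est - 1.96 * se) <= sigma <= (est + 1.96 * se)
    return {
        "bias": statistics.mean(estimates) - sigma,
        "sd": statistics.stdev(estimates),
        "mean_se": statistics.mean(std_errors),
        "rmse": math.sqrt(statistics.mean((e - sigma) ** 2 for e in estimates)),
        "cp": covered / reps,
    }


rng = random.Random(12345)
for n in (30, 38, 40, 50, 80, 100):
    r = simulate(1.0, n, 1000, sample_by_representation, rng)
    print(f"n={n:3d}: B={r['bias']:+.4f} SD={r['sd']:.4f} SE={r['mean_se']:.4f} "
          f"RMSE={r['rmse']:.4f} CP={r['cp']:.3f}")
```

Passing `sample_by_inversion` instead of `sample_by_representation` exercises the second sampler with the same summary code.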
4. Applications
This section shows two applications of the MUHN model, highlighting its superior performance compared to other models known in the statistical literature.
4.1. Application 1
In this first application, we fit the MUHN distribution and compare it with the uniparametric UL and UHN distributions and the two-parameter Beta and KM distributions defined in the Introduction. The dataset consists of 48 samples of rocks from an oil reservoir, as reported by Cordeiro and Brito [20]. We conducted an analysis of the shape perimeter using a squared variable (area). The data are in Table 2:
Table 2.
The data of 48 samples of rocks from an oil reservoir.
Table 3 displays basic descriptive statistics for the dataset. We employ the notation and to denote sample asymmetry and kurtosis coefficients, respectively.
Table 3.
Descriptive statistics for the first dataset.
Using Section 3.1, the MM estimator of is
Table 4 shows the parameters estimated by ML for the UL, UHN, MUHN, Beta and KM models. Standard errors of the ML estimates are calculated using Fisher’s information corresponding to each model. For each model, we report the values of the Akaike information criterion (AIC), introduced by Akaike [21], and the Bayesian information criterion (BIC), proposed by Schwarz [22]. It is observed that both AIC and BIC criteria indicate a better fit for the Beta model.
Table 4.
Parameter estimates with SEs (in parentheses), AIC and BIC values for UL, UHN, MUHN, Beta and KM models.
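For reference, the information criteria are $AIC=2k-2\ell(\hat{\theta})$ and $BIC=k\log n-2\ell(\hat{\theta})$, where k is the number of parameters. The Python sketch below computes them for the MUHN model under the assumed log-likelihood derived from $Z=1/(1+X)$, $X\sim HN(\sigma)$. The sample `z` is a small hypothetical set of proportions used only to make the example runnable; it is not the data of Table 2.

```python
import math


def muhn_mle(z):
    return math.sqrt(sum(((1.0 - zi) / zi) ** 2 for zi in z) / len(z))


def muhn_loglik(sigma, z):
    n = len(z)
    s2 = sum(((1.0 - zi) / zi) ** 2 for zi in z)
    return (0.5 * n * math.log(2.0 / math.pi) - n * math.log(sigma)
            - 2.0 * sum(math.log(zi) for zi in z) - s2 / (2.0 * sigma**2))


def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for a model with k parameters fitted to n observations."""
    return 2.0 * k - 2.0 * loglik, k * math.log(n) - 2.0 * loglik


# Hypothetical proportions, NOT the rock data reported in the article.
z = [0.12, 0.18, 0.22, 0.25, 0.31, 0.36, 0.44, 0.58]
sigma_hat = muhn_mle(z)
aic, bic = information_criteria(muhn_loglik(sigma_hat, z), k=1, n=len(z))
print(f"sigma_hat={sigma_hat:.4f}, AIC={aic:.4f}, BIC={bic:.4f}")
```

Smaller values indicate a better trade-off between fit and complexity; for a two-parameter competitor such as the Beta model, `k=2` would be used with that model's own log-likelihood.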
Figure 6 illustrates the ML fit of the five models with the probability histogram. Additionally, we calculate the quantile residuals (QRs). If the model is suitable for the data, the QRs should be a sample from the standard normal model (see Dunn and Smyth [23]). This assumption can be validated using traditional normality tests, such as the Anderson–Darling (AD), Cramér–von Mises (CVM) and Shapiro–Wilk (SW) tests.
Figure 6.
Histogram for rock samples from a petroleum reservoir; lines represent distributions fitted using ML estimates: UL (red), UHN (blue), MUHN (green), Beta (black) and KM (brown).
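Quantile residuals for a continuous model are $r_{i}=\Phi^{-1}(F(z_{i};\hat{\theta}))$. The Python sketch below computes them for the MUHN model, assuming the cdf $F(z;\sigma)=2\Phi((z-1)/(\sigma z))$ derived from $Z=1/(1+X)$, $X\sim HN(\sigma)$. The data are simulated with an arbitrary seed rather than taken from the article, and the formal AD, CVM and SW tests are omitted because they require a statistics library beyond the Python standard library.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def muhn_cdf(z, sigma):
    return 2.0 * STD_NORMAL.cdf((z - 1.0) / (sigma * z))


def muhn_mle(z):
    return math.sqrt(sum(((1.0 - zi) / zi) ** 2 for zi in z) / len(z))


def quantile_residuals(z, sigma):
    """r_i = Phi^{-1}(F(z_i; sigma)); approximately N(0, 1) if the model fits."""
    return [STD_NORMAL.inv_cdf(muhn_cdf(zi, sigma)) for zi in z]


rng = random.Random(7)                      # arbitrary seed
z = [1.0 / (1.0 + abs(rng.gauss(0.0, 0.8))) for _ in range(300)]  # simulated MUHN(0.8) data
sigma_hat = muhn_mle(z)
residuals = quantile_residuals(z, sigma_hat)

# Under a well-specified model the residual mean is near 0 and the SD near 1.
print(f"mean = {statistics.mean(residuals):+.3f}, sd = {statistics.stdev(residuals):.3f}")
```

Sorting `residuals` and plotting them against standard normal quantiles gives the QQ-plot used for visual assessment.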
In Figure 7, the QRs for the fitted models and the p-values for the AD, CVM and SW normality tests are provided to assess whether the QRs follow the standard normal distribution. It is observed that the QRs are consistent with the standard normal distribution only for the MUHN model; in other words, all three tests reject normality of the QRs for the UL, UHN, Beta and KM models. Figure 7 suggests that the MUHN model gives a better fit for this dataset.
Figure 7.
QQ-plots of the QRs: UL distribution (a); UHN distribution (b); MUHN distribution (c); Beta distribution (d) and KM distribution (e).
The codes for this application are available on the following website: https://github.com/isaaccortes1989/MUHN-Codes/tree/main/First%20Application (accessed on 2 November 2023).
4.2. Application 2
In this second application, we fit the MUHN distribution and compare it with the one-parameter UL and UHN distributions and the two-parameter Beta and KM distributions defined in the Introduction. The data set consists of a sample of 38 proportions formed by COVID information taken from the Chilean database, from 4 March to 10 April 2020. These data were formed using new cases (NC), daily accumulated cases (AC) and daily cumulative deaths (CD). We analyze the proportion of daily NC to the accumulated number of survivors with the equation:
The data are in Table 5.
Table 5.
The data of a sample of 38 proportions formed by COVID information.
Table 6 shows the descriptive summary of the data, highlighting their positive asymmetry. Using Section 3.1, the MM estimator of is
Table 6.
Descriptive statistics for the second dataset.
The estimates, SE, AIC, and BIC values for the UL, UHN, KM, Beta and MUHN models are displayed in Table 7. From the table, it is evident that the MUHN model provides a better fit, as it exhibits the smallest criteria values with only one parameter. Furthermore, the fit of the five models with the histogram of the data can be observed in Figure 8, confirming the better fit of the MUHN model. Finally, as shown in Figure 9, the QRs are consistent with a standard normal distribution only for the MUHN model.
Table 7.
Parameter estimates with their respective SE (in parentheses), AIC and BIC values for the indicated model.
Figure 8.
Histogram for COVID dataset; lines represent distributions fitted using ML estimates.
Figure 9.
QQ-plots of the QRs: KM distribution (a); Beta distribution (b); MUHN distribution (c); UL distribution (d) and UHN distribution (e).
The codes for this application are available on the following website: https://github.com/isaaccortes1989/MUHN-Codes/tree/main/Second%20Application (accessed on 2 November 2023).
5. Discussion
This paper presents a study of the MUHN distribution. We show some properties and compare them with the UHN distribution in a fit using ML estimation. The MUHN distribution appears to be a viable alternative for fitting data between zero and one, and with positive asymmetry. Some other characteristics of the MUHN distribution are:
- The representation of the MUHN distribution is simple.
- The MUHN distribution has an explicit mode.
- The cdf, hazard function and quantile function are explicit and represented by known functions.
- The ML estimator shows very good behavior with small samples.
- The applications show that the MUHN distribution is a very good alternative when the data present positive asymmetry; this was confirmed by both the AIC and BIC model selection criteria and by the Anderson–Darling, Cramér–von Mises and Shapiro–Wilk statistical tests.
Author Contributions
Conceptualization, I.E.C. and H.W.G.; methodology, P.I.A. and H.V.; software, P.I.A. and I.E.C.; validation, I.E.C., O.V. and H.W.G.; formal analysis, P.I.A., I.E.C. and O.V.; investigation, H.V. and O.V.; writing—original draft preparation, P.I.A. and H.V.; writing—review and editing, I.E.C. and O.V.; funding acquisition, O.V. and H.W.G. All authors have read and agreed to the published version of the manuscript.
Funding
The research of P.I.A., H.V. and H.W.G. was supported by Semillero UA-2023.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data sets are available in the text.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
The function defined in (7) when is given by:
and when is given by
where is the generalized hypergeometric function (for more details see Abramowitz and Stegun [24]), and for a definition and properties of the integral functions and we refer the reader to Weisstein [25,26].
References
1. Denuit, M.; Dhaene, J.; Goovaerts, M.J.; Kaas, R. Actuarial Theory for Dependent Risks; John Wiley & Sons Ltd.: Chichester, UK, 2005.
2. Cook, D.O.; Kieschnick, R.; McCullough, B. Regression analysis of proportions in finance with self selection. J. Empir. Financ. 2008, 15, 860–867.
3. Gupta, A.K.; Nadarajah, S. Handbook of Beta Distribution and Its Applications; CRC Press: New York, NY, USA, 2004.
4. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; John Wiley & Sons Inc.: New York, NY, USA, 1995; Volume 2.
5. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
6. Grassia, A. On a Family of Distributions with Argument between 0 and 1 Obtained by Transformation of the Gamma Distribution and Derived Compound Distributions. Aust. J. Stat. 1977, 19, 108–114.
7. Jones, M.C. Kumaraswamy’s distribution: A beta-type distribution with some tractability advantages. Stat. Methodol. 2009, 6, 70–81.
8. Mazucheli, J.; Menezes, A.F.B.; Dey, S. The unit-Birnbaum-Saunders distribution with applications. Chil. J. Stat. 2018, 1, 47–57.
9. Ghitany, M.; Mazucheli, J.; Menezes, A.; Alqallaf, F. The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Commun. Stat. Theory Methods 2018, 48, 3423–3438.
10. Modi, K.; Gill, V. Unit Burr III distribution with application. J. Stat. Manag. Syst. 2019, 23, 579–592.
11. Korkmaz, M.; Chesneau, C. On the unit Burr-XII distribution with the quantile regression modeling and applications. Comput. Appl. Math. 2021, 40, 29.
12. Haq, M.; Hashmi, S.; Aidi, K.; Ramos, P.F.L. Unit Modified Burr-III Distribution: Estimation, Characterizations and Validation Test. Ann. Data Sci. 2023, 10, 415–449.
13. Gómez-Déniz, E.; Sordo, M.A.; Calderín-Ojeda, E. The Log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insur. Math. Econ. 2014, 54, 49–57.
14. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J. Appl. Stat. 2020, 46, 700–714.
15. Mazucheli, J.; Bapat, S.R.; Menezes, A.F.B. A new one-parameter unit-Lindley distribution. Chil. J. Stat. 2020, 11, 53–67.
16. Bakouch, H.S.; Nik, A.S.; Asgharzadeh, A.; Salinas, H.S. A flexible probability model for proportion data: Unit-half-normal distribution. Commun. Stat. Case Stud. Data Anal. 2021, 7, 271–288.
17. Hogg, R.V.; Tanis, E.A. Probability and Statistical Inference, 4th ed.; MacMillan Publishing: New York, NY, USA, 1993.
18. Ferreira, A.B.; Mazucheli, J. The zero, one and zero-and-one-inflated new unit-Lindley distributions. Braz. J. Biom. 2022, 40, 291–326.
19. Alrumayh, A.; Weera, W.; Khogeer, H.A.; Almetwally, E.M. Optimal analysis of adaptive type-II progressive censored for new unit-Lindley model. J. King Saud Univ. Sci. 2023, 35, 102462.
20. Cordeiro, G.M.; Brito, R.D.S. The beta power distribution. Braz. J. Probab. Stat. 2012, 26, 88–112.
21. Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723.
22. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
23. Dunn, P.K.; Smyth, G.K. Randomized Quantile Residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
24. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th ed.; National Bureau of Standards: Gaithersburg, MD, USA, 1968.
25. Weisstein, E.W. “Erfi”. From MathWorld—A Wolfram Web Resource. Available online: https://mathworld.wolfram.com/Erfi.html (accessed on 2 January 2023).
26. Weisstein, E.W. “Exponential Integral”. From MathWorld—A Wolfram Web Resource. Available online: https://mathworld.wolfram.com/ExponentialIntegral.html (accessed on 2 January 2023).
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).