1. Introduction
Statistical methodology plays an important role in quantitative research through hypothesis testing and inferential procedures. Nonetheless, comparisons across features rely on a function estimated from the data. Most often, mild suppositions are assumed, which may compromise the generalization of the results.
From the perspective of statistical generalization (inferential methods), bounded distributions pose some estimation challenges. For instance, confidence intervals are often obtained from the maximum likelihood estimation approach under asymptotic suppositions. In particular, the interval estimate should remain within the parameter space.
One example is the case in which bounded data are observed and, nonetheless, normality is commonly assumed to be true. This is the case of proportion/rate data, which are double bounded, with lower limit equal to zero and upper limit equal to one. Relative humidity is an example of this scenario, in which every decision-making process deals with values $\in [0,1]$ [1, 2]; other examples are the rates commonly used in the fields of finance, economics and demography, to name a few.
In the case of rates and proportions processes, as well as other processes whose variables of interest assume values in the range $(0,1)$, there is a well-represented class of models, the unit distributions family, which deals with this type of double-bounded data. Among the many existing unit distributions, it is worth mentioning the power distribution, beta distribution [3], Kumaraswamy distribution [4], unit-logistic distribution [5], simplex distribution [6], unit-Weibull distribution [7, 8], unit-Lindley distribution [9], unit-half-normal distribution [10], unit log-log distribution [11], modified Kumaraswamy and reflected modified Kumaraswamy distributions [12], unit-Teissier distribution [13], unit extended Weibull families of distributions [14], lognormal distribution [15], unit folded normal distribution [16], Marshall-Olkin reduced Kies distribution [17], and unit-Chen distribution [18].
Besides the applicability of unit distributions to double-bounded variables, another important fact is that the interval estimate for the parameter may also be restricted to a domain (such as the positive real numbers). In view of this, we also present an inferential alternative based on the delta method.
This study starts with the presentation of an important theorem that turns a modification of the standard normal distribution into a class of density functions that can be defined on the unit interval. Then, as an exemplification, the second-moment case was chosen to illustrate the usefulness of this class of probabilistic models. This class of distributions proves to be competitive for high-frequency data with a range greater than 0.4, which is important in real-world applications and a setting in which a classical unit distribution fails [19]. Additionally, two different data sets were selected to illustrate the fit of the proposed model. The first one is related to the Chilean inflation (latest post-military era), and the second one comes from the driest area of the planet (excluding the north and south poles).
This paper is structured in four parts. Section 2 presents the proposed one-parameter unit distribution. In Section 3, inference for the distribution parameter is discussed, adopting the uniformly minimum-variance unbiased estimator (UMVUE) and the maximum likelihood estimator (MLE) as point estimators, as well as interval estimation. A simulation study is also presented in this section. In Section 4, two real data sets are used to illustrate the proposed methodology, one from the Chilean inflation in the post-military period, and the other from the relative humidity water monitoring in the Atacama Desert. Finally, Section 5 lists the conclusions of this study. Before moving on to the described structure, however, a wide class of models that can be generated on many different random variable supports is presented. To this end, a theorem is stated and, as a special case, the remainder of the paper considers the moment of order two to exemplify this powerful class of distributions.
Motivation
The normal (or Gaussian) distribution is very important in the history of statistics, and numerous modifications of this distribution have been proposed in the literature [20, 21]. An interesting fact related to the normal distribution is that its even moments can be used to generate new distributions, which is the case presented below through a definition and a theorem that characterizes these new distributions.
Definition 1. A random variable B is said to be distributed according to a Bimodal Normal (BN) distribution of order k, that is, $B\sim \mathit{BN}\left(k\right)$ (discussed in [22]), if its probability density function (PDF) is given by
$${f}_{B}\left(b\right)=\frac{1}{c}\,{b}^{2k}\varphi \left(b\right),\quad b\in \mathbb{R},$$
in which $\varphi (\cdot )$ is the PDF of the standard normal distribution, $c={\prod}_{j=1}^{k}(2j-1)$ and $k\in \{1,2,3,\dots \}$. This class of distributions is always bimodal, and the modes move away from each other as the order k increases (as depicted in Figure 1).
It is worth mentioning that transformations derived from the $\mathrm{BN}\left(k\right)$ distribution may lead to other domains of interest, e.g., the unit domain. For example, let $B\sim \mathrm{BN}\left(k\right)$ and let $\alpha $ be a scale parameter; then the transformation $\alpha \left|B\right|\in {\mathbb{R}}^{+}$ and, in turn, the transformation ${e}^{-\alpha \left|B\right|}\in [0,1]$. The stochastic characterization of a $\mathrm{BN}\left(k\right)$ distribution can be obtained according to the following theorem.
Theorem 1. Let ${W}_{1}$ and ${W}_{2}$ be independent random variables, in which ${W}_{1}$ is such that $\mathbb{P}({W}_{1}=-1)=\mathbb{P}({W}_{1}=1)=1/2$ and ${W}_{2}\sim {\chi}_{2k+1}^{2}$. Then,
$${W}_{1}\sqrt{{W}_{2}}\sim \mathit{BN}\left(k\right).\quad (1)$$
This theorem is mainly motivated by the result that if $X\sim \mathrm{BN}\left(k\right)$, then ${X}^{2}\sim {\chi}_{2k+1}^{2}$. The entire demonstration is presented in Appendix A.
2. The Model
In this section, a new unit distribution, named Alpha-Unit, which has a single parameter, $\alpha $, is discussed. Its probability density and cumulative distribution functions, moments (including mean and variance), moment-generating function, and a procedure to generate pseudo-random numbers from it are presented. Moreover, a proposal of a statistical control chart for unit data based on the Alpha-Unit distribution is also shown.
The Alpha-Unit density originates from the general theorem (Theorem 1) by considering $k=1$. That is, it is built from the second moment of the standard normal distribution, whose domain is subsequently transformed to the unit interval. As k increases, the concentration of the distribution intensifies and other densities can be obtained.
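As a sketch of how the case $k=1$ leads to a density on the unit interval, the chain of transformations described in the Motivation can be written out explicitly. This is our own working, assuming the $\mathrm{BN}(1)$ density ${b}^{2}\varphi (b)$ and the transformation $X={e}^{-\alpha \left|B\right|}$; the intermediate variable $Y$ is introduced only for illustration.

$$
{f}_{\left|B\right|}(b)=2{b}^{2}\varphi (b),\; b>0
\;\Longrightarrow\;
{f}_{Y}(y)=\frac{2{y}^{2}}{{\alpha}^{3}}\,\varphi \left(\frac{y}{\alpha}\right),\; Y=\alpha \left|B\right|,\; y>0
\;\Longrightarrow\;
{f}_{X}(x)=\frac{2{[\ln (x)]}^{2}}{{\alpha}^{3}x}\,\varphi \left(\frac{\ln (x)}{\alpha}\right),\; X={e}^{-Y},\; 0<x<1.
$$

The last step uses $y=-\ln (x)$, $\left|dy/dx\right|=1/x$ and the symmetry of $\varphi $.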
Properties and Characterization
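Definition 2. (Alpha-Unit distribution). A random variable X follows an Alpha-Unit (AU) distribution with parameter $\alpha >0$, that is, $X\sim AU\left(\alpha \right)$, if its PDF is given by
$${f}_{X}(x\mid \alpha )=\frac{2{[\ln (x)]}^{2}}{{\alpha}^{3}x}\,\varphi \left(\frac{\ln (x)}{\alpha}\right),\quad 0<x<1.\quad (2)$$
Remark 1. If $X\sim AU\left(\alpha \right)$, then its PDF is unimodal.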
Proof. To study the maxima of the AU density, the first-derivative criterion is considered first:
By solving algebraically for x, we obtain:
By working algebraically, it can be seen that this is only true for (ii), which is a global maximum, given that the solution lies between 0 and 1. Therefore, the AU distribution is unimodal. □
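Proposition 1. If $X\sim \mathit{AU}\left(\alpha \right)$, then its rth order moment is given by
$$\mathbb{E}\left[{X}^{r}\right]=2\left(1+{\alpha}^{2}{r}^{2}\right){e}^{{\alpha}^{2}{r}^{2}/2}\,\mathsf{\Phi}(-\alpha r)-\alpha r\sqrt{\frac{2}{\pi}},$$
in which $\mathsf{\Phi}(\cdot )$ is the cumulative distribution function (CDF) of the standard normal distribution.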
Proof. From the definition of the rth order moment, we have:
By changing the variables:
then substituting into Equation (3) and developing algebraically, we obtain:
Then, by making another change of variables: $h=u-\alpha r$, $dh=du$; and replacing these expressions in the previous equation, we have:
By solving the integrals, we get to:
Then, by solving algebraically, we arrive at the expression of Proposition 1. □
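From Proposition 1, we obtain the mean and variance of the $\mathrm{AU}\left(\alpha \right)$ model as follows:
$$\mathbb{E}\left[X\right]=2\left(1+{\alpha}^{2}\right){e}^{{\alpha}^{2}/2}\,\mathsf{\Phi}(-\alpha )-\alpha \sqrt{\frac{2}{\pi}},\qquad \mathrm{Var}\left[X\right]=\mathbb{E}\left[{X}^{2}\right]-{\left(\mathbb{E}\left[X\right]\right)}^{2},$$
with $\mathbb{E}\left[{X}^{2}\right]=2\left(1+4{\alpha}^{2}\right){e}^{2{\alpha}^{2}}\,\mathsf{\Phi}(-2\alpha )-2\alpha \sqrt{2/\pi}$.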
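The mean and variance can also be evaluated numerically. The sketch below is a minimal illustration that assumes the closed form $\mathbb{E}[{X}^{r}]=2(1+{\alpha}^{2}{r}^{2}){e}^{{\alpha}^{2}{r}^{2}/2}\mathsf{\Phi}(-\alpha r)-\alpha r\sqrt{2/\pi}$, which we derived from the representation $X={e}^{-\alpha \left|B\right|}$ with $B\sim \mathrm{BN}(1)$; the function names are hypothetical and not part of the original study.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def au_moment(r, alpha):
    """r-th raw moment of AU(alpha), under the closed form assumed in the lead-in."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    t = alpha * r
    return (2.0 * (1.0 + t * t) * math.exp(0.5 * t * t) * std_normal_cdf(-t)
            - t * math.sqrt(2.0 / math.pi))


def au_mean(alpha):
    """Mean of AU(alpha): the first raw moment."""
    return au_moment(1, alpha)


def au_variance(alpha):
    """Variance of AU(alpha): second raw moment minus the squared mean."""
    return au_moment(2, alpha) - au_moment(1, alpha) ** 2
```

For large values of $\alpha r$ the exponential term may overflow, so this direct evaluation is only meant for moderate parameter values such as those considered in this paper.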
Remark 2. As an illustration, Figure 2 displays the asymmetry and kurtosis generated by the chosen α parameter of the AU distribution.
Proposition 2. If $X\sim \mathit{AU}\left(\alpha \right)$, then its CDF is given by
$${F}_{X}(x\mid \alpha )=2\left[\mathsf{\Phi}\left(\frac{\ln (x)}{\alpha}\right)-\frac{\ln (x)}{\alpha}\,\varphi \left(\frac{\ln (x)}{\alpha}\right)\right],\quad 0<x<1.$$
Proof. By definition, the CDF is:
By making the change of variables:
then substituting into Equation (4) and reducing expressions algebraically, we get to:
By calculating the integral, we find:
Then, by multiplying and commuting, we arrive at the expression of Proposition 2. □
Additionally, if X denotes the monitored variable, then the PDF of X is given by (2). Also, consider that the probability of false alarm (known as type I error) is $\pi $. Thus, we get to:
in which $\alpha $ is the in-control process parameter (that is, the parameter that controls the quality characteristic based on the in-control state), and LCL and UCL are the lower and upper control chart limits, respectively. Given the CDF ${F}_{X}(x\mid \alpha )$, the quantile function of X is defined by $Q(p\mid \alpha )={F}_{X}^{-1}(p\mid \alpha )$, $0<p<1$, which can be obtained by setting to zero and solving (numerically) for x the following equation:
$${F}_{X}(x\mid \alpha )-p=0.$$
Following [23], the control limits and centerline (CL) of the proposed control chart for unit data based on the AU distribution or, simply, AU control chart, are given by
in which $Q(\cdot )$ is the quantile function of the $\mathrm{AU}\left(\alpha \right)$ distribution.
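The numerical inversion of the CDF can be sketched as follows. This is a minimal illustration under stated assumptions: the CDF is taken as ${F}_{X}(x\mid \alpha )=2[1-\mathsf{\Phi}(t)+t\varphi (t)]$ with $t=-\ln (x)/\alpha $ (our own derivation from $X={e}^{-\alpha \left|B\right|}$), the limits are equal-tail quantiles $Q(\pi /2)$ and $Q(1-\pi /2)$, and the centerline is the median. The last two choices are assumptions made for illustration and may differ from the limits adopted following [23]. All function names are hypothetical.

```python
import math


def std_normal_pdf(z):
    """PDF of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_sf(z):
    """Upper-tail probability 1 - Phi(z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def au_cdf(x, alpha):
    """CDF of AU(alpha) under the closed form assumed in the lead-in."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    t = -math.log(x) / alpha
    return 2.0 * (std_normal_sf(t) + t * std_normal_pdf(t))


def au_quantile(p, alpha, tol=1e-12):
    """Solve F(x | alpha) - p = 0 for x by bisection on (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if au_cdf(mid, alpha) < p:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
    return 0.5 * (lo + hi)


def au_control_limits(alpha, false_alarm=0.01):
    """Equal-tail (LCL, CL, UCL); the median centerline is an assumption of this sketch."""
    return (au_quantile(false_alarm / 2.0, alpha),
            au_quantile(0.5, alpha),
            au_quantile(1.0 - false_alarm / 2.0, alpha))
```

Bisection is used because the CDF is continuous and strictly increasing on $(0,1)$, so the root is unique and no derivative is needed.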
Proposition 3. If $X\sim \mathit{AU}\left(\alpha \right)$, then its moment-generating function (MGF) is given by
Proof. By definition, the MGF is:
By making the following change of variables:
then substituting and simplifying into Equation (5), we get to:
Working algebraically, we obtain:
By making the following change of variables: $h=u-\alpha k$, $dh=du$; then substituting it into the previous equation, we get to:
Then, by solving the integral and adjusting algebraically, we arrive at the expression of Proposition 3. □
The pseudocode presented in Algorithm 1 describes the main steps for the generation of random (in fact, pseudo-random) numbers from the $\mathrm{AU}\left(\alpha \right)$ distribution. Further proofs are attached under Appendix B.
Algorithm 1 Random number generation from the $\mathrm{AU}\left(\alpha \right)$ model.
Step 1. Generate a random number ${x}_{1}\sim {\chi}_{3}^{2}$.
Step 2. Generate a random number $u\sim \mathrm{Uniform}(0,1)$. If $u\le 1/2$, set $v=-\sqrt{{x}_{1}}$; otherwise, $v=\sqrt{{x}_{1}}$.
Step 3. Based on the numbers obtained, generate $y=\alpha \left|v\right|$, in which $\alpha $ is a (positive) scale parameter and $\left|v\right|$ follows a Bimodal Half-Normal (BHN) distribution.
Step 4. Conclude with the number generated by Step 3 as a negative power of base e, that is, $x={e}^{-y}={e}^{-\alpha \left|v\right|}\in [0,1]$.
Step 5. Repeat Steps 1–4 n times to obtain a random sample of size n from the $\mathrm{AU}\left(\alpha \right)$ model.
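The steps of Algorithm 1 can be sketched in code as follows. This is a minimal illustration using only the Python standard library (the original analyses were carried out in R); the function name `rau` and its arguments are hypothetical, and the ${\chi}_{3}^{2}$ draw is obtained as a sum of three squared standard normal draws.

```python
import math
import random


def rau(n, alpha, rng=None):
    """Draw n pseudo-random values from AU(alpha) following Algorithm 1."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = rng or random.Random()
    sample = []
    for _ in range(n):
        # Step 1: chi-squared draw with 3 degrees of freedom.
        x1 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        # Step 2: attach a random sign, which gives a BN(1) draw.
        v = -math.sqrt(x1) if rng.random() <= 0.5 else math.sqrt(x1)
        # Step 3: scale the absolute value by alpha.
        y = alpha * abs(v)
        # Step 4: map to the unit interval.
        sample.append(math.exp(-y))
    return sample
```

Since only $\left|v\right|$ enters Step 3, the sign drawn in Step 2 does not affect the final value; it is kept here to mirror the algorithm as stated.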

3. Inference
In this section, parameter estimation through the UMVUE and MLE approaches is discussed. At first, it is demonstrated that the UMVUE can be obtained straightforwardly, since the proposed AU distribution is part of the exponential family. Later, the MLE is also discussed, which provides not only the point estimate of the $\alpha $ parameter, but also its interval estimate. The reasoning relies on the asymptotic convergence in distribution of the parameter estimator, combined with a transformation (via the delta method) that ensures that the interval for the parameter always lies within its domain. The delta method also yields the standard error associated with the transformed parameter estimate. Later on, a simulation study illustrating these theoretical results is presented.
3.1. UMVUE through the Exponential Family
Many of the distributions used in statistics belong to the exponential family, which implies a considerable advantage over models that do not belong to this family. Such an advantage is particularly evident when it comes to obtaining the statistic $T\left(\mathit{X}\right)$ of a random sample $\mathit{X}=\left({X}_{1},{X}_{2},\dots ,{X}_{n}\right)$. Next, it is shown that the proposed $\mathrm{AU}\left(\alpha \right)$ distribution belongs to this family.
A random variable X is said to belong to the one-parameter exponential family if its associated PDF $f(\cdot \mid \theta )$ can be written in the form of:
Let $X\sim \mathrm{AU}\left(\alpha \right)$, then the PDF of X can be written in exponential form as follows:
Then, X belongs to the one-parameter exponential family if we define:
Let $\mathit{x}=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$ be an observation (or realization) of the random sample $\mathit{X}=\left({X}_{1},{X}_{2},\dots ,{X}_{n}\right)$, with ${X}_{i}\sim \mathrm{AU}\left(\alpha \right)$, for $i=1,2,\dots ,n$. Then, the joint PDF presented in exponential form is
from which it can be concluded that the statistic $T\left(\mathit{X}\right)={\sum}_{i=1}^{n}{[\ln \left({X}_{i}\right)]}^{2}$ is sufficient and complete, since the AU distribution is part of the exponential family.
Proposition 4. Let $\mathit{X}=\left({X}_{1},{X}_{2},\dots ,{X}_{n}\right)$ be a random sample, with ${X}_{i}\sim \mathit{AU}\left(\alpha \right)$, for $i=1,2,\dots ,n$, and $T\left(\mathit{X}\right)={\sum}_{i=1}^{n}{[\ln \left({X}_{i}\right)]}^{2}$. Then,
$${W}_{n}=\frac{T\left(\mathit{X}\right)}{{\alpha}^{2}}\sim {\chi}_{3n}^{2}.$$
Proof. If $G={\left[\frac{\ln \left(X\right)}{\alpha}\right]}^{2}$, then $G\sim {\chi}_{3}^{2}$. Thus, the sum of n independent and identically distributed copies of G follows a chi-squared distribution with $3n$ degrees of freedom, that is, ${\chi}_{3n}^{2}$, since
so,
□
Proposition 5. Let $\mathit{X}=\left({X}_{1},{X}_{2},\dots ,{X}_{n}\right)$ be a random sample, with ${X}_{i}\sim \mathit{AU}\left(\alpha \right)$, for $i=1,2,\dots ,n$, and $T\left(\mathit{X}\right)={\sum}_{i=1}^{n}{[\ln \left({X}_{i}\right)]}^{2}$. Then,
$$S\left(\mathit{X}\right)=\frac{\mathsf{\Gamma}\left(3n/2\right)}{\sqrt{2}\,\mathsf{\Gamma}\left((3n+1)/2\right)}\sqrt{T\left(\mathit{X}\right)}$$
is an unbiased estimator of $\alpha $.
Proof. First, remember that if $X\sim \mathrm{Gamma}(a,b)$, then $\mathbb{E}\left[{X}^{k}\right]=\frac{\mathsf{\Gamma}(a+k)}{{b}^{k}\mathsf{\Gamma}\left(a\right)}$. Since the $\alpha $ parameter appears squared in ${W}_{n}$, the square root of ${W}_{n}$ is needed to find an unbiased estimator of $\alpha $. So, considering the random variable ${W}_{n}^{1/2}$ (with ${W}_{n}$ as defined in Proposition 4), it follows that:
so,
□
Remark 3. Considering the two previous propositions and resorting to the Lehmann-Scheffé theorem, one can conclude that $S\left(\mathit{X}\right)$ is the UMVUE for α.
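A minimal sketch of the UMVUE computation is given below. It assumes the estimator takes the form $S(\mathit{X})=\mathsf{\Gamma}(3n/2)\sqrt{T(\mathit{X})}/[\sqrt{2}\,\mathsf{\Gamma}((3n+1)/2)]$, which is our own derivation from ${W}_{n}=T(\mathit{X})/{\alpha}^{2}\sim {\chi}_{3n}^{2}$ and the fractional moment of the gamma distribution; the function name is hypothetical.

```python
import math


def au_umvue(xs):
    """Unbiased estimate of alpha based on T = sum(ln(x_i)^2), under the assumed form of S(X)."""
    if not xs or any(not 0.0 < x <= 1.0 for x in xs):
        raise ValueError("observations must lie in (0, 1]")
    n = len(xs)
    t = sum(math.log(x) ** 2 for x in xs)
    # Work on the log scale: the ratio of gamma functions overflows for large n otherwise.
    log_const = math.lgamma(1.5 * n) - math.lgamma(1.5 * n + 0.5) - 0.5 * math.log(2.0)
    return math.exp(log_const) * math.sqrt(t)
```

Evaluating the constant through `math.lgamma` keeps the computation stable for the sample sizes used in the simulation study (up to $n=500$), where the gamma functions themselves are far beyond floating-point range.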
3.2. Estimation using the Maximum Likelihood Method
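Let $\mathit{x}=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$ be a realization of the random sample $\mathit{X}=\left({X}_{1},{X}_{2},\dots ,{X}_{n}\right)$ taken from the $\mathrm{AU}\left(\alpha \right)$ distribution. Then, the log-likelihood function is given by
$$\ell \left(\alpha \right)=\frac{n}{2}\ln \left(\frac{2}{\pi}\right)-3n\ln \left(\alpha \right)+2{\sum}_{i=1}^{n}\ln \left|\ln \left({x}_{i}\right)\right|-{\sum}_{i=1}^{n}\ln \left({x}_{i}\right)-\frac{1}{2{\alpha}^{2}}{\sum}_{i=1}^{n}{[\ln \left({x}_{i}\right)]}^{2}.$$
The MLE of $\alpha $, i.e., $\widehat{\alpha}$, is found by solving the following equation:
$$\frac{d\ell \left(\alpha \right)}{d\alpha}=-\frac{3n}{\alpha}+\frac{1}{{\alpha}^{3}}{\sum}_{i=1}^{n}{[\ln \left({x}_{i}\right)]}^{2}=0,$$
resulting in
$$\widehat{\alpha}=\sqrt{\frac{1}{3n}{\sum}_{i=1}^{n}{[\ln \left({x}_{i}\right)]}^{2}}.$$
Moreover, the second derivative of $\ell \left(\alpha \right)$ evaluated at $\alpha =\widehat{\alpha}$ is negative, which confirms that $\widehat{\alpha}$ is the MLE of $\alpha $.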
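It is known that, under certain regularity conditions,
$$\widehat{\alpha}\overset{a}{\sim}N\left(\alpha ,{I}^{-1}\left(\alpha \right)\right),$$
in which $I\left(\alpha \right)=-\mathbb{E}\left[\frac{{d}^{2}\ell \left(\alpha \right)}{d{\alpha}^{2}}\right]=\frac{6n}{{\alpha}^{2}}$.
A two-sided $100(1-\pi )\%$ confidence interval for $\alpha $ can be calculated by
$$\widehat{\alpha}\pm {z}_{1-\pi /2}\sqrt{\widehat{\mathrm{Var}}\left(\widehat{\alpha}\right)},\quad (6)$$
in which ${z}_{q}$ is the qth quantile of the standard normal distribution. The variance of $\widehat{\alpha}$ can be approximated by the inverse of the observed Fisher information, as
$$\widehat{\mathrm{Var}}\left(\widehat{\alpha}\right)\approx \frac{{\widehat{\alpha}}^{2}}{6n}.$$
Since $\alpha $ is a positive value and we cannot guarantee that the lower limit of the interval (6) is positive, we resort to the delta method to remedy this situation. For this, we define the function $g:(0,\infty )\to \mathbb{R}$ as $g\left(\alpha \right)=\ln \left(\alpha \right)$, and knowing that
$$\ln \left(\widehat{\alpha}\right)\overset{a}{\sim}N\left(\ln \left(\alpha \right),\frac{1}{6n}\right),$$
we can, then, obtain an approximate two-sided $100(1-\pi )\%$ confidence interval for $\alpha $ through
$$\left(\widehat{\alpha}\,{e}^{-{z}_{1-\pi /2}/\sqrt{6n}},\;\widehat{\alpha}\,{e}^{{z}_{1-\pi /2}/\sqrt{6n}}\right).\quad (7)$$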
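The point and interval estimation steps of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: the MLE is taken as $\widehat{\alpha}=\sqrt{T/(3n)}$ with $T={\sum}_{i}{[\ln ({x}_{i})]}^{2}$, the Wald interval uses the standard error $\widehat{\alpha}/\sqrt{6n}$ implied by $I(\alpha )=6n/{\alpha}^{2}$, and the delta-method interval is built on the scale of $\ln (\alpha )$ and exponentiated back. These expressions are our own derivation and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def au_mle(xs):
    """Closed-form maximum likelihood estimate of alpha, as assumed in the lead-in."""
    n = len(xs)
    t = sum(math.log(x) ** 2 for x in xs)
    return math.sqrt(t / (3.0 * n))


def au_confidence_intervals(xs, level=0.95):
    """Return (alpha_hat, wald_interval, delta_interval) at the given confidence level."""
    n = len(xs)
    alpha_hat = au_mle(xs)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Wald interval on the original scale: its lower limit may be negative.
    se = alpha_hat / math.sqrt(6.0 * n)
    wald = (alpha_hat - z * se, alpha_hat + z * se)
    # Delta method with g(alpha) = ln(alpha): the variance of ln(alpha_hat) is about 1/(6n).
    half_width = z / math.sqrt(6.0 * n)
    delta = (alpha_hat * math.exp(-half_width), alpha_hat * math.exp(half_width))
    return alpha_hat, wald, delta
```

As a usage note, with a single observation and a 99% level the Wald lower limit falls below zero, whereas the delta-method interval stays positive by construction.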
3.3. Simulation Study
In order to illustrate the presented inferences for the estimation of the AU distribution, the MLE and the UMVUE are compared (via simulation study) in this subsection. We considered the scenarios in which the parameter $\alpha \in \{0.1,0.3,0.5,0.7,1.1,1.5\}$, with sample sizes $n\in \{100,200,500\}$, through the Monte Carlo method with $N=1000$ repetitions. This entire procedure took into account the random number generator for the $\mathrm{AU}\left(\alpha \right)$ distribution shown in Algorithm 1. All analyses carried out in this study adopted the open-source R software [24].
For the performance comparison of the proposed estimators (MLE and UMVUE), since the true parameter value is known, the bias and mean squared error (MSE) metrics were adopted, and they are defined, respectively, as follows:
$$\mathrm{Bias}\left(\widehat{\alpha}\right)=\frac{1}{N}{\sum}_{i=1}^{N}\left({\widehat{\alpha}}_{i}-\alpha \right),\qquad \mathrm{MSE}\left(\widehat{\alpha}\right)=\frac{1}{N}{\sum}_{i=1}^{N}{\left({\widehat{\alpha}}_{i}-\alpha \right)}^{2},$$
in which ${\widehat{\alpha}}_{i}$ is the estimate for $\alpha $ in the ith iteration (point estimation). Additionally, based on the asymptotic results presented in this study, we also calculated the 95% confidence interval (CI) length by adopting the delta method from Equation (7) (interval estimation). That is, the average of all the upper limits of the 95% confidence intervals and the average of all the lower limits were computed, and then their difference was taken.
Table 1 presents the obtained average estimates (AvE) of the $\alpha $ parameter, for each sample size n, as well as the corresponding bias, MSE and 95% CI length (this last one only for MLE) results.
The asymptotic convergence of the MLE is noticeable as the sample size increases. In addition, the bias and MSE of both the MLE and the UMVUE are small and tend to decrease as n gets larger. Likewise, the CI length decreases as the sample size increases.
Finally, regarding the robustness of the estimators, the difference between the MLE and UMVUE estimates was taken, considering each different sample size n. Then, the interquartile range (IQR) was calculated per sample size group. That is, ${\mathrm{IQR}}^{\left({n}_{i}\right)}\left({\widehat{{\alpha}_{1}}}_{\mathrm{MLE}}^{\left({n}_{i}\right)}-{\widehat{{\alpha}_{1}}}_{\mathrm{UMVUE}}^{\left({n}_{i}\right)},\dots ,{\widehat{{\alpha}_{j}}}_{\mathrm{MLE}}^{\left({n}_{i}\right)}-{\widehat{{\alpha}_{j}}}_{\mathrm{UMVUE}}^{\left({n}_{i}\right)},\dots ,{\widehat{{\alpha}_{6}}}_{\mathrm{MLE}}^{\left({n}_{i}\right)}-{\widehat{{\alpha}_{6}}}_{\mathrm{UMVUE}}^{\left({n}_{i}\right)}\right)$, in which ${n}_{i}\in \{100,200,500\}$ and ${\alpha}_{j}\in \{{\alpha}_{1}=0.1,{\alpha}_{2}=0.3,\dots ,{\alpha}_{6}=1.5\}$. For instance, the IQR for $n=100$ was $0.00053$, whereas for $n=200$ and $n=500$, it went down to $0.00025$ and $0.00012$, respectively. This points out, in short, that as the sample size gets larger, the error range gets smaller, regardless of the value of the $\alpha $ parameter.
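The simulation design described in this subsection can be sketched as follows. This is a minimal illustration in Python (the original study used R), assuming the generator of Algorithm 1, the closed-form MLE $\sqrt{T/(3n)}$ and the UMVUE constant $\mathsf{\Gamma}(3n/2)/[\sqrt{2}\,\mathsf{\Gamma}((3n+1)/2)]$ derived by us; bias and MSE are taken as the usual Monte Carlo averages. Function names, the seed and the default arguments are hypothetical, and the output is not meant to reproduce Table 1.

```python
import math
import random


def rau(n, alpha, rng):
    """Algorithm 1 in compact form: x = exp(-alpha * sqrt(chi2_3)); the sign of v drops out."""
    return [math.exp(-alpha * math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))))
            for _ in range(n)]


def mle(xs):
    """Closed-form MLE of alpha (assumed form)."""
    return math.sqrt(sum(math.log(x) ** 2 for x in xs) / (3.0 * len(xs)))


def umvue(xs):
    """UMVUE of alpha (assumed form), with the gamma ratio computed on the log scale."""
    n = len(xs)
    log_const = math.lgamma(1.5 * n) - math.lgamma(1.5 * n + 0.5) - 0.5 * math.log(2.0)
    return math.exp(log_const) * math.sqrt(sum(math.log(x) ** 2 for x in xs))


def simulate(alpha, n, n_rep=1000, seed=2023):
    """Monte Carlo bias and MSE of both estimators for one (alpha, n) scenario."""
    rng = random.Random(seed)
    estimates = {"MLE": [], "UMVUE": []}
    for _ in range(n_rep):
        xs = rau(n, alpha, rng)
        estimates["MLE"].append(mle(xs))
        estimates["UMVUE"].append(umvue(xs))
    results = {}
    for name, values in estimates.items():
        bias = sum(v - alpha for v in values) / n_rep
        mse = sum((v - alpha) ** 2 for v in values) / n_rep
        results[name] = {"bias": bias, "mse": mse}
    return results
```

Both estimators are computed on the same simulated samples, so their difference reflects only the estimator and not the sampling noise.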
4. Real-World Exemplifications
In this section, two applications of the AU distribution to real-world issues are exemplified. The first case is related to the dynamics of the Chilean inflation in the post-military dictatorship period. The second case pertains to the relative humidity of the air in the northern Chilean city of Copiapó (Atacama region).
The Chilean inflation data are recorded annually and cover the period from 1992 to 2021, that is, the period after the military dictatorship of 1973–1990. The dynamics of the inflation data (in %) were analyzed after standardization by the min-max transformation, which results in a unit response variable (values between zero and one). The years 1990 and 1991 were excluded, since they are considered to be a period of transition. Thus, the total number of observations was 30 years (from 1992 to 2021).
On the other hand, the relative air humidity data cover the period from February 2015 to October 2022, with a one-hour recording format (104,415 observations). Then, this data set was transformed into daily maximum observations (6226 observations).
4.1. Chilean Inflation (Post-Military Era)
Figure 3 presents the dynamics of the Chilean inflation in the post-military dictatorship period, demonstrating stability between the years of 1999 and 2008. The right panel displays the time series of inflation, in which time is measured in years, from year 1 (1992) to year 30 (2021). The left panel depicts the accumulation of the values throughout the time series, in which a predominant trend is shown around 0.1 of the inflation rate.
Once the empirical dynamics of these data were analyzed, the most common unit distributions presented in the statistical literature were fitted. The upper panel of Figure 4 illustrates the histogram for the inflation data, which is compared with different fitted densities based on the MLE: AU, beta (BE), Kumaraswamy (KUM), logit-normal (LOGITNO), simplex (SIMPLEX), unit-half-normal (UHN), and unit-Lindley (ULINDLEY). The lower panel of the same figure presents the fitted CDFs superimposed on the empirical CDF (ECDF).
In order to quantify the performance of the fitted models, the Akaike Information Criterion (AIC) [25] and the Bayesian (or Schwarz) Information Criterion (BIC) [26] were analyzed. The obtained results (see Table 2) indicated the AU model as the best-fitted model for this data set. In addition, it is possible to make an inference about the average of the phenomenon, that is, the expectation of the AU($\widehat{\alpha}=1.2059$) model, resulting in $\mathbb{E}\left[{X}_{\mathrm{Inflation}}\right]=0.1948$. In other words, the average Chilean inflation in the post-military era, on the min-max standardized scale, is 19.49%.
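The information criteria used in this comparison can be computed for the AU model as sketched below. This is a minimal illustration assuming the log-density $\ln f(x\mid \alpha )=\frac{1}{2}\ln (2/\pi )+2\ln \left|\ln (x)\right|-3\ln (\alpha )-\ln (x)-{[\ln (x)]}^{2}/(2{\alpha}^{2})$ (our own derivation) and the usual definitions $\mathrm{AIC}=2p-2\ell $ and $\mathrm{BIC}=p\ln (n)-2\ell $ with $p=1$ parameter. The function names are hypothetical; the snippet does not use the inflation data and is not meant to reproduce Table 2.

```python
import math


def au_loglik(alpha, xs):
    """Log-likelihood of AU(alpha) for observations strictly inside (0, 1) (assumed form)."""
    n = len(xs)
    t = sum(math.log(x) ** 2 for x in xs)
    return (0.5 * n * math.log(2.0 / math.pi)
            - 3.0 * n * math.log(alpha)
            + sum(2.0 * math.log(-math.log(x)) - math.log(x) for x in xs)
            - t / (2.0 * alpha ** 2))


def au_fit(xs):
    """Fit AU(alpha) by maximum likelihood and report AIC and BIC for one parameter."""
    if not xs or any(not 0.0 < x < 1.0 for x in xs):
        raise ValueError("observations must lie strictly between 0 and 1")
    n = len(xs)
    alpha_hat = math.sqrt(sum(math.log(x) ** 2 for x in xs) / (3.0 * n))
    loglik = au_loglik(alpha_hat, xs)
    return {
        "alpha": alpha_hat,
        "loglik": loglik,
        "AIC": 2.0 * 1 - 2.0 * loglik,
        "BIC": 1 * math.log(n) - 2.0 * loglik,
    }
```

Lower AIC or BIC values indicate a better trade-off between fit and number of parameters, which is how the competing unit models are ranked.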
In the following subsection, the performance of the AU model is illustrated with a high-frequency data set of relative humidity from a city located in the Atacama Desert.
4.2. Water Monitoring in Air Humidity
The hydrological regime of the main rivers of Atacama is characterized by ice sources: water flows from the peaks following the melting of snowfall, glaciers, and permafrost located in the upper parts of the Andes range. In the context of climate change, it is, therefore, essential to understand the hydrological cycle of these regions, in order to set up a sustainable management policy for them. Understanding the hydrological cycle requires the implementation of tools for forecasting river flows, relative humidity, groundwater reservoirs, or any other water-related quantity, which inevitably demands in-depth knowledge of the physical phenomena that rule the entire hydrological cycle and, more precisely, the complex interaction between atmosphere, climate, landforms, ice, snow and river flows.
Additionally, a unique phenomenon called Camanchaca happens, which consists of a fog passing over the city of Copiapó, recurring only between midnight and around 10 a.m. Here, we describe the variation of the relative humidity of Copiapó and propose a methodology that is efficient and adaptable to these data. Using the daily maximum relative humidity, six different unit distributions were compared: AU, BE, KUM, LOGITNO, SIMPLEX, and UHN, as shown in Figure 5.
The comparison of the commonly used unit models shows (visually) the advantage of fitting the AU model over the others. Table 3 confirms the best fit of the AU model, based on the information criteria (AIC and BIC), and also reports the parameter estimate(s) of each model.
After obtaining the parameter estimate for $\alpha $, the AU model (best-fitted model) was used to construct a Statistical Process Control (SPC) chart [27], by calculating upper and lower tolerance bounds. Moreover, the Highest Density Interval (HDI) was adopted, considering a confidence degree of 99%, to monitor the daily maximum relative humidity records (as displayed in Figure 6).
The expected daily maximum relative humidity is 84.23% (based on the fitted AU model). The obtained control limits, considering a confidence (or tolerance) level of 99%, were: $\mathrm{LCL}=68.56\%$ and $\mathrm{UCL}=97.73\%$. Thus, the control chart based on the AU model (AU control chart) is another valuable alternative to some well-known SPC tools, which supports forecasting and opens new doors to discuss extreme events in the Atacama air humidity through probabilistic reasoning.
5. Conclusions
This study showed the competitiveness of the developed Theorem 1 (Equation (1)), which gives rise to a broad class of distributions that all belong to the exponential family. As an exemplification, we adopted the special case $k=1$, which is equivalent to the moment of order two of the standard normal distribution, and after some transformations, we developed the Alpha-Unit (AU) distribution. We focused on the unit range, given the importance of this stochastic representation.
Unit distributions are useful for values that oscillate between zero and one, such as fractions, proportions and rates, among others, or for a set of values in which there is a minimum or maximum limitation, resorting to standardization through the min-max transformation. Most distributions of this type come from transforming a random variable with a certain distribution so that it takes values between zero and one, as in the case of the unit-Lindley distribution [9], which comes from the Lindley distribution [28, 29].
There are numerous studies based on (e.g., unit) distributions, which extend a model and apply it to several areas [11, 14, 16]. In this study, we introduced and showed the competitiveness of the AU distribution, especially for data with a range greater than 0.4, or which present high asymmetry and low decay. Further studies shall investigate this hypothesis in a larger number of data sets (covering different data ranges). Additionally, an implementation of this model adopting hierarchical estimation and spatiotemporal dependence would be useful for forecasting and prediction problems.