Statistical Advancement of a Flexible Unitary Distribution and Its Applications

Hugo S. Salinas; Hassan S. Bakouch; Fatimah E. Almuhayfith; Wilson E. Caimanque; Leonardo Barrios-Blanco; Olayan Albalawi

doi:10.3390/axioms13060397

¹ Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1531772, Chile

² Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia

³ Department of Mathematics, Faculty of Science, Tanta University, Tanta 31111, Egypt

⁴ Department of Mathematics and Statistics, College of Science, King Faisal University, Alahsa 31982, Saudi Arabia

Axioms 2024, 13(6), 397; https://doi.org/10.3390/axioms13060397

This article belongs to the Special Issue Stochastic and Statistical Analysis in Natural Sciences

Abstract

A flexible distribution is introduced to handle random variables in the unit interval. This distribution is based on an exponential transformation of the two-parameter truncated positive normal distribution and can effectively fit data with varying degrees of skewness and kurtosis. Therefore, it presents an alternative for modeling this type of data. Several mathematical and statistical properties of this distribution are derived, such as moments, the hazard function, the Bonferroni curve, and entropy. Moreover, we investigate characterizations of the proposed distribution based on its hazard function. Parameter estimation is performed using both the maximum likelihood method and the method of moments. From these results, we determine the best critical region and the information matrix, which facilitates the calculation of asymptotic confidence intervals. A simulation study is presented to analyze the behavior of the obtained estimators for different sample sizes. To demonstrate the suitability of the proposed distribution, applications and goodness-of-fit tests are performed on two practical data sets.

Keywords:

entropy; information matrix; estimation; statistical modeling; proportion data; simulation; unit interval

MSC:

60E05; 62E15; 62F10

1. Introduction

Data fitting is an essential procedure in statistical analysis to ensure the precision and dependability of outcomes. Recently, there has been a growing interest in developing innovative models for data fitting, particularly within the unit interval. This statistical approach aims to address the challenges associated with manipulating data within a specific range and offers a more robust framework for analysis. In this manuscript, we present a groundbreaking model for data fitting within the unit interval, which exhibits promising potential for enhancing the quality of statistical inferences.

The proposed distribution is particularly useful for modeling data within the unit interval

[0, 1]

, making it relevant in various areas of life. For instance, in finance, this distribution can be used to model the probability of investment returns, especially in assets with variable risk or volume. In environmental science, it can be applied to model proportion data, such as species coverage in an ecosystem. Additionally, in public health, this distribution can be utilized to analyze data on the incidence or prevalence rates of diseases in a population.

The proposed model is based on the transformation of a random variable that follows a truncated positive normal (TPN) distribution [1]. Using the TPN distribution as the foundation of the suggested model offers several advantages. Firstly, this distribution allows for greater flexibility in data modeling and analysis than the conventional half-normal (HN) distribution. Secondly, it incorporates an additional shape parameter that further enhances its flexibility and enables a better fit. The HN model emerges as a special case of the TPN when the shape parameter equals zero, which makes the TPN a more adaptable model than the HN.

The objective of this study is to propose a flexible distribution for modeling real-world data with support in the interval (0, 1) using the exponential transformation

X = \exp \{- Z\}

, where Z follows a TPN distribution. Models to which this transformation is applied are called unitary distributions. Various alternative transformations can be employed to achieve a unitary distribution, such as

Z / (1 + Z)

,

{(1 + Z)}^{- 1}

,

\sin^{2} (Z)

, and

\cos^{2} (Z)

, where

Z > 0

. In the current study, the proposed transformation yields closed-form expressions and a simple structure for the X distribution. Furthermore, this transformation provides statistical properties related to the simple closed-form distribution. For instance, the moment-generating function remains closed, unlike with other transformations.

The unit interval is the support of well-established models, such as the beta and Kumaraswamy distributions. Additionally, the literature presents various unitary distribution models, including the Topp–Leone [2], unit-gamma [3], log-Lindley [4], two-parameter unit-logistic [5], two-parameter unit-Birnbaum–Saunders [6], unit-Weibull [7], unit-Gompertz [8], unit-inverse Gaussian [9], unit modified Burr-III [10], one-parameter unit-Lindley [11], alpha-unit [12], modified unit-half-normal [13], unit-half-normal [14], unit-exponential [15], and unit upper truncated Weibull [16] distributions.

The proposed model provides a somewhat better fit compared to these models. Another crucial feature of the model is its ability to adequately fit small data sets that exhibit extreme right skewness within the unit interval.

The proposed distribution captures the behavior of, and provides superior fits compared to, some established lifetime distributions, such as the unit-logistic, beta, and Kumaraswamy distributions. The rationale for introducing the unit-truncated positive normal distribution is based on (i) its ability to model constant, increasing, or inverted bathtub-shaped hazard rates, allowing it to capture different behaviors over time; (ii) its suitability for fitting skewed data that other common distributions may not fit adequately, and its applicability to a variety of problems in diverse fields, such as environmental studies, industrial reliability, and survival analysis; and (iii) its favorable comparison to three alternative lifetime distributions for failure and environmental data based on two practical data applications.

The following summary provides an overview of the remaining sections of the paper. The proposed distribution is presented in Section 2 along with a discussion of its basic characteristics. In Section 3, the expected Fisher information matrix and estimators of the unknown parameters by the maximum likelihood (ML) technique are presented. Section 4 conducts Monte Carlo simulations to assess the effectiveness of the ML estimators and the parameters’ asymptotic confidence intervals. Two sets of real-world data are analyzed and presented in Section 5. The paper is finally concluded in Section 6.

2. The Model and Its Properties

In this section, we outline the proposed bounded distribution based on transforming a truncated positive normal variable, as described in [1].

A random variable Z follows the TPN distribution, denoted as

Z \sim TPN (σ, α)

, if its cumulative distribution function (CDF) is given by

F_{Z} (z; α, σ) = c_{α} [Φ (\frac{z}{σ} - α) + Φ (α) - 1], 0 < z < \infty,

(1)

where

c_{α}^{- 1} = Φ (α)

,

α \in R

is a shape parameter,

σ > 0

is a scale parameter, and

Φ

represents the CDF of the standard normal (SN) distribution.

Let Z be a non-negative random variable following the TPN distribution. Its probability density function (PDF) is expressed as

f_{Z} (z; α, σ) = \frac{c_{α}}{σ} ϕ (\frac{z}{σ} - α), 0 < z < \infty,

(2)

where

ϕ

represents the PDF of the SN distribution.

The quantile function (the inverse of CDF given in (1)) of the variable Z is given by

Q_{Z} (p; α, σ) = σ [Φ^{- 1} (\frac{p}{c_{α}} - Φ (α) + 1) + α], 0 < p < 1,

where

Φ^{- 1}

is the SN distribution’s quantile function.
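The three TPN functions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names and the parameter values are hypothetical, the standard normal functions come from `statistics.NormalDist`, and the quantile is obtained by solving Equation (1) for z.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi, inv_cdf is Phi^{-1}

def tpn_cdf(z, alpha, sigma):
    """Equation (1) for z > 0, with c_alpha = 1 / Phi(alpha)."""
    c = 1.0 / STD.cdf(alpha)
    return c * (STD.cdf(z / sigma - alpha) + STD.cdf(alpha) - 1.0)

def tpn_pdf(z, alpha, sigma):
    """Equation (2) for z > 0."""
    c = 1.0 / STD.cdf(alpha)
    return c / sigma * STD.pdf(z / sigma - alpha)

def tpn_quantile(p, alpha, sigma):
    """Solve F_Z(z) = p, i.e. Phi(z/sigma - alpha) = p/c_alpha - Phi(alpha) + 1."""
    big_phi = STD.cdf(alpha)  # equals 1 / c_alpha
    return sigma * (STD.inv_cdf(p * big_phi - big_phi + 1.0) + alpha)

alpha, sigma = 0.7, 1.3  # illustrative values, not taken from the paper
probs = (0.1, 0.5, 0.9)
# Applying the CDF to the quantile should return the original probabilities.
roundtrip = [tpn_cdf(tpn_quantile(p, alpha, sigma), alpha, sigma) for p in probs]
print(roundtrip)
```

The round trip through the quantile and the CDF is a quick consistency check between Equation (1) and its inverse.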

Proposition 1.

Using the transformation

X = \exp \{- Z\}

, we derive a new distribution with support on the interval

(0, 1)

, known as the unit-TPN (UTPN) distribution, denoted as

X \sim UTPN (α, σ)

. Its PDF is defined as

f_{X} (x; α, σ) = \frac{c_{α}}{x σ} ϕ (\frac{\log (x)}{σ} + α), 0 < x < 1,

(3)

where

α \in R

and

σ > 0

are the parameters for shape and scale, respectively.

Proof.

By the change-of-variable approach and the symmetry of the SN density (

ϕ (x) = ϕ (- x)

), the random variable

X = \exp \{- Z\}

takes values within the interval

(0, 1)

and has the density function

\begin{matrix} f_{X} (x; α, σ) & = & f_{Z} (- \log (x); α, σ) |\frac{d}{d x} (- \log (x))| \\ & = & \frac{c_{α}}{σ} ϕ (- \frac{\log (x)}{σ} - α) |- \frac{1}{x}| \\ & = & \frac{c_{α}}{x σ} ϕ (\frac{\log (x)}{σ} + α) . \end{matrix}

□

The associated CDF and hazard rate (HR) functions for Equation (3) are provided, respectively, by

F_{X} (x; α, σ) = c_{α} Φ (\frac{\log (x)}{σ} + α),

(4)

h_{X} (x; α, σ) = \frac{c_{α}}{x σ} \frac{ϕ (\frac{\log (x)}{σ} + α)}{1 - c_{α} Φ (\frac{\log (x)}{σ} + α)} .

(5)
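As a rough numerical check of Equations (3) to (5), the following Python sketch codes the PDF, CDF, and HR function and verifies that the density integrates to one and agrees with the CDF. The function names, the midpoint-rule integrator, and the parameter values are assumptions made for illustration only.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def utpn_pdf(x, alpha, sigma):
    """Equation (3) for 0 < x < 1."""
    c = 1.0 / STD.cdf(alpha)
    return c / (x * sigma) * STD.pdf(math.log(x) / sigma + alpha)

def utpn_cdf(x, alpha, sigma):
    """Equation (4) for 0 < x < 1."""
    c = 1.0 / STD.cdf(alpha)
    return c * STD.cdf(math.log(x) / sigma + alpha)

def utpn_hazard(x, alpha, sigma):
    """Equation (5): density divided by the survival function."""
    return utpn_pdf(x, alpha, sigma) / (1.0 - utpn_cdf(x, alpha, sigma))

def midpoint_integral(f, a, b, n=20000):
    """Composite midpoint rule; it never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

alpha, sigma = 0.5, 0.8  # illustrative values
total_area = midpoint_integral(lambda x: utpn_pdf(x, alpha, sigma), 0.0, 1.0)
partial_area = midpoint_integral(lambda x: utpn_pdf(x, alpha, sigma), 0.0, 0.6)
print(total_area, partial_area, utpn_cdf(0.6, alpha, sigma))
```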

2.1. Characterizations of the UTPN Distribution Based on Its Hazard Function

Characterizing a PDF using the hazard function is essential for grasping temporal event dynamics. This linkage offers a valuable understanding of how event probabilities evolve over time and with pertinent factors. Particularly in survival analysis, it aids in predicting survival probabilities, while in reliability engineering, it assists in evaluating failure rates and directing maintenance strategies. In essence, this characterization proves to be a potent instrument for analyzing time-to-event data and facilitating informed decision-making across diverse domains. Various researchers, including Glänzel [17,18] and Hamedani [19], have delved into different techniques for such characterizations of continuous probability distributions.

According to Akhila et al. [20], the PDF and the hazard function are related in the following manner:

\frac{f^{'} (x)}{f (x)} = \frac{h^{'} (x)}{h (x)} - h (x),

(6)

where

f (x)

is the PDF and

h (x)

is the hazard function.

For the following result, we will redefine the two earlier functions of the UTPN distribution as follows: Let

f (x)

represent

f_{X} (x; α, σ)

and

h (x)

represent

h_{X} (x; α, σ)

.

Theorem 1.

Let

X : Ω \to (0, 1)

be a continuous random variable. Equation (3) provides the PDF of X if and only if the following differential equation is satisfied by its hazard function,

h (x)

:

x^{\frac{α}{σ} + 1} h^{'} (x) + (\frac{α}{σ} + 1) x^{\frac{α}{σ}} h (x) = \frac{c_{α}}{σ} \frac{d}{d x} (\frac{x^{\frac{α}{σ}} ϕ (\frac{\log (x)}{σ} + α)}{1 - c_{α} Φ (\frac{\log (x)}{σ} + α)}) .

(7)

Proof.

The PDF

f (x)

and the hazard function

h (x)

of X are given by Equations (3) and (5), respectively. Then, we have

\frac{f^{'} (x)}{f (x)} = - \frac{1}{x} - \frac{1}{x σ} (\frac{\log (x)}{σ} + α) .

Utilizing Equation (6), we can express

h^{'} (x) + \frac{\frac{α}{σ} + 1}{x} h (x) = \frac{c_{α}}{x^{2} σ^{2}} \frac{ϕ (\frac{\log (x)}{σ} + α) (c_{α} ϕ (\frac{\log (x)}{σ} + α) - \frac{\log (x)}{σ} (1 - c_{α} Φ (\frac{\log (x)}{σ} + α)))}{{(1 - c_{α} Φ (\frac{\log (x)}{σ} + α))}^{2}},

which implies

x^{\frac{α}{σ} + 1} h^{'} (x) + (\frac{α}{σ} + 1) x^{\frac{α}{σ}} h (x) = \frac{c_{α}}{σ^{2}} \frac{x^{\frac{α}{σ} - 1} ϕ (\frac{\log (x)}{σ} + α) (c_{α} ϕ (\frac{\log (x)}{σ} + α) - \frac{\log (x)}{σ} (1 - c_{α} Φ (\frac{\log (x)}{σ} + α)))}{{(1 - c_{α} Φ (\frac{\log (x)}{σ} + α))}^{2}} .

Now, given that Equation (7) holds,

\frac{d}{d x} (x^{\frac{α}{σ} + 1} h (x)) = \frac{c_{α}}{σ} \frac{d}{d x} (\frac{x^{\frac{α}{σ}} ϕ (\frac{\log (x)}{σ} + α)}{1 - c_{α} Φ (\frac{\log (x)}{σ} + α)}),

from which we derive

h (x) = \frac{c_{α}}{x σ} \frac{ϕ (\frac{\log (x)}{σ} + α)}{1 - c_{α} Φ (\frac{\log (x)}{σ} + α)} .

□

2.2. Shapes

The UTPN distribution’s PDF is unimodal and log-concave. Indeed, the second derivative of

\log (f_{X} (x; α, σ))

is

\frac{d^{2}}{d x^{2}} \log (f_{X} (x; α, σ)) = \frac{\log (x) - 1 + σ (α + σ)}{x^{2} σ^{2}} < 0 .

(8)

Figure 1 and Figure 2 present the different curves for the PDF and the HR function, respectively, of the UTPN distribution for varying values of

α

and

σ

. Figure 1 reveals that there are four possible behaviors for the UTPN distribution: increasing, unimodal, reversed J-shaped, and right-skewed. Figure 2 demonstrates that the HR function of the UTPN distribution can have an inverted bathtub shape, be increasing, or remain constant. An advantage of the UTPN distribution over the TPN distribution is that the latter is unable to describe situations with an inverted bathtub-shaped hazard function.

Figure 1. Graph of the UTPN densities for various values of

α

and

σ

.

Figure 2. Graphs of the HR function for the UTPN distribution with varying values of

α

and

σ

.

2.3. Quantile Function

The quantile function of the UTPN distribution is derived by inverting Equation (4) as follows:

Q_{p} = {(\exp \{Φ^{- 1} (\frac{p}{c_{α}}) - α\})}^{σ}, 0 < p < 1 .

(9)

Note that

Q_{0.5}

,

Q_{0.25}

, and

Q_{0.75}

are the median, first quartile, and third quartile of the UTPN distribution, respectively.
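Because Equation (9) is available in closed form, it can also be used for inverse-transform sampling. The sketch below is a minimal illustration of that idea; the helper names, the seed, and the parameter values are hypothetical, and the sampler is an assumption rather than the simulation scheme used in the paper.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def utpn_quantile(p, alpha, sigma):
    """Equation (9), using p / c_alpha = p * Phi(alpha)."""
    return math.exp(sigma * (STD.inv_cdf(p * STD.cdf(alpha)) - alpha))

def utpn_sample(n, alpha, sigma, rng):
    # 1 - rng.random() lies in (0, 1], which keeps inv_cdf away from p = 0.
    return [utpn_quantile(1.0 - rng.random(), alpha, sigma) for _ in range(n)]

alpha, sigma = 1.0, 0.5  # illustrative values
q1, median, q3 = (utpn_quantile(p, alpha, sigma) for p in (0.25, 0.5, 0.75))
draws = utpn_sample(10000, alpha, sigma, random.Random(1))
# Roughly half of the simulated values should fall below the theoretical median.
share_below_median = sum(x <= median for x in draws) / len(draws)
print(q1, median, q3, share_below_median)
```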

2.4. Mode

The mode of

f_{X} (x; α, σ)

is the root of the equation

\frac{d}{d x} \log (f_{X} (x; α, σ)) = - \frac{1}{x} - \frac{1}{x σ} (\frac{\log (x)}{σ} + α) = 0 .

(10)

Therefore, the solution is

x_{0} = \exp \{- σ^{2} (1 + \frac{α}{σ})\},

(11)

which means that

x_{0}

is the sole point where

f_{X} (x; α, σ)

reaches its maximum.
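Equation (11) can be compared with a brute-force search over a grid. The Python sketch below does this under illustrative assumptions: the grid resolution and parameter values are arbitrary, and only the part of the log-density that depends on x is evaluated.

```python
import math

def utpn_log_pdf_kernel(x, alpha, sigma):
    """log f_X(x) up to an additive constant (terms free of x are dropped)."""
    u = math.log(x) / sigma + alpha
    return -math.log(x) - 0.5 * u * u

def utpn_mode(alpha, sigma):
    """Equation (11); the value lies inside (0, 1) only when sigma * (sigma + alpha) > 0."""
    return math.exp(-sigma ** 2 * (1.0 + alpha / sigma))

alpha, sigma = 0.5, 0.8  # illustrative values
grid = [i / 10000 for i in range(1, 10000)]
grid_mode = max(grid, key=lambda x: utpn_log_pdf_kernel(x, alpha, sigma))
print(utpn_mode(alpha, sigma), grid_mode)
```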

2.5. Hazard Rate Function

Lemma 1.

Given that

f (x)

, for

x > 0

, is a twice-differentiable density function of a positive real-valued continuous random variable with an HR function

h (x)

, let

λ (x) = - (d / d x) \log (f (x))

. Consequently, (i) if

λ (x)

increases (decreases) as x increases, then

h (x)

increases (decreases) as x increases, and (ii) if

λ (x)

follows a bathtub (inverted bathtub) pattern, then

h (x)

will also follow a bathtub (inverted bathtub) pattern.

The proof of this result is provided by Glaser [21]. Based on this finding, the shape of the HR function of the UTPN distribution can be inferred as follows.

Proposition 2.

The HR function of the UTPN distribution exhibits an inverted bathtub shape.

Proof.

Given that

λ (x) = \frac{σ (α + σ) + \log (x)}{x σ^{2}},

(12)

it follows that

\frac{d}{d x} λ (x) = \frac{- σ (α + σ) - \log (x) + 1}{x^{2} σ^{2}} .

(13)

Consequently, setting

λ^{'} (x) = 0

shows that

λ (x)

attains its global maximum at

x_{*} = \exp \{1 - σ (α + σ)\}

, because

{\frac{d^{2}}{d x^{2}} λ (x)|}_{x = x_{*}} = \frac{- 1}{σ^{2} \exp \{3 - 3 σ (α + σ)\}} < 0 .

(14)

This demonstrates that

λ (x)

exhibits an inverted bathtub shape. Therefore, according to Glaser’s Lemma,

h (x)

also exhibits an inverted bathtub shape. Additionally, with

(d / d x) λ (x) > 0

for

x \in (0, 1)

,

σ > 0

, and

α < - σ

, it follows that

λ (x)

is an increasing function. □

2.6. Moments and Moment Generating Function

Here, we derive the expressions for the moments and moment-generating function of the distribution, which are crucial for any statistical analysis, particularly in applied research. The distribution’s moments, including its mean, variance, skewness, and kurtosis, provide insight into its most significant characteristics.

Proposition 3.

If the random variable X follows a UTPN distribution, its r-th moment about zero can be calculated as

E (X^{r}) = \frac{Φ (α - r σ)}{Φ (α)} \exp \{r σ (\frac{r σ}{2} - α)\} .

(15)

Proof.

Using the stochastic representation

X = \exp \{- Z\}

, where

Z \sim TPN (σ, α)

, one can write

E (X^{r}) = E (e^{- r Z}) = \int_{0}^{\infty} \frac{c_{α}}{σ} e^{- r z} ϕ (\frac{z}{σ} - α) d z

and defining

u = z / σ - α

implies

E (X^{r}) = \int_{- α}^{\infty} c_{α} e^{- r σ (u + α)} ϕ (u) d u = \frac{Φ (α - r σ)}{Φ (α)} \exp \{r σ (\frac{r σ}{2} - α)\} .

The last equality follows by completing the square in the exponent, which gives

\int_{- α}^{\infty} e^{- r σ u} ϕ (u) d u = e^{r^{2} σ^{2} / 2} Φ (α - r σ) .

□

Corollary 1.

(i) Let

X \sim UTPN (α, σ)

; then, the mean, variance, skewness (

γ_{X}

), and kurtosis (

κ_{X}

) coefficients are, respectively, given as

E (X) = ρ_{1} e^{σ (\frac{σ}{2} - α)}, \quad Var (X) = e^{σ (σ - 2 α)} (ρ_{2} e^{σ^{2}} - ρ_{1}^{2}),

γ_{X} = \frac{ρ_{3} e^{3 σ^{2}} - 3 ρ_{1} ρ_{2} e^{σ^{2}} + 2 ρ_{1}^{3}}{{(ρ_{2} e^{σ^{2}} - ρ_{1}^{2})}^{3 / 2}} \quad \text{and} \quad κ_{X} = \frac{ρ_{4} e^{6 σ^{2}} - 4 ρ_{1} ρ_{3} e^{3 σ^{2}} + 6 ρ_{1}^{2} ρ_{2} e^{σ^{2}} - 3 ρ_{1}^{4}}{{(ρ_{2} e^{σ^{2}} - ρ_{1}^{2})}^{2}},

(ii) Let

X_{1}, X_{2}, \dots

be independent random variables, each distributed as

X \sim UTPN (α, σ)

. Then, by the central limit theorem, if

S_{n} = X_{1} + X_{2} + \dots + X_{n}

, one has

\frac{S_{n} - n μ}{ω \sqrt{n}} \overset{D}{\to} N (0, 1) (n ↑ \infty),

where

μ = ρ_{1} \exp \{σ (σ / 2 - α)\}, ω = \sqrt{\exp \{σ (σ - 2 α)\} (ρ_{2} \exp \{σ^{2}\} - ρ_{1}^{2})},

ρ_{r} : = ρ_{r} (σ, α) = \frac{Φ (α - r σ)}{Φ (α)}, r = 1, 2, 3, 4 .
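The summary measures of Corollary 1(i) are easy to evaluate and to cross-check against numerical integration of x^r f_X(x) over (0, 1). The Python sketch below is only an illustration: the function names, the midpoint rule, and the parameter values are assumptions, not part of the paper.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def rho(r, alpha, sigma):
    return STD.cdf(alpha - r * sigma) / STD.cdf(alpha)

def utpn_summary(alpha, sigma):
    """Mean, variance, skewness and kurtosis as written in Corollary 1(i)."""
    r1, r2, r3, r4 = (rho(r, alpha, sigma) for r in (1, 2, 3, 4))
    e = math.exp(sigma ** 2)
    core = r2 * e - r1 ** 2
    mean = r1 * math.exp(sigma * (sigma / 2 - alpha))
    var = math.exp(sigma * (sigma - 2 * alpha)) * core
    skew = (r3 * e ** 3 - 3 * r1 * r2 * e + 2 * r1 ** 3) / core ** 1.5
    kurt = (r4 * e ** 6 - 4 * r1 * r3 * e ** 3 + 6 * r1 ** 2 * r2 * e - 3 * r1 ** 4) / core ** 2
    return mean, var, skew, kurt

def numeric_raw_moment(r, alpha, sigma, n=20000):
    """Midpoint rule for the integral of x^r f_X(x) over (0, 1)."""
    c = 1.0 / STD.cdf(alpha)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        # x^r * f_X(x) simplifies to x^(r-1) * (c / sigma) * phi(log(x)/sigma + alpha)
        total += x ** (r - 1) * c / sigma * STD.pdf(math.log(x) / sigma + alpha)
    return total * h

alpha, sigma = 0.5, 0.8  # illustrative values
mean, var, skew, kurt = utpn_summary(alpha, sigma)
m1, m2, m3 = (numeric_raw_moment(r, alpha, sigma) for r in (1, 2, 3))
skew_numeric = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / (m2 - m1 ** 2) ** 1.5
print(mean, m1, var, m2 - m1 ** 2, skew, skew_numeric)
```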

In Figure 3, the UTPN distribution’s mean, variance, skewness, and kurtosis are displayed as functions of

α

and

σ

. Observations indicate that as

α

increases, the mean decreases regardless of the value of

σ

. In terms of variance, the curves exhibit concavity and unimodality for all

α

values, with values decreasing as

σ

decreases. On the other hand, negative skewness is observed for

α < 0

, while positive skewness is observed for

α > 0

. In addition, smaller

σ

values are associated with reduced skewness, while larger

σ

values are associated with increased skewness. Finally, kurtosis decreases as

σ

decreases, which occurs for

α > 2

(approximately).

Figure 3. Graph of the UTPN distribution’s mean, variance, skewness, and kurtosis for various values of

α

and

σ

.

Proposition 4.

If the random variable X is UTPN distributed, then its moment-generating function is given by

M_{X} (t) = E (e^{t X}) = \sum_{k = 0}^{\infty} \frac{t^{k}}{k!} ρ_{k} (σ, α) \exp \{k σ (\frac{k σ}{2} - α)\} .

(16)

Proof.

The moment-generating function’s definition implies

M_{X} (t) = c_{α} \int_{0}^{1} \frac{e^{t x}}{x σ} ϕ (\frac{\log (x)}{σ} + α) d x,

and, using the exponential series

\exp \{a\} = \sum_{k = 0}^{\infty} a^{k} / k!

together with the change of variables

u = \log (x) / σ + α

, we obtain

\begin{matrix} M_{X} (t) & = & c_{α} \int_{- \infty}^{α} \sum_{k = 0}^{\infty} \frac{t^{k} e^{k (u - α) σ}}{k!} ϕ (u) d u \\ & = & c_{α} \sum_{k = 0}^{\infty} \frac{t^{k} e^{- k α σ}}{k!} \int_{- \infty}^{α} e^{k σ u} ϕ (u) d u \\ & = & \sum_{k = 0}^{\infty} \frac{t^{k} e^{k σ (\frac{k σ}{2} - α)}}{k!} \frac{Φ (α - k σ)}{Φ (α)} \\ & = & \sum_{k = 0}^{\infty} \frac{t^{k}}{k!} ρ_{k} (σ, α) \exp \{k σ (\frac{k σ}{2} - α)\} . \end{matrix}

□

2.7. Curves of Bonferroni and Lorenz

Bonferroni and Lorenz curves [22] are commonly used in economics to study income and poverty, although they are also helpful in other domains, including reliability, insurance, demography, and medicine. The definition of the Bonferroni curve is

B (p) = \frac{1}{p μ} \int_{0}^{q} x f_{X} (x) d x, 0 < p \leq 1,

(17)

where

μ = E (X)

and

q = F_{X}^{- 1} (p; α, σ)

. The Lorenz curve is obtained by the expression

L (p) = p B (p)

. Specifically, the Bonferroni curve for the UTPN distribution can be calculated as

B (p) = \frac{c_{α}}{p μ} Φ (\frac{\log (q)}{σ} + α - σ) \exp \{σ (\frac{σ}{2} - α)\} .
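The closed-form Bonferroni curve, and the Lorenz curve derived from it, can be evaluated directly. The following Python sketch is a minimal illustration with hypothetical function names and parameter values; q is computed from the quantile function in Equation (9) and the mean from Corollary 1.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def bonferroni(p, alpha, sigma):
    """Closed-form Bonferroni curve of the UTPN distribution for 0 < p < 1."""
    big_phi = STD.cdf(alpha)                                   # 1 / c_alpha
    q = math.exp(sigma * (STD.inv_cdf(p * big_phi) - alpha))  # q = F_X^{-1}(p)
    scale = math.exp(sigma * (sigma / 2 - alpha))
    mu = STD.cdf(alpha - sigma) / big_phi * scale              # E(X)
    partial = STD.cdf(math.log(q) / sigma + alpha - sigma) / big_phi * scale
    return partial / (p * mu)

def lorenz(p, alpha, sigma):
    """Lorenz curve L(p) = p * B(p)."""
    return p * bonferroni(p, alpha, sigma)

alpha, sigma = 0.5, 1.0  # illustrative values
b_low, b_high = bonferroni(0.3, alpha, sigma), bonferroni(0.7, alpha, sigma)
print(b_low, b_high, lorenz(0.7, alpha, sigma))
```

For these values the curve increases with p and stays below one, in line with the behavior described for Figure 4.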

Figure 4 illustrates the Bonferroni curve for the UTPN distribution with

σ = 1

, showing different values for

α

. It is clear that the Bonferroni value increases as

α

decreases. Additionally, the graph indicates that as the probability p increases, the Bonferroni value also increases.

Figure 4. Bonferroni curve for the UTPN distribution for

α = - 1.5, - 0.5, 0, 0.5, 1.5

and

σ = 1

.

2.8. Entropy

A measure of the uncertainty’s variation is the entropy of a random variable X with a certain PDF. Greater data uncertainty is indicated by a high entropy value. The Rényi entropy [23],

R_{λ} (X)

, for X is defined as

R_{λ} (X) = \frac{1}{1 - λ} \log \{\int_{R} f_{X}^{λ} (x) d x\},

(18)

where

λ > 0

and

λ \neq 1

. Suppose X has the UTPN distribution, then by substituting (3) in (18), we obtain

\begin{matrix} \int_{R} f_{X}^{λ} (x) d x & = & \int_{0}^{1} \frac{c_{α}^{λ}}{σ^{λ} x^{λ}} ϕ^{λ} (\frac{\log (x)}{σ} + α) d x \\ & = & \frac{c_{α}^{λ}}{σ^{λ - 1}} \int_{- \infty}^{α} e^{σ (u - α) (1 - λ)} ϕ^{λ} (u) d u \\ & = & c_{α}^{λ} σ^{1 - λ} λ^{- 1 / 2} {(2 π)}^{(1 - λ) / 2} Φ (\frac{α λ - (1 - λ) σ}{\sqrt{λ}}) \exp \{\frac{{(1 - λ)}^{2} σ^{2}}{2 λ} - α (1 - λ) σ\} . \end{matrix}

So one obtains the Rényi entropy as follows:

R_{λ} (X) = - α σ + \log (σ \sqrt{2 π}) + \frac{(1 - λ) σ^{2}}{2 λ} - \frac{\log (λ)}{2 (1 - λ)} + \frac{1}{1 - λ} \log Φ (\frac{α λ - (1 - λ) σ}{\sqrt{λ}}) - \frac{λ}{1 - λ} \log Φ (α) .

Shannon entropy [24] defined by

S_{λ} (X) = E \{- \log (f_{X} (X))\}

is the particular case of Equation (18) when

λ \to 1

. Then calculating the

{lim}_{λ \to 1} R_{λ} (X)

and using L’Hospital’s rule, after some algebraic work, one obtains the result that

S_{λ} (X) = \log (σ \sqrt{2 π} Φ (α) \exp \{\frac{1}{2} - α σ\}) - \frac{c_{α}}{2} (α + 2 σ) ϕ (α) .
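A numerical cross-check of the Rényi entropy is sketched below in Python. It compares the closed-form expression for R_λ(X) with a direct midpoint-rule evaluation of Equation (18), and it also evaluates the Shannon entropy numerically to illustrate that it is approached as λ tends to 1. All names, the integration rule, and the parameter values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def utpn_pdf(x, alpha, sigma):
    return STD.pdf(math.log(x) / sigma + alpha) / (x * sigma * STD.cdf(alpha))

def renyi_closed_form(lam, alpha, sigma):
    """Closed-form Renyi entropy of the UTPN distribution (lam > 0, lam != 1)."""
    arg = (alpha * lam - (1.0 - lam) * sigma) / math.sqrt(lam)
    return (-alpha * sigma
            + math.log(sigma * math.sqrt(2.0 * math.pi))
            + (1.0 - lam) * sigma ** 2 / (2.0 * lam)
            - math.log(lam) / (2.0 * (1.0 - lam))
            + math.log(STD.cdf(arg)) / (1.0 - lam)
            - lam * math.log(STD.cdf(alpha)) / (1.0 - lam))

def midpoint(f, n=20000):
    """Midpoint rule on (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def renyi_numeric(lam, alpha, sigma):
    """Equation (18) with the integral evaluated numerically."""
    return math.log(midpoint(lambda x: utpn_pdf(x, alpha, sigma) ** lam)) / (1.0 - lam)

def shannon_numeric(alpha, sigma):
    def integrand(x):
        f = utpn_pdf(x, alpha, sigma)
        return -f * math.log(f) if f > 0.0 else 0.0
    return midpoint(integrand)

alpha, sigma = 0.5, 0.8  # illustrative values
print(renyi_closed_form(2.0, alpha, sigma), renyi_numeric(2.0, alpha, sigma))
print(renyi_closed_form(1.001, alpha, sigma), shannon_numeric(alpha, sigma))
```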

3. Estimation and Inference

In this section, we use the maximum likelihood and moments approaches to estimate the distribution parameters with the related inference.

3.1. Moments Estimator

Assume that the collection of realizations

x_{1}, \dots, x_{n}

comes from a size-n random sample selected from the UTPN distribution with parameters

α \in R

and

σ > 0

. For the moments estimation, let

m_{1} = (1 / n) \sum_{i = 1}^{n} x_{i}

,

m_{2} = (1 / n) \sum_{i = 1}^{n} x_{i}^{2}

,

ρ_{1} = Φ (α - σ) / Φ (α)

and

ρ_{2} = Φ (α - 2 σ) / Φ (α)

. By equating the theoretical moments obtained by Corollary 1 with the sample moments given above, one obtains the following equations:

\begin{matrix} σ (\frac{σ}{2} - α) & = & \log (\frac{m_{1}}{ρ_{1}}) \\ 2 σ (σ - α) & = & \log (\frac{m_{2}}{ρ_{2}}) . \end{matrix}

The moment estimators for

α

and

σ

are calculated by simultaneously solving these equations with an appropriate numerical method.

3.2. Maximum Likelihood Estimator

Here, we derive the maximum likelihood estimators (MLEs) and the observed Fisher information matrix for complete samples of the UTPN distribution. Let

x = (x_{1}, \dots, x_{n})

denote the observed values obtained from a size-n random sample selected from the UTPN distribution with parameters

α \in R

and

σ > 0

. We assume that both

α

and

σ

are unknown and aim to estimate them based on

x

. In this context, we adopt the maximum likelihood approach. The likelihood function for

α

and

σ

, given

x

, is expressed as

L (α, σ) = \frac{c_{α}^{n}}{σ^{n}} \prod_{i = 1}^{n} x_{i}^{- 1} ϕ (\frac{\log (x_{i})}{σ} + α) .

Hence, the following is an expression for the log-likelihood function:

ℓ (α, σ) = n \log (c_{α}) - n \log (σ) + \sum_{i = 1}^{n} \log \{ϕ (\frac{\log (x_{i})}{σ} + α)\} - \sum_{i = 1}^{n} \log (x_{i}) .

The function

ℓ (α, σ)

is well defined for all values of the model parameters. It is continuous and concave with respect to the parameters, ensuring the existence of a unique global maximum. Additionally, the parameter space is bounded, which contributes to the uniqueness of the MLE. These properties guarantee that the MLE is unique and exists for the proposed distribution (Casella and Berger [25]). Therefore, in accordance with the conditions described above, the MLEs of

α

and

σ

are defined by

(\hat{α}, \hat{σ}) = \arg \max_{(α, σ) \in R \times R^{+}} ℓ (α, σ) .

That is,

\hat{α}

and

\hat{σ}

satisfy the score equations. The condition

{\partial ℓ (α, σ) / \partial α|}_{α = \hat{α}, σ = \hat{σ}} = 0

gives

{- n (α + c_{α} ϕ (α)) - \frac{1}{σ} \sum_{i = 1}^{n} \log (x_{i})|}_{α = \hat{α}, σ = \hat{σ}} = 0,

(19)

and the condition

{\partial ℓ (α, σ) / \partial σ|}_{α = \hat{α}, σ = \hat{σ}} = 0

gives

{- \frac{n}{σ} + \frac{α}{σ^{2}} \sum_{i = 1}^{n} \log (x_{i}) + \frac{1}{σ^{3}} \sum_{i = 1}^{n} \log^{2} (x_{i})|}_{α = \hat{α}, σ = \hat{σ}} = 0 .

(20)

We focus on the existence and uniqueness of each MLE of the distribution parameters in the following theorems, assuming that the other parameter is known; for additional details on this method, see Popović et al. [26] and Alomair et al. [27].

Proposition 5.

Given (19), there exists a unique solution to the equation

\partial ℓ (α, σ) / \partial α = 0

for

α \in R

.

Proof.

We have

\frac{\partial ℓ (α, σ)}{\partial α} = - n (\frac{ϕ (α)}{Φ (α)} + α) - \frac{1}{σ} \sum_{i = 1}^{n} \log (x_{i}) .

We know that

ϕ (α) \to 0

as

α \to \pm \infty

,

Φ (α) \to 1

as

α \to \infty

, and

Φ (α) \to 0

as

α \to - \infty

. Thus,

- n (ϕ (α) / Φ (α) + α) \to 0

as

α \to - \infty

and

- n (ϕ (α) / Φ (α) + α) \to - \infty

as

α \to \infty

. On the other hand, it follows that

{lim}_{α \to - \infty} \partial ℓ (α, σ) / \partial α = - (1 / σ) \sum_{i = 1}^{n} \log (x_{i}) > 0

(because

σ > 0

and

\log (x_{i}) < 0, \forall i = 1, \dots, n

) and

{lim}_{α \to \infty} \partial ℓ (α, σ) / \partial α = - \infty

. Therefore, there exists at least one root, say

\hat{α} \in (- \infty, \infty)

, such that

{\partial ℓ (α, σ) / \partial α|}_{α = \hat{α}, σ} = 0

. To prove uniqueness, we need to verify that

\partial^{2} ℓ (α, σ) / \partial α^{2} < 0

. Indeed,

- n < \partial^{2} ℓ (α, σ) / \partial α^{2} < 0

for all

n \geq 1

, because

lim_{α \to - \infty} \frac{\partial^{2} ℓ (α, σ)}{\partial α^{2}} = lim_{α \to - \infty} - n (1 - {(\frac{ϕ (α)}{Φ (α)})}^{2} - α \frac{ϕ (α)}{Φ (α)}) = 0

and

lim_{α \to \infty} \frac{\partial^{2} ℓ (α, σ)}{\partial α^{2}} = lim_{α \to \infty} - n (1 - {(\frac{ϕ (α)}{Φ (α)})}^{2} - α \frac{ϕ (α)}{Φ (α)}) = - n < 0 .

Therefore, there exists a solution to

\partial ℓ (α, σ) / \partial α = 0

and the root

\hat{α}

is unique. □

Proposition 6.

Given (20), there exists a solution for equation

\partial ℓ (α, σ) / \partial σ = 0

for

σ > 0

, and the solution is unique when

n σ^{2} < S (α, σ)

, where

S (α, σ) = 2 α σ \sum_{i = 1}^{n} \log (x_{i}) + 3 \sum_{i = 1}^{n} \log^{2} (x_{i}) .

Proof.

Solving

\partial ℓ (α, σ) / \partial σ = 0

is equivalent to solving

h (α, σ) = 0

, where

h (α, σ) = - n σ^{2} + α σ \sum_{i = 1}^{n} \log (x_{i}) + \sum_{i = 1}^{n} \log^{2} (x_{i}) .

For known

α

, and knowing that

\log^{2} (x_{i}) > 0

for all

i = 1, \dots, n

, we have the limit values of

h (α, σ)

as follows

lim_{σ \to 0} h (α, σ) = \sum_{i = 1}^{n} \log^{2} (x_{i}) > 0 and lim_{σ \to \infty} h (α, σ) = - \infty .

Thus we can ensure that there is at least one root, say

\hat{σ} \in (0, \infty)

, such that

h (α, \hat{σ}) = 0

. To prove uniqueness, we have to show that

\partial^{2} ℓ (α, σ) / \partial σ^{2} < 0

; that is,

\frac{n}{σ^{2}} - \frac{2 α}{σ^{3}} \sum_{i = 1}^{n} \log (x_{i}) - \frac{3}{σ^{4}} \sum_{i = 1}^{n} \log^{2} (x_{i}) < 0,

which implies

\begin{matrix} \frac{n}{σ^{2}} & < & \frac{2 α}{σ^{3}} \sum_{i = 1}^{n} \log (x_{i}) + \frac{3}{σ^{4}} \sum_{i = 1}^{n} \log^{2} (x_{i}), \\ n σ^{2} & < & 2 α σ \sum_{i = 1}^{n} \log (x_{i}) + 3 \sum_{i = 1}^{n} \log^{2} (x_{i}) . \end{matrix}

Hence, we find that

n σ^{2} < S (α, σ)

. Therefore, there exists a solution to

\partial ℓ (α, σ) / \partial σ = 0

, and the root

\hat{σ}

is unique when

n σ^{2} < S (α, σ)

. □

From (19), it follows immediately that the MLE of

σ

can be obtained as

\hat{σ} = - \frac{\bar{\log (x)}}{c_{\hat{α}} ϕ (\hat{α}) + \hat{α}} .

(21)

Using (20) and (21), the MLE of

α

satisfies the following equation

\frac{1}{c_{\hat{α}} ϕ (\hat{α}) + \hat{α}} (\frac{1}{c_{\hat{α}} ϕ (\hat{α}) + \hat{α}} + \hat{α}) = \frac{\bar{\log^{2} (x)}}{{(\bar{\log (x)})}^{2}} .

where

\bar{\log (x)} = (1 / n) \sum_{i = 1}^{n} \log (x_{i})

and

\bar{\log^{2} (x)} = (1 / n) \sum_{i = 1}^{n} \log^{2} (x_{i})

.
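The two-step recipe above (solve the one-dimensional equation for the shape estimate, then plug it into Equation (21)) can be sketched in Python as follows. This is an illustration under stated assumptions: the bisection bracket [-5, 10], the helper names, the seed, and the simulated data are all hypothetical choices, and the sampler relies on the quantile function in Equation (9).

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def kappa(a):
    """c_alpha * phi(alpha) + alpha, the denominator of Equation (21)."""
    big_phi = 0.5 * math.erfc(-a / math.sqrt(2.0))  # erfc stays accurate in the lower tail
    return STD.pdf(a) / big_phi + a

def utpn_mle(xs, lo=-5.0, hi=10.0):
    logs = [math.log(x) for x in xs]
    m1 = sum(logs) / len(logs)                 # mean of log(x_i)
    m2 = sum(t * t for t in logs) / len(logs)  # mean of log(x_i)^2
    target = m2 / (m1 * m1)

    def g(a):
        k = kappa(a)
        return (1.0 / k) * (1.0 / k + a) - target

    if g(lo) * g(hi) > 0.0:
        raise ValueError("no sign change on [lo, hi]; widen the bracket")
    for _ in range(100):                       # plain bisection on the alpha equation
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    alpha_hat = 0.5 * (lo + hi)
    sigma_hat = -m1 / kappa(alpha_hat)         # Equation (21)
    return alpha_hat, sigma_hat

def utpn_sample(n, alpha, sigma, rng):
    """Inverse-transform draws based on Equation (9)."""
    big_phi = STD.cdf(alpha)
    return [math.exp(sigma * (STD.inv_cdf((1.0 - rng.random()) * big_phi) - alpha))
            for _ in range(n)]

data = utpn_sample(20000, 1.0, 0.5, random.Random(2024))
alpha_hat, sigma_hat = utpn_mle(data)
print(alpha_hat, sigma_hat)
```

With a sample of this size, the estimates should land close to the values used to simulate the data; a general-purpose optimizer applied to the log-likelihood would be an equally valid route.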

The numerical values of

\hat{α}

and

\hat{σ}

can be determined numerically with any statistical software. Under standard regularity conditions, the MLEs are consistent and asymptotically normal.

The asymptotic normality of the MLEs is established in the following proposition.

Proposition 7.

Let

θ = (α, σ)

and suppose that the regularity conditions (Casella and Berger [25]) hold for

f_{X} (x; θ)

such that

\partial^{3} ℓ (θ) / \partial θ^{3}

exists and its absolute value is bounded by a function

K (x)

such that

E (K (X)) \leq k

.

Let

{\hat{θ}}_{n}

be a consistent sequence of roots of

S (θ) = \partial ℓ (θ) / \partial θ

, i.e.,

\hat{θ} \overset{P}{⟶} θ_{0}

, where

θ_{0}

is the true value of the parameter. Then,

\sqrt{n} (\hat{θ} - θ_{0}) \overset{D}{⟶} N (0, I^{- 1} (θ_{0})),

where

I (θ_{0}) = {E (- \frac{\partial^{2} ℓ (θ)}{\partial θ^{2}})|}_{θ = θ_{0}} .

Proof.

We perform a second-order Taylor expansion of the score function

S (θ)

around

θ_{0}

and evaluate it at

\hat{θ}

:

0 = S (\hat{θ}) = S (θ_{0}) + (\hat{θ} - θ_{0}) \frac{\partial S (θ_{0})}{\partial θ} + \frac{1}{2} {(\hat{θ} - θ_{0})}^{2} \frac{\partial^{2} S (θ^{*} (\hat{θ}, θ_{0}))}{\partial θ^{2}},

where

| θ^{*} (\hat{θ}, θ_{0}) - θ_{0} | \leq | \hat{θ} - θ_{0} |

. Dividing by

\sqrt{n}

we get

0 = \frac{1}{\sqrt{n}} S (θ_{0}) + \sqrt{n} (\hat{θ} - θ_{0}) [\frac{1}{n} \frac{\partial S (θ_{0})}{\partial θ} + \frac{1}{2 n} (\hat{θ} - θ_{0}) \frac{\partial^{2} S (θ^{*} (\hat{θ}, θ_{0}))}{\partial θ^{2}}] .

(22)

By the central limit theorem,

\frac{1}{\sqrt{n}} S (θ_{0}) = {\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \frac{\partial \log f_{X} (X_{i}; θ)}{\partial θ}|}_{θ = θ_{0}} \overset{D}{⟶} N (0, I (θ_{0})),

since

\partial \log f_{X} (X_{i}; θ) / \partial θ

are i.i.d. random variables with mean zero and variance

I (θ_{0}) < \infty

.

Also, by the weak law of large numbers,

- \frac{1}{n} \frac{\partial S (θ_{0})}{\partial θ} = {- \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial^{2} \log f_{X} (X_{i}; θ)}{\partial θ^{2}}|}_{θ = θ_{0}} \overset{P}{⟶} I (θ_{0}) .

Using the law of large numbers, we have

\frac{1}{n} |\frac{\partial^{2} S (θ^{*} (\hat{θ}, θ_{0}))}{\partial θ^{2}}| \leq \frac{1}{n} \sum_{i = 1}^{n} {|\frac{\partial^{3} \log f_{X} (X_{i}; θ)}{\partial θ^{3}}|}_{θ = θ^{*}} \leq \frac{1}{n} \sum_{i = 1}^{n} K (X_{i}) \overset{P}{⟶} E (K (X)) \leq k,

i.e.,

| \partial^{2} S (θ^{*} (\hat{θ}, θ_{0})) / \partial θ^{2} | / n

is bounded in probability by the constant k: for any

ε > 0

, the probability that it is less than

k + ε

approaches 1. Finally, since

\hat{θ} \overset{P}{⟶} θ_{0}

, we have

\frac{1}{2 n} (\hat{θ} - θ_{0}) \frac{\partial^{2} S (θ^{*} (\hat{θ}, θ_{0}))}{\partial θ^{2}} \overset{P}{⟶} 0 .

Employing Equation (22) and combining the previous results, we obtain

\sqrt{n} (\hat{θ} - θ_{0}) = \frac{\overbrace{\frac{1}{\sqrt{n}} S (θ_{0})}^{\overset{D}{⟶} N (0, I (θ_{0}))}}{\underbrace{- \frac{1}{n} \frac{\partial S (θ_{0})}{\partial θ}}_{\overset{P}{⟶} I (θ_{0})} - \underbrace{\frac{1}{2 n} (\hat{θ} - θ_{0}) \frac{\partial^{2} S (θ^{*} (\hat{θ}, θ_{0}))}{\partial θ^{2}}}_{\overset{P}{⟶} 0}},

and by Slutsky’s theorem, we conclude that

\sqrt{n} (\hat{θ} - θ_{0}) \overset{D}{⟶} N (0, I^{- 1} (θ_{0})) .

□

In Appendix A of this article, the entries of the information matrix are calculated, resulting in

I (θ_{0}) = n (\begin{matrix} 1 - c_{α}^{2} ϕ^{2} (α) - α c_{α} ϕ (α) & \frac{1}{σ} (α + c_{α} ϕ (α)) \\ \frac{1}{σ} (α + c_{α} ϕ (α)) & \frac{1}{σ^{2}} (2 + α^{2} + α c_{α} ϕ (α)) \end{matrix}) .

On the other hand, using the previous results, we have that approximately

\hat{θ} \sim N_{2} (θ, Σ^{- 1} (\hat{θ}))

, where

Σ^{- 1} (\hat{θ}) = \frac{n}{| I (\hat{θ}) |} (\begin{matrix} \frac{1}{{\hat{σ}}^{2}} (2 + {\hat{α}}^{2} + \hat{α} c_{\hat{α}} ϕ (\hat{α})) & - \frac{1}{\hat{σ}} (\hat{α} + c_{\hat{α}} ϕ (\hat{α})) \\ - \frac{1}{\hat{σ}} (\hat{α} + c_{\hat{α}} ϕ (\hat{α})) & 1 - c_{\hat{α}}^{2} ϕ^{2} (\hat{α}) - \hat{α} c_{\hat{α}} ϕ (\hat{α}) \end{matrix}),

(23)

is the inverse of the information matrix evaluated at the estimates and

| I (\hat{θ}) | = n^{2} J (\hat{θ}) / {\hat{σ}}^{2}

is the determinant of the matrix

I (\hat{θ})

, where

J (\hat{θ}) = 2 - \hat{α} c_{\hat{α}}^{3} ϕ^{3} (\hat{α}) - \hat{α} c_{\hat{α}} ϕ (\hat{α}) (3 + {\hat{α}}^{2}) - c_{\hat{α}}^{2} ϕ^{2} (\hat{α}) (3 + 2 {\hat{α}}^{2})

.

By using (23), we find that approximate

100 (1 - β) %

asymptotic confidence intervals for

α

and

σ

are

\hat{α} \pm \frac{z_{β / 2}}{\sqrt{n}} \sqrt{\frac{2 + {\hat{α}}^{2} + \hat{α} c_{\hat{α}} ϕ (\hat{α})}{J (\hat{θ})}} \quad \text{and} \quad \hat{σ} (1 \pm \frac{z_{β / 2}}{\sqrt{n}} \sqrt{\frac{1 - c_{\hat{α}}^{2} ϕ^{2} (\hat{α}) - \hat{α} c_{\hat{α}} ϕ (\hat{α})}{J (\hat{θ})}}),

where

z_{β / 2}

is the upper

β

-th percentile of the SN distribution.
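The interval formulas above can be sketched numerically as follows. This is a minimal Python sketch, not the authors' R code. The function name `utpn_asymptotic_ci` is hypothetical, $c_{α} = 1 / Φ (α)$ is assumed (as the expressions in Appendix A suggest), and the standard errors are taken as the square roots of the diagonal of the inverse of the n-sample information matrix, so they scale as $1 / \sqrt{n}$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def utpn_asymptotic_ci(alpha_hat, sigma_hat, n, level=0.95):
    """Asymptotic confidence intervals for (alpha, sigma) of the UTPN model.

    Assumption: c_alpha = 1 / Phi(alpha), so w = c_alpha * phi(alpha).
    """
    a = alpha_hat
    w = _STD_NORMAL.pdf(a) / _STD_NORMAL.cdf(a)

    # J(theta) as defined in the text (determinant of I, up to n^2 / sigma^2).
    j = 2 - a * w**3 - a * w * (3 + a**2) - w**2 * (3 + 2 * a**2)

    # Standard errors from the diagonal of the inverse information matrix.
    se_alpha = math.sqrt((2 + a**2 + a * w) / (n * j))
    se_sigma = sigma_hat * math.sqrt((1 - w**2 - a * w) / (n * j))

    # Upper beta/2 quantile, with beta = 1 - level.
    z = _STD_NORMAL.inv_cdf(1 - (1 - level) / 2)

    return {
        "alpha": (alpha_hat - z * se_alpha, alpha_hat + z * se_alpha),
        "sigma": (sigma_hat - z * se_sigma, sigma_hat + z * se_sigma),
        "se_alpha": se_alpha,
        "se_sigma": se_sigma,
    }


if __name__ == "__main__":
    print(utpn_asymptotic_ci(4.5, 0.4, 500))
```

For $\hat{α} = 4.5$, $\hat{σ} = 0.4$, and $n = 500$, the sketch gives standard errors close to the SE column of Table 1 for that setting (0.1496 and 0.0126).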

We will use the Kolmogorov–Smirnov (K-S) test to assess the goodness-of-fit for the proposed models. The K-S test statistic D is defined as the maximum absolute difference between the empirical cumulative distribution function (ECDF) of the sample and the CDF of the reference distribution: $D = \sup_{x} | F_{n} (x) - F (x) |$, where $F_{n} (x)$ is the ECDF of the sample and $F (x)$ is the theoretical CDF. Additionally, to compare the fit of various models, we will employ maximum likelihood estimation and well-known information criteria, specifically the Akaike information criterion (AIC), the corrected Akaike information criterion (denoted CAIC here), the Bayesian information criterion (BIC), and the Hannan–Quinn information criterion (HQIC). For the UTPN model, these criteria are defined as follows:

\begin{matrix} AIC & = - 2 ℓ (\hat{θ}) + 2 m, & CAIC & = - 2 ℓ (\hat{θ}) + \frac{2 m n}{n - m - 1}, \\ BIC & = - 2 ℓ (\hat{θ}) + m \log (n), & HQIC & = - 2 ℓ (\hat{θ}) + 2 m \log (\log (n)), \end{matrix}

where m is the number of parameters. The R programming language (see [28]) will be utilized for the simulation and practical aspects, as discussed in the next section.
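The four criteria defined above can be computed directly from the maximized log-likelihood. The following is a small Python sketch (the paper itself uses R); the function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(loglik, m, n):
    """Return AIC, CAIC, BIC and HQIC as defined in the text.

    loglik: maximized log-likelihood, m: number of parameters, n: sample size.
    """
    return {
        "AIC": -2 * loglik + 2 * m,
        "CAIC": -2 * loglik + 2 * m * n / (n - m - 1),
        "BIC": -2 * loglik + m * math.log(n),
        "HQIC": -2 * loglik + 2 * m * math.log(math.log(n)),
    }


if __name__ == "__main__":
    # UTPN fit to the rock dataset: log L = 57.94, m = 2, n = 48.
    for name, value in information_criteria(57.94, 2, 48).items():
        print(f"{name}: {value:.2f}")
```

With $\log (L) = 57.94$, $m = 2$, and $n = 48$, this agrees with the UTPN row of Table 4 up to rounding of the reported log-likelihood.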

4. Simulation Study

This section examines the performance of the MLEs and the asymptotic confidence intervals for the parameters indexing the UTPN distribution through Monte Carlo simulations. The sample size is set at $n = 25$, 35, 50, 100, 200, and 500, while the parameters are fixed at $α = -0.3$, 0.3, and 4.5, with $σ = 0.4$ and 2.3. For each combination, $M = 10{,}000$ pseudo-random samples are generated from the UTPN distribution using the inverse CDF method, meaning

x = {(\exp \{Φ^{- 1} (\frac{u}{c_{α}}) - α\})}^{σ},

(24)

where u is a uniform $(0, 1)$ observation.

To assess the performance of the MLEs and their asymptotic confidence intervals, the bias (Bias), standard error (SE), root mean squared error (RMSE), and coverage probability (CP) of the 95% confidence intervals are calculated. The results are reported in Table 1.
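As an illustration of how such summaries can be obtained from the $M$ replications, the following Python sketch computes them for one parameter. The function name `summarize_estimates` is hypothetical. SE is taken here as the empirical standard deviation of the estimates, which is an assumption, since the text does not spell out its definition. CP is the fraction of Wald intervals (estimate plus or minus z times its estimated standard error) that contain the true value.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_estimates(estimates, std_errors, true_value, level=0.95):
    """Monte Carlo summaries for one parameter.

    estimates:  one estimate per simulated sample
    std_errors: the matching asymptotic standard errors
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    avg = mean(estimates)
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    covered = sum(
        1
        for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return {
        "mean": avg,
        "bias": avg - true_value,
        "se": stdev(estimates),  # assumption: empirical SD of the estimates
        "rmse": math.sqrt(mean(squared_errors)),
        "cp": covered / len(estimates),
    }


if __name__ == "__main__":
    print(summarize_estimates([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 2.0))
```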

Table 1. Empirical mean, bias, SE, RMSE, and 95% CP for the ML estimates of $α$ and $σ$ in the UTPN distribution across different combinations of $α$ and $σ$ parameters.

Table 1 summarizes the performance of the ML estimates for the UTPN distribution under varying sample sizes (n) and true parameter values. For positive $α$ and $σ = 0.4$, the bias, SE, and RMSE of $\hat{α}$ and $\hat{σ}$ generally decrease as n grows, and the CP of the 95% confidence intervals approaches the nominal level. The same pattern holds for $σ = 2.3$: the bias, SE, and RMSE shrink with increasing n, and the CP is close to 95% for large samples. When $α$ is negative, the asymptotic convergence is slow for small sample sizes; for example, with $α = -0.3$, $σ = 0.4$, and $n = 25$, the CP is 0.8744 for $\hat{α}$ and 0.8112 for $\hat{σ}$, and it is close to the nominal level only at $n = 500$ (0.9500 and 0.9409). In summary, the simulation results support the use of ML estimation for the UTPN distribution: the parameter estimates become reliable as the sample size increases, regardless of the true value of the scale parameter.

5. Data Analysis

5.1. The Rock Dataset

The rock dataset, distributed with R [28], contains measurements on 48 rock samples from a petroleum reservoir. We analyze the variable shape, which quantifies the ratio between the perimeter (measured in pixels) and the square root of the area of the pore space (measured in pixels); its values are listed in Table 2, and Table 3 presents a descriptive summary. The data are right-skewed (skewness 1.1693) with kurtosis 4.1098, and the sample is small. The K-S test gives a maximum difference of 0.10462 between the ECDF of the data and the fitted UTPN CDF. With a p-value of 0.6696, the null hypothesis that the data come from the UTPN distribution is not rejected, indicating an adequate fit.
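The K-S statistic for this fit can be recomputed from the values in Table 2 and the ML estimates in Table 4. The Python sketch below is illustrative only: the names `utpn_cdf` and `ks_statistic` are hypothetical, and the CDF $F (x) = c_{α} Φ (α + \log (x) / σ)$ with $c_{α} = 1 / Φ (α)$ is an assumption obtained by inverting Equation (24).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Shape ratios of the rock dataset, copied from Table 2.
ROCK_SHAPE = [
    0.0903296, 0.1486220, 0.1833120, 0.1170630, 0.1224170, 0.1670450,
    0.1896510, 0.1641270, 0.2036540, 0.1623940, 0.1509440, 0.1481410,
    0.2285950, 0.2316230, 0.1725670, 0.1534810, 0.2043140, 0.2627270,
    0.2000710, 0.1448100, 0.1138520, 0.2910290, 0.2400770, 0.1618650,
    0.2808870, 0.1794550, 0.1918020, 0.1330830, 0.2252140, 0.3412730,
    0.3116460, 0.2760160, 0.1976530, 0.3266350, 0.1541920, 0.2760160,
    0.1769690, 0.4387120, 0.1635860, 0.2538320, 0.3286410, 0.2300810,
    0.4641250, 0.4204770, 0.2007440, 0.2626510, 0.1824530, 0.2004470,
]


def utpn_cdf(x, alpha, sigma):
    """Assumed UTPN CDF, obtained by inverting Equation (24)."""
    return _STD_NORMAL.cdf(alpha + math.log(x) / sigma) / _STD_NORMAL.cdf(alpha)


def ks_statistic(data, cdf):
    """D = sup_x |F_n(x) - F(x)|, checked on both sides of each ECDF jump."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    d_rock = ks_statistic(ROCK_SHAPE, lambda x: utpn_cdf(x, 4.485, 0.354))
    print(round(d_rock, 4))
```

Because the estimates in Table 4 are rounded to three decimals, the result should only be expected to agree with the reported value of 0.10462 approximately.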

Table 2. Shape ratios for rock dataset.

Table 3. Summary statistics for the rock dataset.

Now, we evaluate the UTPN model against a set of competing models, which are as follows.

(i): Unit-logistic distribution [5]: The unit-logistic distribution, with two parameters, is defined by the PDF

$f (x; μ, β) = \frac{β μ^{β} x^{β - 1} {(1 - μ)}^{β} {(1 - x)}^{β - 1}}{{[{(1 - μ)}^{β} x^{β} + μ^{β} {(1 - x)}^{β}]}^{2}}, 0 < x < 1,$

where $0 < μ < 1$ represents the median of X, and $β > 0$ is the shape parameter.
(ii): Kumaraswamy distribution [29]: The two-parameter Kumaraswamy distribution is defined by the PDF

$f (x; α, β) = α β x^{α - 1} {(1 - x^{α})}^{β - 1}, 0 < x < 1,$

where $α > 0$ and $β > 0$ .
(iii): Beta distribution [30]: The two-parameter beta distribution is characterized by the PDF

$f (x; α, β) = \frac{1}{B (α, β)} x^{α - 1} {(1 - x)}^{β - 1}, 0 < x < 1,$

where $α > 0$ and $β > 0$ .

Table 4 reports the parameter estimates with their standard errors, the log-likelihood values, the information criteria, and the K-S test statistic D for each evaluated model. The UTPN model exhibits the highest log-likelihood value (57.94) among the considered models and the lowest AIC, BIC, CAIC, and HQIC values, indicating a better fit to the rock dataset than the unit-logistic (ULO), beta (B), and Kumaraswamy (KAM) models. The K-S statistics of the UTPN (0.11) and ULO (0.10) models are comparable and smaller than those of the B (0.14) and KAM (0.15) models.

Table 4. Model parameter estimates, log-likelihood values, and goodness-of-fit measures for the rock dataset.

Figure 5 displays the histogram and the ECDF of the dataset, together with the densities and CDFs fitted by ML under the UTPN, ULO, beta, and Kumaraswamy models. These findings suggest that the UTPN model could be a valuable alternative for modeling similar phenomena on the interval (0, 1), especially in situations involving small sample sizes. Overall, the UTPN model emerges as the preferred choice for the rock dataset among the competing models.

Figure 5. Models and estimated CDFs for the rock dataset.

5.2. Computation Time of P3 Algorithms

Information on the computational time of P3 algorithms, as presented in Table 5, can be found in a study by Caramanis et al. [31]. The authors of [31] used them to fit a novel statistical model that encompasses univariate and bivariate approaches. Table 6 illustrates the right-skewed nature of the dataset. Notably, the sample size is small (n = 22), which presents a challenge for modeling. According to the K-S test, the maximum difference between the ECDF of the data and the fitted UTPN CDF is 0.1480. With a p-value of 0.7211, the null hypothesis that the data come from the UTPN distribution is not rejected, which indicates an adequate fit.

Table 5. Computational times of P3 algorithms.

Table 6. Summary statistics for the dataset on the computational duration of P3 algorithms.

Table 7 reports the performance of the evaluated models on the P3 algorithms dataset, including the K-S test statistic D for each model. The UTPN model attains the highest log-likelihood value (8.07) and the lowest AIC, BIC, CAIC, and HQIC values among the UTPN, ULO, B, and KAM models. The parameter estimates are $\hat{α} = 0.287$ and $\hat{σ} = 2.177$; their standard errors (0.848 and 0.806) are large, which reflects the uncertainty inherent in estimation from a sample of this size. With this caveat, the UTPN model emerges as a promising candidate for modeling data within the (0, 1) interval.

Table 7. Model parameter estimates, log-likelihood values, and goodness-of-fit measures for the P3 algorithms dataset.

Graphically, Figure 6 indicates that the UTPN distribution provides the most favorable fit. The figure shows a histogram overlaid with the fitted density functions, alongside the ECDF plotted with the estimated CDFs of the fitted distributions.

Figure 6. Models and estimated CDFs for P3 algorithms dataset.

6. Conclusions

In numerous applied scientific fields, various metrics, such as indicators, percentages, proportions, ratios, and rates, measured on the scale of (0, 1) serve as crucial study variables for characterizing diverse phenomena. However, the current statistical literature offers limited model options for handling these variables, with the beta and Kumaraswamy distributions being two of the main models. This study introduces a flexible two-parameter probability distribution with a bounded domain, derived using an exponential transformation of a truncated positive normal variable. This transformation yields statistical properties of the distribution in simple and closed form. We investigate several statistical properties of the proposed distribution and carry out maximum likelihood analyses on two practical datasets. The findings indicate that, for these datasets, the proposed distribution fits better than commonly used distributions such as the beta, Kumaraswamy, and unit-logistic distributions, which is particularly relevant when modeling small samples. On the other hand, the study’s findings suggest that the UTPN distribution may not be ideal for modeling lifetime data with a decreasing HR function or a bathtub-shaped HR, which includes burn-in and wear-out phases along with extended periods of low, constant hazard. Therefore, further research may aim to improve the distribution to overcome these limitations. Moreover, future research could explore the derivation of alternative models from the TPN distribution using different transformations, as well as the examination of the proposed model within the quantile regression framework.

Author Contributions

Conceptualization, H.S.S. and H.S.B.; methodology, H.S.S., H.S.B., W.E.C. and L.B.-B.; software, F.E.A., W.E.C., L.B.-B. and O.A.; validation, F.E.A., W.E.C. and L.B.-B.; formal analysis, H.S.S., W.E.C., L.B.-B. and F.E.A.; writing—original draft preparation, H.S.S., W.E.C. and L.B.-B.; writing—review and editing, H.S.S., H.S.B., W.E.C. and L.B.-B.; visualization, W.E.C., L.B.-B., F.E.A. and O.A.; supervision, H.S.S. and H.S.B.; funding acquisition, F.E.A. and O.A. All authors have read and agreed to the published version of the manuscript.

Funding

The paper was funded by King Faisal University, Saudi Arabia (Grant No. 6059).

Data Availability Statement

Within the paper, references to the data analyzed are listed.

Acknowledgments

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Grant No. 6059).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The following are the second derivatives of the log-likelihood function and the elements of the Fisher information matrix:

\begin{matrix} \frac{\partial^{2} l (α, σ)}{\partial α^{2}} & = & n (- 1 + \frac{ϕ^{2} (α)}{Φ^{2} (α)} + α \frac{ϕ (α)}{Φ (α)}), \\ \frac{\partial^{2} l (α, σ)}{\partial σ \partial α} & = & \frac{1}{σ^{2}} \sum_{i = 1}^{n} \log (x_{i}), \\ \frac{\partial^{2} l (α, σ)}{\partial σ^{2}} & = & \frac{n}{σ^{2}} - \frac{2 α}{σ^{3}} \sum_{i = 1}^{n} \log (x_{i}) - \frac{3}{σ^{4}} \sum_{i = 1}^{n} \log^{2} (x_{i}) \end{matrix}

and

\begin{matrix} E (- \frac{\partial^{2} l (α, σ)}{\partial α^{2}}) & = & n (1 - \frac{ϕ^{2} (α)}{Φ^{2} (α)} - α \frac{ϕ (α)}{Φ (α)}), \\ E (- \frac{\partial^{2} l (α, σ)}{\partial σ \partial α}) & = & \frac{n}{σ} (α + \frac{ϕ (α)}{Φ (α)}), \\ E (- \frac{\partial^{2} l (α, σ)}{\partial σ^{2}}) & = & \frac{n}{σ^{2}} (2 + α^{2} + α \frac{ϕ (α)}{Φ (α)}) . \end{matrix}
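The expected values above determine the information matrix, its determinant, and its inverse. As a numerical sanity check, the Python sketch below builds the matrix from these three entries and compares its determinant with $n^{2} J (θ) / σ^{2}$. All function names are hypothetical, and $c_{α} = 1 / Φ (α)$ is assumed, so that $c_{α} ϕ (α) = ϕ (α) / Φ (α)$.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def hazard_ratio(alpha):
    """w = phi(alpha) / Phi(alpha), i.e. c_alpha * phi(alpha) under the assumption."""
    return _STD_NORMAL.pdf(alpha) / _STD_NORMAL.cdf(alpha)


def utpn_fisher_information(alpha, sigma, n=1):
    """Information matrix for (alpha, sigma), built from the Appendix A entries."""
    w = hazard_ratio(alpha)
    i_aa = n * (1 - w**2 - alpha * w)
    i_as = n * (alpha + w) / sigma
    i_ss = n * (2 + alpha**2 + alpha * w) / sigma**2
    return [[i_aa, i_as], [i_as, i_ss]]


def j_function(alpha):
    """J(theta) as defined in Section 3; it depends on alpha only."""
    w = hazard_ratio(alpha)
    return 2 - alpha * w**3 - alpha * w * (3 + alpha**2) - w**2 * (3 + 2 * alpha**2)


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


if __name__ == "__main__":
    alpha, sigma, n = 0.3, 2.3, 50
    info = utpn_fisher_information(alpha, sigma, n)
    det = info[0][0] * info[1][1] - info[0][1] ** 2
    print(det, n**2 * j_function(alpha) / sigma**2)  # the two should agree
    print(invert_2x2(info))  # approximate covariance matrix of the MLE
```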

References

1. Gómez, H.J.; Olmos, N.M.; Varela, H.H.; Bolfarine, H. Inference for a truncated positive normal distribution. Appl. Math. J. Chin. Univ. Ser. B 2018, 33, 163–176.
2. Topp, C.W.; Leone, F.C. A family of J-shaped frequency functions. J. Am. Stat. Assoc. 1955, 50, 209–219.
3. Grassia, A. On a family of distributions with argument between 0 and 1 obtained by transformation of the gamma and derived compound distributions. Aust. J. Stat. 1977, 19, 108–114.
4. Gómez-Déniz, E.; Sordo, M.A.; Calderín-Ojeda, E. The log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insur. Math. Econ. 2013, 54, 49–57.
5. Menezes, A.F.B.; Mazucheli, J.; Dey, S. The unit-logistic distribution: Different methods of estimation. Braz. Oper. Res. Soc. 2018, 38, 555–578.
6. Mazucheli, J.; Menezes, A.F.B.; Dey, S. The unit-Birnbaum-Saunders distribution with applications. Chil. J. Stat. 2018, 9, 47–57.
7. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; de Oliveira, R.P.; Ghitany, M.E. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates. J. Appl. Stat. 2020, 47, 954–974.
8. Mazucheli, J.; Menezes, A.F.; Dey, S. Unit-Gompertz distribution with applications. Statistica 2019, 79, 25–43.
9. Ghitany, M.E.; Mazucheli, J.; Menezes, A.F.B.; Alqallaf, F. The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Commun. Stat. Theory Methods 2019, 48, 3423–3438.
10. Haq, M.A.; Hashmi, S.; Aidi, K.; Ramos, P.L.; Louzada, F. Unit modified Burr-III distribution: Estimation, characterizations and validation test. Ann. Data Sci. 2020, 10, 415–440.
11. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J. Appl. Stat. 2019, 46, 700–714.
12. Concha-Aracena, M.S.; Barrios-Blanco, L.; Elal-Olivero, D.; Ferreira da Silva, P.H.; Nascimento, D.C.D. Extending normality: A case of unit distribution generated from the moments of the standard normal distribution. Axioms 2022, 11, 666.
13. Alvarez, P.I.; Varela, H.; Cortés, I.E.; Venegas, O.; Gómez, H.W. Modified unit-half-normal distribution with applications. Mathematics 2023, 12, 136.
14. Bakouch, H.S.; Nik, A.S.; Asgharzadeh, A.; Salinas, H.S. A flexible probability model for proportion data: Unit-half-normal distribution. Commun. Stat. Case Stud. Data Anal. Appl. 2021, 7, 271–288.
15. Bakouch, H.S.; Hussain, T.; Tošić, M.; Stojanović, V.S.; Qarmalah, N. Unit exponential probability distribution: Characterization and applications in environmental and engineering data modeling. Mathematics 2023, 11, 4207.
16. Okorie, I.E.; Afuecheta, E.; Bakouch, H.S. Unit upper truncated Weibull distribution with extension to 0 and 1 inflated model – Theory and applications. Heliyon 2023, 9, e22260.
17. Glänzel, W. A characterization theorem based on truncated moments and its application to some distribution families. In Mathematical Statistics and Probability Theory; Bauer, P., Konecny, F., Wertz, W., Eds.; Springer: Dordrecht, The Netherlands, 1987; pp. 75–84.
18. Glänzel, W. A characterization of the normal distribution. Stud. Sci. Math. Hung. 1988, 2, 89–91.
19. Hamedani, G.G. Characterizations of Cauchy, normal, and uniform distributions. Stud. Sci. Math. Hung. 1993, 3, 243–248.
20. Akhila, P.; Girish Babu, M.; Bakouch, H.S. A versatile probabilistic model based on Yun-G family of distributions and its applications in engineering sector. J. Kerala Stat. Assoc. 2023, 34, 52–83.
21. Glaser, R.E. Bathtub and related failure rate characterization. J. Am. Stat. Assoc. 1980, 75, 667–672.
22. Bonferroni, C.E. Elementi di Statistica Generale; Seeber: Firenze, Italy, 1930.
23. Rényi, A. On measures of information and entropy. In 4th Berkeley Symposium on Mathematical Statistics and Probability; Neymann, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
24. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
25. Casella, G.; Berger, R.L. Statistical Inference; Duxbury Advanced Series Thomson Learning: Pacific Grove, CA, USA, 2002.
26. Popović, B.V.; Ristić, M.M.; Cordeiro, G.M. A two-parameter distribution obtained by compounding the generalized exponential and exponential distributions. Mediterr. J. Math. 2016, 13, 2935–2949.
27. Alomair, G.; Akdoğan, Y.; Bakouch, H.S.; Erbayram, T. On the maximum likelihood estimators’ uniqueness and existence for two unitary distributions: Analytically and graphically, with application. Symmetry 2024, 16, 610.
28. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: https://www.R-project.org/ (accessed on 10 January 2024).
29. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
30. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Chapter 25: Beta Distributions; Wiley: Hoboken, NJ, USA, 1995; Volume 2.
31. Caramanis, M.; Stremel, J.; Fleck, W.; Daniel, S. Probabilistic production costing: An investigation of alternative algorithms. Int. J. Electr. Power Energy Syst. 1983, 5, 75–86.

Figure 1. Graph of the UTPN densities for various values of $α$ and $σ$.

Figure 2. Graphs of the HR function for the UTPN distribution with varying values of $α$ and $σ$.

Figure 3. Graph of the UTPN distribution’s mean, variance, skewness, and kurtosis for various values of $α$ and $σ$.

Figure 4. Bonferroni curve for the UTPN distribution for $α = -1.5, -0.5, 0, 0.5, 1.5$ and $σ = 1$.

Figure 5. Models and estimated CDFs for the rock dataset.

Figure 6. Models and estimated CDFs for P3 algorithms dataset.

Table 1. Empirical mean, bias, SE, RMSE, and 95% CP for the ML estimates of $α$ and $σ$ in the UTPN distribution across different combinations of $α$ and $σ$ parameters.

True Value		n	Estimate	Mean	Bias	SE	RMSE	CP
$α = -0.3$	$σ = 0.4$	25	$\hat{α}$	−0.2569	0.0431	1.7876	1.4398	0.8744
			$\hat{σ}$	0.4123	0.0123	0.3403	0.2745	0.8112
		35	$\hat{α}$	−0.3351	−0.0351	1.5696	1.1978	0.8958
			$\hat{σ}$	0.4227	0.0227	0.3062	0.2296	0.8391
		50	$\hat{α}$	−0.4276	−0.1276	1.5773	1.4229	0.9094
			$\hat{σ}$	0.4387	0.0387	0.3231	0.3000	0.8707
		100	$\hat{α}$	−0.3324	−0.0324	0.6858	0.6696	0.9249
			$\hat{σ}$	0.4114	0.0114	0.1217	0.1200	0.8990
		200	$\hat{α}$	−0.3167	−0.0167	0.4280	0.4314	0.9432
			$\hat{σ}$	0.4057	0.0057	0.0724	0.0740	0.9238
		500	$\hat{α}$	−0.3042	−0.0042	0.2598	0.2621	0.9500
			$\hat{σ}$	0.4018	0.0018	0.0431	0.0436	0.9409
	$σ = 2.3$	25	$\hat{α}$	−0.2008	0.0992	1.5345	1.1107	0.8734
			$\hat{σ}$	2.3115	0.0115	1.6388	1.1557	0.8096
		35	$\hat{α}$	−0.3132	−0.0132	1.4698	1.1302	0.8954
			$\hat{σ}$	2.4045	0.1045	1.6236	1.2284	0.8386
		50	$\hat{α}$	−0.3693	−0.0693	1.2939	1.0967	0.9090
			$\hat{σ}$	2.4479	0.1479	1.4632	1.2558	0.8701
		100	$\hat{α}$	−0.3412	−0.0412	0.6750	0.6789	0.9251
			$\hat{σ}$	2.3764	0.0764	0.6855	0.7021	0.8993
		200	$\hat{α}$	−0.3168	−0.0168	0.4280	0.4315	0.9433
			$\hat{σ}$	2.3329	0.0329	0.4162	0.4255	0.9238
		500	$\hat{α}$	−0.3044	−0.0044	0.2598	0.2620	0.9502
			$\hat{σ}$	2.3105	0.0105	0.2481	0.2507	0.9411
$α = 0.3$	$σ = 0.4$	25	$\hat{α}$	0.3286	0.0286	1.0046	0.8959	0.9076
			$\hat{σ}$	0.4100	0.0100	0.1995	0.1759	0.8472
		35	$\hat{α}$	0.3050	0.0050	0.8187	0.7862	0.9202
			$\hat{σ}$	0.4109	0.0109	0.1606	0.1577	0.8717
		50	$\hat{α}$	0.2954	−0.0046	0.6740	0.6925	0.9317
			$\hat{σ}$	0.4094	0.0094	0.1320	0.1462	0.8929
		100	$\hat{α}$	0.3000	−0.0000	0.4139	0.4160	0.9428
			$\hat{σ}$	0.4038	0.0038	0.0743	0.0749	0.9170
		200	$\hat{α}$	0.3002	0.0002	0.2852	0.2851	0.9550
			$\hat{σ}$	0.4018	0.0018	0.0505	0.0506	0.9369
		500	$\hat{α}$	0.3011	0.0011	0.1780	0.1792	0.9500
			$\hat{σ}$	0.4005	0.0005	0.0312	0.0315	0.9476
	$σ = 2.3$	25	$\hat{α}$	0.3323	0.0323	0.9929	0.8855	0.9076
			$\hat{σ}$	2.3531	0.0531	1.1286	0.9970	0.8471
		35	$\hat{α}$	0.3056	0.0056	0.8164	0.7825	0.9202
			$\hat{σ}$	2.3617	0.0617	0.9203	0.9028	0.8717
		50	$\hat{α}$	0.2962	−0.0038	0.6909	0.6903	0.9317
			$\hat{σ}$	2.3533	0.0533	0.7898	0.8422	0.8929
		100	$\hat{α}$	0.2996	−0.0004	0.4142	0.4173	0.9427
			$\hat{σ}$	2.3221	0.0221	0.4278	0.4326	0.9171
		200	$\hat{α}$	0.3000	0.0000	0.2853	0.2851	0.9550
			$\hat{σ}$	2.3106	0.0106	0.2902	0.2912	0.9369
		500	$\hat{α}$	0.3012	0.0012	0.1780	0.1791	0.9504
			$\hat{σ}$	2.3028	0.0028	0.1796	0.1807	0.9475
$α = 4.5$	$σ = 0.4$	25	$\hat{α}$	4.7322	0.2322	0.7000	0.7925	0.9478
			$\hat{σ}$	0.3887	−0.0113	0.0551	0.0579	0.9118
		35	$\hat{α}$	4.6605	0.1605	0.5831	0.6322	0.9497
			$\hat{σ}$	0.3921	−0.0079	0.0469	0.0487	0.9237
		50	$\hat{α}$	4.6098	0.1098	0.4828	0.5055	0.9515
			$\hat{σ}$	0.3945	−0.0055	0.0395	0.0400	0.9317
		100	$\hat{α}$	4.5524	0.0524	0.3374	0.3462	0.9516
			$\hat{σ}$	0.3975	−0.0025	0.0281	0.0284	0.9434
		200	$\hat{α}$	4.5248	0.0248	0.2372	0.2410	0.9493
			$\hat{σ}$	0.3988	−0.0012	0.0200	0.0201	0.9432
		500	$\hat{α}$	4.5109	0.0109	0.1496	0.1516	0.9486
			$\hat{σ}$	0.3994	−0.0006	0.0126	0.0128	0.9469
	$σ = 2.3$	25	$\hat{α}$	4.7323	0.2323	0.7000	0.7926	0.9478
			$\hat{σ}$	2.2351	−0.0649	0.3166	0.3330	0.9118
		35	$\hat{α}$	4.6605	0.1605	0.5831	0.6322	0.9497
			$\hat{σ}$	2.2543	−0.0457	0.2698	0.2803	0.9237
		50	$\hat{α}$	4.6096	0.1096	0.4829	0.5056	0.9515
			$\hat{σ}$	2.2686	−0.0314	0.2271	0.2301	0.9318
		100	$\hat{α}$	4.5524	0.0524	0.3374	0.3462	0.9515
			$\hat{σ}$	2.2854	−0.0146	0.1617	0.1633	0.9434
		200	$\hat{α}$	4.5247	0.0247	0.2372	0.2410	0.9493
			$\hat{σ}$	2.2933	−0.0067	0.1147	0.1158	0.9432
		500	$\hat{α}$	4.5108	0.0108	0.1496	0.1517	0.9485
			$\hat{σ}$	2.2969	−0.0031	0.0727	0.0736	0.9470

Table 2. Shape ratios for rock dataset.

0.0903296	0.1486220	0.1833120	0.1170630	0.1224170	0.1670450	0.1896510
0.1641270	0.2036540	0.1623940	0.1509440	0.1481410	0.2285950	0.2316230
0.1725670	0.1534810	0.2043140	0.2627270	0.2000710	0.1448100	0.1138520
0.2910290	0.2400770	0.1618650	0.2808870	0.1794550	0.1918020	0.1330830
0.2252140	0.3412730	0.3116460	0.2760160	0.1976530	0.3266350	0.1541920
0.2760160	0.1769690	0.4387120	0.1635860	0.2538320	0.3286410	0.2300810
0.4641250	0.4204770	0.2007440	0.2626510	0.1824530	0.2004470

Table 3. Summary statistics for the rock dataset.

Size	Mean	Variance	Skewness	Kurtosis
48	0.2181	0.0835	1.1693	4.1098

Table 4. Model parameter estimates, log-likelihood values, and goodness-of-fit measures for the rock dataset.

Models	Estimation (SE)	$\log (L)$	AIC	BIC	CAIC	HQIC	K-S
UTPN	$\hat{α} = 4.485 (0.480)$	57.94	−111.88	−108.13	−111.61	−110.46	0.11
	$\hat{σ} = 0.354 (0.036)$
ULO	$\hat{μ} = 0.203 (0.011)$	56.95	−109.91	−106.16	−109.64	−108.49	0.10
	$\hat{β} = 3.828 (0.461)$
B	$\hat{α} = 5.942 (1.181)$	55.60	−107.20	−103.46	−106.93	−105.79	0.14
	$\hat{β} = 21.205 (4.347)$
KAM	$\hat{α} = 2.719 (0.293)$	52.49	−100.98	−97.24	−100.72	−99.57	0.15
	$\hat{β} = 44.660 (17.574)$

Table 5. Computational times of P3 algorithms.

0.853	0.759	0.874	0.800	0.716	0.557	0.503	0.399	0.334	0.207	0.118
0.097	0.078	0.067	0.056	0.044	0.036	0.026	0.019	0.014	0.010	0.118

Table 6. Summary statistics for the dataset on the computational duration of P3 algorithms.

Size	Mean	Variance	Skewness	Kurtosis
22	0.3039	0.3178	0.7114	1.8838

Table 7. Model parameter estimates, log-likelihood values, and goodness-of-fit measures for the P3 algorithms dataset.

Models	Estimation (SE)	$\log (L)$	AIC	BIC	CAIC	HQIC	K-S
UTPN	$\hat{α} = 0.287 (0.848)$	8.07	−12.15	−9.96	−11.51	−11.63	0.15
	$\hat{σ} = 2.177 (0.806)$
ULO	$\hat{μ} = 0.177 (0.068)$	7.71	−11.42	−9.24	−10.79	−10.91	0.14
	$\hat{β} = 0.817 (0.141)$
B	$\hat{α} = 0.554 (0.142)$	6.78	−9.56	−7.38	−8.93	−9.05	0.20
	$\hat{β} = 1.220 (0.376)$
KAM	$\hat{α} = 0.572 (0.148)$	6.84	−9.69	−7.51	−9.06	−9.17	0.20
	$\hat{β} = 1.231 (0.348)$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
