Article

A Unit Half-Logistic Geometric Distribution and Its Application in Insurance

by Ahmed T. Ramadan 1,†, Ahlam H. Tolba 2,*,† and Beih S. El-Desouky 2,†

1 Department of Basic Sciences, High Raya Institute, Damietta 34511, Egypt
2 Department of Mathematics, Faculty of Science, Mansoura University, Mansoura 33516, Egypt
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Axioms 2022, 11(12), 676; https://doi.org/10.3390/axioms11120676
Submission received: 8 August 2022 / Revised: 25 August 2022 / Accepted: 9 September 2022 / Published: 28 November 2022
(This article belongs to the Special Issue Computational Statistics & Data Analysis)

Abstract

A one-parameter distribution for modelling lifetime data, the half logistic-geometric (HLG) distribution, was recently proposed. In this paper, an appropriate transformation of the HLG distribution is used to derive a new distribution, the unit half logistic-geometric (UHLG) distribution, for modelling data bounded in the interval (0, 1). Several important statistical properties are investigated, and the quantile function is obtained in closed form. Several methods of estimating the distribution parameter are presented, and a simulation study compares them. An application to real data from the insurance field shows the flexibility of the new distribution in modelling such data compared with other distributions.

1. Introduction

Modelling data sets bounded in the interval (0, 1) has become very important in recent times; such data arise in many fields, for example when dealing with survival and failure rates of products, see [1,2]. Consequently, many unit distributions on (0, 1) have been introduced because of their flexibility in handling such probabilistic models, and fields such as medicine, actuarial science and finance have a strong need for them. For instance, Abd El-Monsef et al. [3] proposed a new two-parameter unit-omega distribution with a flexible probability density function (pdf) and hazard function. Altun and Cordeiro [4] studied the unit-improved second-degree Lindley distribution for modelling data in the interval (0, 1). Altun et al. [5] proposed a more flexible model, the log-Bilal distribution, as an alternative to the beta and unit-Lindley regression models. Bayes et al. [6] proposed a regression model relating one or more covariates to the conditional mean of a beta-distributed response variable. Cordeiro et al. [7] recently presented statistical methods for lifetime and survival models.
Both the beta [8] and Kumaraswamy [9] distributions are commonly used for modelling data in the unit interval (0, 1); as a result, beta and Kumaraswamy [10] regression models have been developed to study the behaviour of such variables in the presence of covariates. As an alternative to the beta regression model, Gómez-Déniz et al. [11] proposed the Log-Lindley distribution, with useful applications in econometric analysis and actuarial settings. In addition, Korkmaz and Chesneau [12] modified the Burr-XII distribution to obtain a new two-parameter distribution on the unit interval, the unit Burr-XII distribution, and showed that it had better modelling capabilities than competing models. Mazucheli et al. [13] proposed the two-parameter unit-Weibull distribution for modelling data on the unit interval (0, 1) and derived some useful statistics for it; more recently (in 2020), Mazucheli et al. [14] considered the unit-Weibull distribution as an alternative to the Kumaraswamy distribution for modelling quantiles and demonstrated its suitability for modelling quantiles in accounting, health and other social sciences. Moreover, [15] modelled the COVID-19 mortality rate with a new versatile modification of the log-logistic distribution, and [16] introduced an extended cosine generalized family of distributions for reliability modelling, with characteristics, applications and a simulation study. Mazucheli et al. [17] introduced the unit-Lindley distribution and investigated some of its important statistical properties. In addition, Mitnik and Baek [10] presented two median-dispersion re-parameterizations of the Kumaraswamy distribution to facilitate its use in regression models. Furthermore, Mousa et al. [18] presented a new regression model based on the unit gamma distribution as an alternative to the beta regression model, and [19,20] presented competing risks models and regression competing risks models with Weibull lifetime distributions. Tadikamalla [21] also proposed a flexible unit-gamma distribution aimed at modelling data in the unit interval (0, 1).
Recently (in 2020), Liu and Balakrishnan [22] proposed a new simple one-parameter half logistic-geometric (HLG) distribution useful for analyzing lifetime data with some interesting properties. The probability density function (pdf) and cumulative distribution function (cdf) of HLG distribution are defined, respectively, as
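$$
g(y) = \frac{2\theta e^{-y}}{\left[\theta + (2-\theta)e^{-y}\right]^{2}}, \quad y > 0,\ 0 < \theta < 1, \tag{1}
$$
$$
G(y) = \frac{\theta\left(1 - e^{-y}\right)}{\theta + (2-\theta)e^{-y}}, \quad y > 0,\ 0 < \theta < 1. \tag{2}
$$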
The aim of this paper is to derive a new flexible distribution for modelling data in the unit interval (0, 1). The transformation $X = e^{-Y}$ of an HLG random variable $Y$ is used to derive the new distribution, named the unit half logistic-geometric (UHLG) distribution, which has some attractive properties: (i) its statistical functions have closed-form expressions; (ii) its statistical properties can be derived in simple expressions; (iii) it is more flexible than other distributions in dealing with bounded unit-interval data, as shown later in Section 6; (iv) because its quantile function has a closed form, the UHLG distribution can be re-parameterized in terms of its quantiles, which leads to a new quantile regression model.
The remainder of this paper is organized as follows. In Section 2, the UHLG distribution is introduced together with some important statistical properties. Section 3 presents several methods for estimating the unknown parameter, and Section 4 compares these methods through a simulation study. Section 5 introduces a regression model as an alternative to some other regression models. Finally, Section 6 gives an application to real risk survey data.

2. Unit Half Logistic-Geometric Distribution

In this section, the UHLG distribution is derived and some important functions are illustrated.

2.1. Cumulative and Density Functions of UHLG Distribution

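Let $Y$ follow the HLG distribution and define $X = e^{-Y}$. Since $P(X \le x) = P(Y \ge -\log x) = 1 - G(-\log x)$, replacing $e^{-y}$ with $x$ in Equation (2) shows that $X$ follows the UHLG distribution with cdf
$$
F(x) = 1 - \frac{\theta(1-x)}{\theta + (2-\theta)x}, \quad 0 < x < 1,\ \theta > 0. \tag{3}
$$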
As a result, the quantile function $Q(u \mid \theta)$ of the UHLG distribution is
$$
Q(u \mid \theta) = \frac{u\theta}{2 - 2u + u\theta}, \quad 0 < u < 1,\ \theta > 0. \tag{4}
$$
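The closed form of Equation (4) follows from a short rearrangement of Equation (3), shown here as a worked step of our own, added for clarity. First, the cdf simplifies to
$$
F(x) = \frac{\theta + (2-\theta)x - \theta(1-x)}{\theta + (2-\theta)x} = \frac{2x}{\theta + (2-\theta)x},
$$
so setting $F(x) = u$ gives $u\theta + u(2-\theta)x = 2x$, that is,
$$
x\left(2 - 2u + u\theta\right) = u\theta \quad\Longrightarrow\quad Q(u \mid \theta) = \frac{u\theta}{2 - 2u + u\theta}.
$$
For example, the median is $Q(0.5 \mid \theta) = \theta/(\theta + 2)$, which equals 1/2 when $\theta = 2$.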
Taking the first derivative of the cdf given in Equation (3) with respect to x, the pdf is obtained as follows
$$
f(x) = \frac{2\theta}{\left[\theta + (2-\theta)x\right]^{2}}, \quad 0 < x < 1,\ \theta > 0. \tag{5}
$$
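As a minimal sketch in plain Python (standard library only), the cdf, pdf and quantile function can be coded directly from Equations (3)–(5), and the closed-form quantile gives an inverse-transform sampler for free. The function names are our own and are not part of the paper.

```python
from __future__ import annotations

import random


def uhlg_cdf(x: float, theta: float) -> float:
    """UHLG cdf, Equation (3): F(x) = 1 - theta(1 - x) / (theta + (2 - theta) x)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 1.0 - theta * (1.0 - x) / (theta + (2.0 - theta) * x)


def uhlg_pdf(x: float, theta: float) -> float:
    """UHLG pdf, Equation (5): f(x) = 2 theta / (theta + (2 - theta) x)^2 on (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    return 2.0 * theta / (theta + (2.0 - theta) * x) ** 2


def uhlg_quantile(u: float, theta: float) -> float:
    """UHLG quantile function, Equation (4): Q(u | theta) = u theta / (2 - 2u + u theta)."""
    return u * theta / (2.0 - 2.0 * u + u * theta)


def uhlg_sample(n: int, theta: float, rng: random.Random) -> list[float]:
    """Draw n variates by inverse-transform sampling: X = Q(U | theta), U ~ Uniform(0, 1)."""
    return [uhlg_quantile(rng.random(), theta) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2022)
    print([round(x, 4) for x in uhlg_sample(5, theta=0.5, rng=rng)])
    print(uhlg_quantile(0.5, 3.0))  # median theta / (theta + 2), i.e. 0.6 for theta = 3
```

Because $Q$ is available in closed form, no root finding is needed to simulate UHLG data; this is what the later sketches use to generate test samples.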
Theorem 1.
The pdf of the UHLG distribution is
(i) decreasing if $0 < \theta < 2$;
(ii) increasing if $\theta > 2$;
(iii) constant when $\theta = 2$.
Proof. 
The first derivative of the pdf with respect to $x$ is
$$
\frac{df}{dx} = \frac{4\theta(\theta-2)}{\left[\theta + (2-\theta)x\right]^{3}},
$$
and its denominator is positive for all $0 < x < 1$ because $\theta + (2-\theta)x = \theta(1-x) + 2x > 0$. Hence:
(i) When $0 < \theta < 2$, the first derivative is negative, so the pdf of the UHLG distribution is decreasing;
(ii) When $\theta > 2$, the first derivative is positive, so the pdf of the UHLG distribution is increasing;
(iii) Lastly, when $\theta = 2$, the pdf of the UHLG distribution is constant and equal to 1, i.e., $X$ is uniformly distributed on (0, 1). □
Figure 1 and Figure 2 show the cdf and pdf functions, respectively, of the UHLG distribution at different values of θ .

2.2. Survival and Hazard Functions of the UHLG Distribution

The survival and hazard rate functions of the UHLG distribution are, respectively,
$$
S(x) = 1 - F(x) = \frac{\theta(1-x)}{\theta + (2-\theta)x}, \quad 0 < x < 1,\ \theta > 0,
$$
$$
H(x) = \frac{f(x)}{S(x)} = \frac{2}{(1-x)\left[\theta + (2-\theta)x\right]}, \quad 0 < x < 1,\ \theta > 0.
$$
The next theorem shows the different shapes of the UHLG hazard function with respect to θ .
Theorem 2.
The hazard rate function of the UHLG distribution is
(i) bathtub-shaped (U-shaped) if $0 < \theta < 1$;
(ii) increasing if $\theta \ge 1$.
Proof. 
The first derivative of the hazard function with respect to $x$ is
$$
\frac{dH}{dx} = \frac{4\theta(1-x) - 4(1-2x)}{(x-1)^{2}\left[(2-\theta)x + \theta\right]^{2}}.
$$
Since the denominator is positive, $dH/dx$ has the same sign as $\psi(x) = 4\theta(1-x) - 4(1-2x) = 4\left[(2-\theta)x + \theta - 1\right]$, which is linear in $x$ with the single root $x_0 = (1-\theta)/(2-\theta)$. Then:
(i) When $0 < \theta < 1$, we have $0 < x_0 < 1/2$ and $\psi$ is increasing in $x$, so its sign changes from negative to positive at $x_0$. Hence $H(x)$ first decreases and then increases (bathtub shape), attaining its minimum at $x = x_0 = (1-\theta)/(2-\theta)$;
(ii) When $\theta \ge 1$, $\psi(0) = 4(\theta-1) \ge 0$ and $\psi(1) = 4 > 0$, so the linear function $\psi$ is positive for all $0 < x < 1$, and $H(x)$ is increasing.
This completes the proof. □
Figure 3 and Figure 4 show the survival and hazard rate functions, respectively, of the UHLG distribution at different values of θ.
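Theorem 2 can also be checked numerically. The sketch below (plain Python; the helper names are hypothetical) codes $H(x)$ and compares the minimiser $x_0 = (1-\theta)/(2-\theta)$ from the proof with a brute-force grid search. A grid check only illustrates the theorem; it does not replace the proof.

```python
def uhlg_hazard(x: float, theta: float) -> float:
    """Hazard rate H(x) = 2 / ((1 - x)(theta + (2 - theta) x)) for 0 < x < 1."""
    return 2.0 / ((1.0 - x) * (theta + (2.0 - theta) * x))


def hazard_minimiser(theta: float) -> float:
    """Location (1 - theta)/(2 - theta) of the bathtub minimum, Theorem 2(i)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("for theta >= 1 the hazard is increasing, so there is no interior minimum")
    return (1.0 - theta) / (2.0 - theta)


def grid_argmin(theta: float, m: int = 100_000) -> float:
    """Brute-force minimiser of H over the midpoints of m equal cells of (0, 1)."""
    grid = ((k + 0.5) / m for k in range(m))
    return min(grid, key=lambda x: uhlg_hazard(x, theta))


if __name__ == "__main__":
    for theta in (0.2, 0.5, 0.9):
        print(theta, hazard_minimiser(theta), grid_argmin(theta))
```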

2.3. Moments

Consider $X$ as a random variable following the UHLG distribution with pdf given in Equation (5). Then the $r$th moment about zero, $\mu'_r$, is
$$
\mu'_r = \int_0^1 x^r f(x)\,dx = \int_0^1 \frac{2\theta x^r}{\left[\theta + (2-\theta)x\right]^{2}}\,dx = \frac{2}{\theta}\int_0^1 x^r\left(1 + \frac{2-\theta}{\theta}x\right)^{-2}dx.
$$
Using Equation (3.194.1) in [23] with $u = 1$, $\mu = r+1$, $\beta = (2-\theta)/\theta$ and $\nu = 2$, the final form of $\mu'_r$ is
$$
\mu'_r = \frac{2}{\theta(r+1)}\,{}_2F_1\!\left(2, r+1; r+2; \frac{\theta-2}{\theta}\right), \tag{8}
$$
where ${}_2F_1(a,b;c;z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\sum_{s=0}^{\infty}\frac{\Gamma(a+s)\Gamma(b+s)}{\Gamma(c+s)\,s!}z^s$ is the hypergeometric function (the series converges for $|z| < 1$ and is understood by analytic continuation otherwise).
The first four moments about zero can be given by putting r = 1 , 2 , 3 , and 4 in Equation (8).
Coefficients of skewness and kurtosis can be derived from moments about zero as follows
$$
\text{skewness} = \frac{\mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3}{\left(\mu'_2 - (\mu'_1)^2\right)^{3/2}},
$$
$$
\text{kurtosis} = \frac{\mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4}{\left(\mu'_2 - (\mu'_1)^2\right)^{2}}.
$$
Figure 5 shows the mean and variance plots: the mean is always increasing in θ, whereas the variance increases for 0 < θ < 2 and decreases otherwise.
Moreover, Figure 6 shows the skewness and kurtosis plots: the skewness is positive for 0 < θ < 2 and negative for θ > 2, and the kurtosis is positive for 0 < θ < 0.1 and negative otherwise.
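Equation (8) evaluates ${}_2F_1$ at $z = (\theta-2)/\theta$, which lies below $-1$ when $\theta < 1$, where the defining series diverges. As a sketch in plain Python (helper names are ours), the code below uses Pfaff's transformation ${}_2F_1(a,b;c;z) = (1-z)^{-a}\,{}_2F_1(a, c-b; c; z/(z-1))$, which turns Equation (8) into $\mu'_r = \frac{\theta}{2(r+1)}\,{}_2F_1(2, 1; r+2; 1-\theta/2)$, a series that converges for $0 < \theta < 4$. This rearrangement is ours, not the paper's. The result is cross-checked against Simpson quadrature and then plugged into the skewness and kurtosis formulas above.

```python
from __future__ import annotations

import math


def hyp2f1_series(a: float, b: float, c: float, z: float, tol: float = 1e-15) -> float:
    """Gauss hypergeometric series 2F1(a, b; c; z); only valid for |z| < 1."""
    if abs(z) >= 1.0:
        raise ValueError("the defining series of 2F1 requires |z| < 1")
    term, total, s = 1.0, 1.0, 0
    while abs(term) > tol * abs(total):
        term *= (a + s) * (b + s) / ((c + s) * (s + 1)) * z
        total += term
        s += 1
    return total


def raw_moment(r: int, theta: float) -> float:
    """mu'_r from Equation (8) after Pfaff's transformation (valid for 0 < theta < 4)."""
    return theta / (2.0 * (r + 1)) * hyp2f1_series(2.0, 1.0, r + 2.0, 1.0 - theta / 2.0)


def raw_moment_quad(r: int, theta: float, m: int = 2000) -> float:
    """mu'_r by composite Simpson's rule on x^r f(x) over (0, 1), as a cross-check."""
    g = lambda x: x ** r * 2.0 * theta / (theta + (2.0 - theta) * x) ** 2
    h = 1.0 / m
    total = g(0.0) + g(1.0) + sum((4 if k % 2 else 2) * g(k * h) for k in range(1, m))
    return total * h / 3.0


def shape_measures(theta: float) -> tuple[float, float, float, float]:
    """Mean, variance, skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, theta) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0, 3.0):
        print(theta, shape_measures(theta))
```

At $\theta = 2$ the UHLG distribution is uniform on (0, 1), so the sketch should return mean 1/2, variance 1/12, skewness 0 and kurtosis 1.8 (the kurtosis as defined above, without subtracting 3).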

2.4. Incomplete Moments and Related Measures

In this subsection, the $r$th incomplete moment, $m_r(y)$, is introduced together with some related measures, such as the mean deviations about the mean and the median and the Bonferroni and Lorenz curves.
The $r$th incomplete moment of the UHLG distribution is
$$
m_r(y) = \int_0^y x^r f(x)\,dx = \int_0^y \frac{2\theta x^r}{\left[\theta + (2-\theta)x\right]^{2}}\,dx = \frac{2}{\theta}\int_0^y x^r\left(1 + \frac{2-\theta}{\theta}x\right)^{-2}dx.
$$
Using Equation (3.194.1) in [23] with $u = y$, $\mu = r+1$, $\beta = (2-\theta)/\theta$ and $\nu = 2$, the final form of $m_r(y)$ is
$$
m_r(y) = \frac{2y^{r+1}}{\theta(r+1)}\,{}_2F_1\!\left(2, r+1; r+2; \frac{(\theta-2)y}{\theta}\right),
$$
where $0 < y < 1$.
Some important statistical measures are defined through the moments and incomplete moments, such as the mean deviation about the mean, $D(\mu'_1)$, and about the median, $D(M)$. These measures can be expressed as
$$
D(\mu'_1) = 2\mu'_1 F(\mu'_1) - 2m_1(\mu'_1)
$$
and
$$
D(M) = \mu'_1 - 2m_1(M),
$$
where $M = Q(0.5 \mid \theta)$ is the median.
Another related measure is the mean residual life (MRL), the expected remaining lifetime of an item that has survived to a fixed time point $t$. In terms of the moments and incomplete moments, it is
$$
\mathrm{mrl}(t) = \frac{\mu'_1 - m_1(t)}{\bar F(t)} - t,
$$
where $\bar F(t) = 1 - F(t)$. Moreover, the mean inactivity time, which represents the waiting time elapsed since the failure of an item given that this failure occurred in $(0, t)$, is
$$
\mathrm{mit}(t) = t - \frac{m_1(t)}{F(t)}.
$$
Other important applications of the moments and incomplete moments are the Bonferroni and Lorenz curves of $X$, defined by
$$
B(\pi) = \frac{m_1(q)}{\pi\,\mu'_1}
$$
and
$$
L(\pi) = \frac{m_1(q)}{\mu'_1},
$$
respectively, where $q = Q(\pi \mid \theta)$ follows from Equation (4) for a given probability $\pi$. These curves have a wide variety of potential applications, for example in financial studies, medicine and insurance.
Other measures can also be defined through the moments and incomplete moments; for a complete list, see [7].
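The measures above only need $m_1$ together with the closed-form cdf and quantile, so they are straightforward to evaluate numerically. The sketch below computes $m_1$ by Simpson's rule rather than through the hypergeometric form (a simplification on our part); all function names are hypothetical.

```python
from __future__ import annotations


def pdf(x: float, theta: float) -> float:
    """UHLG pdf, Equation (5)."""
    return 2.0 * theta / (theta + (2.0 - theta) * x) ** 2


def cdf(x: float, theta: float) -> float:
    """UHLG cdf, Equation (3)."""
    return 1.0 - theta * (1.0 - x) / (theta + (2.0 - theta) * x)


def quantile(u: float, theta: float) -> float:
    """UHLG quantile function, Equation (4)."""
    return u * theta / (2.0 - 2.0 * u + u * theta)


def incomplete_moment(r: int, y: float, theta: float, m: int = 2000) -> float:
    """m_r(y): integral of x^r f(x) over (0, y) by composite Simpson's rule (m even)."""
    h = y / m
    g = lambda x: x ** r * pdf(x, theta)
    total = g(0.0) + g(y) + sum((4 if k % 2 else 2) * g(k * h) for k in range(1, m))
    return total * h / 3.0


def related_measures(theta: float, t: float, p: float) -> dict[str, float]:
    """Mean deviations, mrl(t), mit(t) and the Bonferroni/Lorenz curves at pi = p (0 < t < 1)."""
    mu = incomplete_moment(1, 1.0, theta)      # mu'_1 = m_1(1)
    median = quantile(0.5, theta)              # M = Q(0.5 | theta)
    q = quantile(p, theta)                     # q = Q(pi | theta)
    m1_t = incomplete_moment(1, t, theta)
    m1_q = incomplete_moment(1, q, theta)
    return {
        "D_mean": 2.0 * mu * cdf(mu, theta) - 2.0 * incomplete_moment(1, mu, theta),
        "D_median": mu - 2.0 * incomplete_moment(1, median, theta),
        "mrl": (mu - m1_t) / (1.0 - cdf(t, theta)) - t,
        "mit": t - m1_t / cdf(t, theta),
        "bonferroni": m1_q / (p * mu),
        "lorenz": m1_q / mu,
    }


if __name__ == "__main__":
    print(related_measures(0.7, t=0.3, p=0.5))
```

For $\theta = 2$ (the uniform case), $D(\mu'_1)$, $D(M)$, $\mathrm{mrl}(0.5)$, $\mathrm{mit}(0.5)$ and $L(0.5)$ all equal 1/4 and $B(0.5) = 1/2$, which gives a quick sanity check.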

2.5. Stress Strength Parameter

According to [24,25], the reliability $R$ of a component is the probability that its strength exceeds the stress applied to it. Let $X \sim \mathrm{UHLG}(\theta_1)$ represent the strength of the component and $Y \sim \mathrm{UHLG}(\theta_2)$ its stress, with $X$ and $Y$ independent. The component functions when $Y < X$, so its reliability is
$$
\begin{aligned}
R = P(Y < X) &= \int_0^1 f_X(x \mid \theta_1)\,F_Y(x \mid \theta_2)\,dx \\
&= \int_0^1 \frac{2}{\theta_1}\left(1 + \frac{2-\theta_1}{\theta_1}x\right)^{-2}\left[1 - (1-x)\left(1 + \frac{2-\theta_2}{\theta_2}x\right)^{-1}\right]dx \\
&= 1 - \frac{2}{\theta_1}\int_0^1 (1-x)\left(1 - \frac{\theta_1-2}{\theta_1}x\right)^{-2}\left(1 - \frac{\theta_2-2}{\theta_2}x\right)^{-1}dx.
\end{aligned}
$$
Using Equation (3.211) in [23] with $\lambda = 1$, $\mu = 2$, $u = (\theta_1-2)/\theta_1$, $\varrho = 2$, $\nu = (\theta_2-2)/\theta_2$ and $\sigma = 1$, and noting that $B(2, 1) = 1/2$, the integral gives
$$
R = 1 - \frac{1}{\theta_1}\,F_1\!\left(1, 2, 1, 3; \frac{\theta_1-2}{\theta_1}, \frac{\theta_2-2}{\theta_2}\right),
$$
where $F_1(a,b,c,d;x,y) = \frac{\Gamma(d)}{\Gamma(a)\Gamma(d-a)}\int_0^1 u^{a-1}(1-u)^{d-a-1}(1-ux)^{-b}(1-uy)^{-c}\,du$ is the Appell hypergeometric function (see [26]).
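As a numerical cross-check (a sketch, not the authors' code), $R$ can be computed directly by quadrature and compared with the Appell-function form $R = 1 - F_1\!\left(1, 2, 1, 3; \frac{\theta_1-2}{\theta_1}, \frac{\theta_2-2}{\theta_2}\right)/\theta_1$. Here $F_1$ is evaluated through the integral representation just given, using Simpson's rule; the function names are hypothetical.

```python
from __future__ import annotations

import math


def pdf(x: float, theta: float) -> float:
    return 2.0 * theta / (theta + (2.0 - theta) * x) ** 2


def cdf(x: float, theta: float) -> float:
    return 1.0 - theta * (1.0 - x) / (theta + (2.0 - theta) * x)


def simpson(g, a: float, b: float, m: int = 4000) -> float:
    """Composite Simpson's rule with an even number m of sub-intervals."""
    h = (b - a) / m
    total = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, m))
    return total * h / 3.0


def reliability_quadrature(theta1: float, theta2: float) -> float:
    """R = P(Y < X) = integral of f_X(x | theta1) F_Y(x | theta2) over (0, 1)."""
    return simpson(lambda x: pdf(x, theta1) * cdf(x, theta2), 0.0, 1.0)


def appell_f1(a: float, b: float, c: float, d: float, x: float, y: float) -> float:
    """Appell F1 from its Euler integral (needs d > a > 0 and x, y < 1); the integrand
    is smooth for the arguments used here, so Simpson's rule is adequate."""
    k = math.gamma(d) / (math.gamma(a) * math.gamma(d - a))
    g = lambda u: u ** (a - 1) * (1 - u) ** (d - a - 1) * (1 - u * x) ** (-b) * (1 - u * y) ** (-c)
    return k * simpson(g, 0.0, 1.0)


def reliability_closed_form(theta1: float, theta2: float) -> float:
    """R = 1 - F1(1, 2, 1, 3; (theta1 - 2)/theta1, (theta2 - 2)/theta2) / theta1."""
    return 1.0 - appell_f1(1, 2, 1, 3, (theta1 - 2) / theta1, (theta2 - 2) / theta2) / theta1


if __name__ == "__main__":
    print(reliability_quadrature(0.7, 2.5), reliability_closed_form(0.7, 2.5))
```

When $\theta_1 = \theta_2$ the two variables are identically distributed, so $R = 1/2$; when $\theta_1 > \theta_2$ the strength tends to be larger and $R > 1/2$.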

2.6. Stochastic Ordering

Let $X_1$ and $X_2$ be two independent continuous random variables with pdfs $f_1$ and $f_2$. We say that $X_2$ is smaller than $X_1$ in the likelihood ratio order, written $X_2 \le_{lr} X_1$, if $f_1(x)/f_2(x)$ is a non-decreasing function of $x$. For more details, see [22,27,28].
Proposition 1.
Let $X_1$ and $X_2$ be two independent random variables such that $X_1 \sim \mathrm{UHLG}(\theta_1)$ and $X_2 \sim \mathrm{UHLG}(\theta_2)$. If $\theta_1 \ge \theta_2$, then $X_2 \le_{lr} X_1$.
Proof. 
The first derivative of $f_1(x)/f_2(x)$ is
$$
\frac{d}{dx}\,\frac{f_1(x)}{f_2(x)} = \frac{4\theta_1(\theta_1-\theta_2)\left[\theta_2 + (2-\theta_2)x\right]}{\theta_2\left[\theta_1 + (2-\theta_1)x\right]^{3}}.
$$
Since both bracketed terms are positive on $(0, 1)$, this derivative is non-negative whenever $\theta_1 \ge \theta_2$. Consequently, $f_1(x)/f_2(x)$ is a non-decreasing function of $x$, which completes the proof. □
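Proposition 1 can be illustrated numerically: the sketch below (plain Python, hypothetical helper names) evaluates $f_1/f_2$ on a grid of (0, 1) and checks that it is non-decreasing when $\theta_1 \ge \theta_2$ and not otherwise. Such a grid check only illustrates the result; the proof above is what establishes it.

```python
def uhlg_pdf(x: float, theta: float) -> float:
    """UHLG pdf, Equation (5)."""
    return 2.0 * theta / (theta + (2.0 - theta) * x) ** 2


def lr_ordered(theta1: float, theta2: float, m: int = 999) -> bool:
    """True if f1(x)/f2(x) is non-decreasing on an interior grid of (0, 1),
    i.e. the grid is consistent with X2 <=_lr X1."""
    grid = [(k + 1) / (m + 1) for k in range(m)]
    ratios = [uhlg_pdf(x, theta1) / uhlg_pdf(x, theta2) for x in grid]
    return all(a <= b for a, b in zip(ratios, ratios[1:]))


if __name__ == "__main__":
    print(lr_ordered(2.5, 0.8), lr_ordered(0.8, 2.5))
```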

3. Parameter Estimation

In this section, six methods are used to estimate θ, the parameter of the UHLG distribution: the maximum likelihood estimation method (MLE), the Bayesian estimation method (BE), the Cramér–von Mises method (CVME), the least squares method (LSE), the method of moments (MME) and the weighted least squares method (WLSE).

3.1. Maximum Likelihood Estimation Method (MLE)

Given a random sample $(x_1, x_2, \ldots, x_n)$ from the UHLG distribution, the likelihood function $L$ is
$$
L = \prod_{i=1}^{n} f(x_i; \theta) = (2\theta)^n \prod_{i=1}^{n}\left[\theta + (2-\theta)x_i\right]^{-2},
$$
the log-likelihood function is
$$
l = \log L = n\log(2\theta) - 2\sum_{i=1}^{n}\log\left[\theta + (2-\theta)x_i\right],
$$
and its first derivative with respect to θ is
$$
\frac{dl}{d\theta} = \frac{n}{\theta} - 2\sum_{i=1}^{n}\frac{1 - x_i}{\theta + (2-\theta)x_i}.
$$
The MLE $\hat\theta$ is the root of $dl/d\theta = 0$.
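Multiplying the score by θ gives $\theta\,dl/d\theta = \sum_{i}\left[1 - 2\theta(1-x_i)/(\theta + (2-\theta)x_i)\right]$. Each summand decreases strictly in θ, from 1 as $\theta \to 0$ to $-1$ as $\theta \to \infty$ (our own observation, not stated in the paper), so the likelihood equation has exactly one root and plain bisection on $\log\theta$ finds it. The sketch below (standard library only, hypothetical names) applies this to a simulated sample.

```python
from __future__ import annotations

import math
import random


def theta_score(theta: float, xs: list[float]) -> float:
    """theta * dl/dtheta = sum_i [1 - 2 theta (1 - x_i) / (theta + (2 - theta) x_i)]."""
    return sum(1.0 - 2.0 * theta * (1.0 - x) / (theta + (2.0 - theta) * x) for x in xs)


def uhlg_mle(xs: list[float], lo: float = 1e-8, hi: float = 1e8, iters: int = 80) -> float:
    """MLE of theta: the unique root of the score equation, found by bisection on log(theta)."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if theta_score(math.exp(mid), xs) > 0.0:
            a = mid  # score still positive, so the root lies to the right
        else:
            b = mid
    return math.exp(0.5 * (a + b))


# Simulated sample of size 5000 from UHLG(1.5), generated with the quantile function (4).
rng = random.Random(11)
theta_true = 1.5
data = [u * theta_true / (2.0 - 2.0 * u + u * theta_true) for u in (rng.random() for _ in range(5000))]
theta_hat = uhlg_mle(data)
print(f"MLE of theta: {theta_hat:.4f}")
```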

3.2. Bayesian Estimation Method

The Bayesian estimation (BE) method fits a probability model to a set of data and summarizes the results by a probability distribution on the model parameters: the prior distribution is combined with the likelihood function $L$ to give the posterior distribution.
Suppose that the non-informative prior $u(\theta) = 1/\theta$, $\theta > 0$, is used.
Then the posterior density of θ is
$$
g(\theta) = \frac{u(\theta)L}{\int_0^\infty u(\theta)L\,d\theta} = \frac{\theta^{-1}(2\theta)^n\prod_{i=1}^{n}\left[\theta + (2-\theta)x_i\right]^{-2}}{\int_0^\infty \theta^{-1}(2\theta)^n\prod_{i=1}^{n}\left[\theta + (2-\theta)x_i\right]^{-2}d\theta}. \tag{14}
$$
Under the squared error loss function, the Bayes estimate $\hat\theta$ is the posterior mean of θ, with the posterior density given in Equation (14):
$$
\hat\theta = \int_0^\infty \theta\,g(\theta)\,d\theta = \frac{\int_0^\infty (2\theta)^n\prod_{i=1}^{n}\left[\theta + (2-\theta)x_i\right]^{-2}d\theta}{\int_0^\infty \theta^{-1}(2\theta)^n\prod_{i=1}^{n}\left[\theta + (2-\theta)x_i\right]^{-2}d\theta}.
$$
Since these integrals cannot be obtained analytically, the estimate is approximated with the Markov chain Monte Carlo (MCMC) method. The Metropolis–Hastings algorithm [29], a widely used MCMC technique, is employed for this purpose.
The posterior density of θ can be written as
$$
\pi(\theta) \propto \theta^{n-1}\prod_{i=1}^{n}\left[\theta + (2-\theta)x_i\right]^{-2}.
$$
The following algorithm uses Metropolis–Hastings steps with a normal proposal distribution to update θ and then obtains the Bayesian estimate of θ.
  • Step 1: Start with an arbitrary initial value $\theta^{(0)}$ with $\pi(\theta^{(0)}) > 0$ and set $k = 1$.
  • Step 2: Generate a proposal $\theta^*$ from the normal distribution $q(\cdot) = N\left(\theta^{(k-1)}, \operatorname{var}(\theta^{(k-1)})\right)$.
  • Step 3: Calculate the acceptance probability (for the symmetric normal proposal, the $q$ terms cancel)
    $$ \rho = \min\left\{1, \frac{\pi(\theta^*)\,q(\theta^{(k-1)})}{\pi(\theta^{(k-1)})\,q(\theta^*)}\right\}. $$
  • Step 4: Generate $U \sim \mathrm{Uniform}(0, 1)$.
  • Step 5: If $U \le \rho$, set $\theta^{(k)} = \theta^*$; otherwise set $\theta^{(k)} = \theta^{(k-1)}$. Then set $k = k + 1$.
  • Step 6: Repeat Steps 2 to 5 $N$ times to obtain $\theta^{(k)}$, $k = 1, \ldots, N$.
Using the simulated posterior sample, the Bayesian estimate of θ is $\hat\theta = \frac{1}{N - N_0}\sum_{k=N_0+1}^{N}\theta^{(k)}$, where $N_0$ is the number of burn-in iterations of the Markov chain, discarded to remove the effect of the initial value of θ. For more details, see [30].
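A compact random-walk version of Steps 1 to 6 is sketched below in plain Python. Two choices are ours, not the paper's: the proposal standard deviation is a fixed tuning constant `step` instead of $\operatorname{var}(\theta^{(k-1)})$, and the sampler works with $\log\pi(\theta)$ to avoid numerical underflow for large $n$. Function names are hypothetical.

```python
from __future__ import annotations

import math
import random


def log_posterior(theta: float, xs: list[float]) -> float:
    """log pi(theta) up to a constant: (n - 1) log(theta) - 2 sum log(theta + (2 - theta) x_i)."""
    if theta <= 0.0:
        return -math.inf  # outside the parameter space: zero posterior density
    return (len(xs) - 1) * math.log(theta) - 2.0 * sum(math.log(theta + (2.0 - theta) * x) for x in xs)


def mh_bayes(xs: list[float], theta0: float, step: float, n_iter: int = 3000,
             burn_in: int = 500, seed: int = 0) -> tuple[float, float]:
    """Random-walk Metropolis-Hastings with a N(current, step^2) proposal.

    Returns the Bayes estimate (mean of the draws after burn-in) and the acceptance rate.
    """
    rng = random.Random(seed)
    current, lp_current = theta0, log_posterior(theta0, xs)     # Step 1
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = rng.gauss(current, step)                        # Step 2
        lp_prop = log_posterior(proposal, xs)
        # Step 3: the normal proposal is symmetric, so the q terms cancel in rho.
        rho = math.exp(min(0.0, lp_prop - lp_current))
        if rng.random() < rho:                                     # Steps 4 and 5
            current, lp_current = proposal, lp_prop
            accepted += 1
        draws.append(current)
    kept = draws[burn_in:]                                         # discard the N0 burn-in draws
    return sum(kept) / len(kept), accepted / n_iter


rng_data = random.Random(3)
theta_true = 0.8
data = [u * theta_true / (2.0 - 2.0 * u + u * theta_true) for u in (rng_data.random() for _ in range(1000))]
theta_bayes, acceptance = mh_bayes(data, theta0=1.0, step=0.06)
print(f"Bayes estimate: {theta_bayes:.4f}, acceptance rate: {acceptance:.2f}")
```

Proposals with $\theta \le 0$ receive zero posterior density and are therefore always rejected, so the chain stays in the parameter space without any reflection or truncation.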

3.3. Cramér–von Mises Method

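In the CVME method, the distance between the fitted cumulative distribution function and the empirical distribution function is minimized. With $x_{(1)} \le \cdots \le x_{(n)}$ denoting the ordered sample, the objective function is
$$
CM(\theta) = \frac{1}{12n} + \sum_{i=1}^{n}\left[F(x_{(i)}; \theta) - \frac{2i-1}{2n}\right]^2,
$$
and setting its first derivative with respect to θ equal to zero gives
$$
\frac{\partial CM(\theta)}{\partial\theta} = 2\sum_{i=1}^{n}\left[F(x_{(i)}; \theta) - \frac{2i-1}{2n}\right]\frac{\partial F(x_{(i)}; \theta)}{\partial\theta} = -4\sum_{i=1}^{n}\left[1 - \frac{\theta(1-x_{(i)})}{\theta + (2-\theta)x_{(i)}} - \frac{2i-1}{2n}\right]\frac{x_{(i)}(1-x_{(i)})}{\left[\theta + (2-\theta)x_{(i)}\right]^{2}} = 0,
$$
where $\partial F(x; \theta)/\partial\theta = -2x(1-x)/\left[\theta + (2-\theta)x\right]^{2}$.
The estimate $\hat\theta$ is the value of θ that minimizes Equation (17).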

3.4. Least Squares Method

In this method, the sum of squared differences between the fitted cdf at the ordered observations and the plotting positions $i/(n+1)$ is minimized:
$$
LS(\theta) = \sum_{i=1}^{n}\left[F(x_{(i)}; \theta) - \frac{i}{n+1}\right]^2,
$$
and setting its first derivative with respect to θ equal to zero gives
$$
\frac{\partial LS(\theta)}{\partial\theta} = 2\sum_{i=1}^{n}\left[F(x_{(i)}; \theta) - \frac{i}{n+1}\right]\frac{\partial F(x_{(i)}; \theta)}{\partial\theta} = -4\sum_{i=1}^{n}\left[1 - \frac{\theta(1-x_{(i)})}{\theta + (2-\theta)x_{(i)}} - \frac{i}{n+1}\right]\frac{x_{(i)}(1-x_{(i)})}{\left[\theta + (2-\theta)x_{(i)}\right]^{2}} = 0.
$$
The estimate $\hat\theta$ is the value of θ that minimizes Equation (19).

3.5. Method of Moments

This method equates the first population moment with the first sample moment:
$$
\frac{1}{n}\sum_{i=1}^{n}x_i = \mu'_1 = \frac{1}{\theta}\,{}_2F_1\!\left(2, 2; 3; \frac{\theta-2}{\theta}\right).
$$

3.6. Weighted Least Squares Method

This method generalizes the LSE method by weighting each squared deviation:
$$
WLS(\theta) = \sum_{i=1}^{n}\frac{(n+1)^2(n+2)}{i(n-i+1)}\left[F(x_{(i)}; \theta) - \frac{i}{n+1}\right]^2,
$$
and setting its first derivative with respect to θ equal to zero gives
$$
\frac{\partial WLS(\theta)}{\partial\theta} = -4\sum_{i=1}^{n}\frac{(n+1)^2(n+2)}{i(n-i+1)}\left[1 - \frac{\theta(1-x_{(i)})}{\theta + (2-\theta)x_{(i)}} - \frac{i}{n+1}\right]\frac{x_{(i)}(1-x_{(i)})}{\left[\theta + (2-\theta)x_{(i)}\right]^{2}} = 0.
$$
The estimate $\hat\theta$ is the value of θ that minimizes Equation (22).
None of Equations (13), (15), (17), (19), (20) and (22) yields a closed-form estimate, so numerical methods are used to obtain the solutions.
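Because there is a single parameter, the four distance- and moment-based estimators reduce to one-dimensional searches. In the sketch below, the following choices are ours. The CVME, LSE and WLSE objectives are minimized over $\log\theta$ by a coarse grid followed by golden-section refinement, which presumes a single minimum near the best grid point. The MME uses the elementary closed form $E(X) = \frac{2}{\theta}\,b^{-2}\left[\log(1+b) - \frac{b}{1+b}\right]$ with $b = (2-\theta)/\theta$, obtained by integrating $x f(x)$ directly, which equals $\mu'_1$ in the moment equation above. Helper names are hypothetical.

```python
from __future__ import annotations

import math
import random


def cdf(x: float, theta: float) -> float:
    """UHLG cdf, Equation (3)."""
    return 1.0 - theta * (1.0 - x) / (theta + (2.0 - theta) * x)


def cvm(theta: float, xs: list[float]) -> float:   # xs must be sorted
    n = len(xs)
    return 1.0 / (12 * n) + sum((cdf(x, theta) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, 1))


def ls(theta: float, xs: list[float]) -> float:
    n = len(xs)
    return sum((cdf(x, theta) - i / (n + 1)) ** 2 for i, x in enumerate(xs, 1))


def wls(theta: float, xs: list[float]) -> float:
    n = len(xs)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) * (cdf(x, theta) - i / (n + 1)) ** 2
               for i, x in enumerate(xs, 1))


def minimise(objective, xs, lo=-5.0, hi=5.0, grid=81, iters=60) -> float:
    """Grid search over log(theta), then golden-section refinement around the best grid point."""
    ts = [lo + (hi - lo) * k / (grid - 1) for k in range(grid)]
    k = min(range(grid), key=lambda j: objective(math.exp(ts[j]), xs))
    a, b = ts[max(k - 1, 0)], ts[min(k + 1, grid - 1)]
    f = lambda t: objective(math.exp(t), xs)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:               # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                     # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return math.exp(0.5 * (a + b))


def uhlg_mean(theta: float) -> float:
    """E(X) = (2/theta) [log(1 + b) - b/(1 + b)] / b^2, b = (2 - theta)/theta."""
    b = (2.0 - theta) / theta
    if abs(b) < 1e-4:  # series b^2/2 - 2b^3/3 + ..., avoids 0/0 near theta = 2
        return (2.0 / theta) * (0.5 - 2.0 * b / 3.0)
    return (2.0 / theta) * (math.log1p(b) - b / (1.0 + b)) / b ** 2


def mme(xs: list[float], lo: float = 1e-6, hi: float = 1e6, iters: int = 80) -> float:
    """Solve uhlg_mean(theta) = sample mean by bisection on log(theta); the mean increases in theta."""
    target = sum(xs) / len(xs)
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        a, b = (mid, b) if uhlg_mean(math.exp(mid)) < target else (a, mid)
    return math.exp(0.5 * (a + b))


rng = random.Random(5)
theta_true = 0.6
xs = sorted(u * theta_true / (2.0 - 2.0 * u + u * theta_true) for u in (rng.random() for _ in range(2000)))
estimates = {"CVME": minimise(cvm, xs), "LSE": minimise(ls, xs), "WLSE": minimise(wls, xs), "MME": mme(xs)}
print(estimates)
```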

4. Simulation Study

In this section, a simulation study is performed to assess the performance of the estimation methods for θ described in Section 3. Samples are drawn from the UHLG distribution, and for each method the mean estimate (ME), average bias (AB) and mean squared error (MSE) of $\hat\theta$ are calculated. Several values of θ are considered, and each configuration is replicated 2000 times for sample sizes n = 20, 70, 100, 150 and 200.
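The statistical measures ME, AB and MSE are given, respectively, by
$$
\mathrm{ME} = \frac{1}{2000}\sum_{i=1}^{2000}\hat\theta_i, \qquad \mathrm{AB} = \frac{1}{2000}\sum_{i=1}^{2000}\left(\theta - \hat\theta_i\right), \qquad \mathrm{MSE} = \frac{1}{2000}\sum_{i=1}^{2000}\left(\theta - \hat\theta_i\right)^2.
$$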
Tables 1 and 2 report the ME, AB and MSE of $\hat\theta$ for the estimation methods considered and show the following general patterns:
  • ME converges to θ as the sample size n increases;
  • AB tends to zero as n increases;
  • MSE decreases as n increases;
  • Overall, the MLE and BE methods perform best among the methods considered.
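A scaled-down version of this simulation design can be sketched in a few lines of Python. For brevity it uses only the MLE (found by bisection, since $\theta\,dl/d\theta$ is strictly decreasing in θ), 200 instead of 2000 replications, θ = 0.5 and two sample sizes; these choices are ours, and the output is not meant to reproduce Tables 1 and 2.

```python
from __future__ import annotations

import math
import random


def uhlg_quantile(u: float, theta: float) -> float:
    """Equation (4), used to simulate UHLG samples by inverse transform."""
    return u * theta / (2.0 - 2.0 * u + u * theta)


def uhlg_mle(xs: list[float], iters: int = 50) -> float:
    """Root of theta * dl/dtheta = 0 by bisection on log(theta) over [1e-8, 1e8]."""
    a, b = math.log(1e-8), math.log(1e8)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        t = math.exp(mid)
        score = sum(1.0 - 2.0 * t * (1.0 - x) / (t + (2.0 - t) * x) for x in xs)
        a, b = (mid, b) if score > 0.0 else (a, mid)
    return math.exp(0.5 * (a + b))


def simulate(theta: float, n: int, reps: int = 200, seed: int = 1) -> tuple[float, float, float]:
    """ME, AB and MSE of the MLE over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    est = [uhlg_mle([uhlg_quantile(rng.random(), theta) for _ in range(n)]) for _ in range(reps)]
    me = sum(est) / reps
    ab = sum(theta - e for e in est) / reps      # signed, exactly as in the AB formula
    mse = sum((theta - e) ** 2 for e in est) / reps
    return me, ab, mse


results = {n: simulate(theta=0.5, n=n) for n in (20, 200)}
for n, (me, ab, mse) in results.items():
    print(f"n = {n:3d}: ME = {me:.4f}, AB = {ab:.4f}, MSE = {mse:.5f}")
```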

5. Unit Half Logistic-Geometric Quantile Regression Model

In this section, a new regression model for responses bounded in the unit interval is introduced as an alternative to other regression models such as the log Bilal, beta and Kumaraswamy regression models.
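Consider the re-parameterization
$$
\theta = \frac{2\mu(\tau-1)}{\tau(\mu-1)}, \tag{23}
$$
where $\mu = Q(\tau \mid \theta)$ is the τth quantile of the UHLG distribution for a fixed $\tau \in (0, 1)$; Equation (23) follows by solving $Q(\tau \mid \theta) = \mu$ for θ using Equation (4).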
Substituting Equation (23) into Equations (3) and (5), we obtain
$$
F(x) = 1 - \frac{\frac{2\mu(\tau-1)}{\tau(\mu-1)}(1-x)}{\frac{2\mu(\tau-1)}{\tau(\mu-1)} + \left[2 - \frac{2\mu(\tau-1)}{\tau(\mu-1)}\right]x}
$$
and
$$
f(x) = \frac{\frac{4\mu(\tau-1)}{\tau(\mu-1)}}{\left\{\frac{2\mu(\tau-1)}{\tau(\mu-1)} + \left[2 - \frac{2\mu(\tau-1)}{\tau(\mu-1)}\right]x\right\}^{2}},
$$
where $0 < x < 1$ and $0 < \mu < 1$.
Let $X_i$, $i = 1, 2, \ldots, n$, be $n$ independent random variables such that $X_i \sim \mathrm{UHLG}(\mu_i; \tau)$. The UHLG quantile regression model is
$$
g(\mu_i) = \boldsymbol{\delta}^{T}\mathbf{t}_i, \tag{26}
$$
where $\mathbf{t}_i = (1, t_{1i}, t_{2i}, \ldots, t_{pi})^{T}$ is the vector of covariates and $\boldsymbol{\delta} = (\delta_0, \delta_1, \ldots, \delta_p)^{T}$ is the vector of regression coefficients.
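The logit link function, used to link the covariates to the τth quantile $\mu_i$ of the response variable, is
$$
g(\mu_i) = \log\frac{\mu_i}{1 - \mu_i}. \tag{27}
$$
From Equations (26) and (27), we have
$$
\mu_i = \frac{e^{\boldsymbol{\delta}^{T}\mathbf{t}_i}}{1 + e^{\boldsymbol{\delta}^{T}\mathbf{t}_i}}, \quad i = 1, 2, \ldots, n.
$$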

5.1. Maximum Likelihood Estimates Method

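The unknown parameter vector $\boldsymbol{\delta} = (\delta_0, \delta_1, \ldots, \delta_p)^{T}$ is estimated by the classical MLE method. Writing $\theta_i = 2\mu_i(\tau-1)/[\tau(\mu_i-1)]$ as in Equation (23), the log-likelihood function is
$$
l(\boldsymbol{\delta}) = \sum_{i=1}^{n}\left\{\log(2\theta_i) - 2\log\left[\theta_i + (2-\theta_i)x_i\right]\right\} = n\log\frac{4(1-\tau)}{\tau} + \sum_{i=1}^{n}\log\mu_i - \sum_{i=1}^{n}\log(1-\mu_i) - 2\sum_{i=1}^{n}\log\left[\theta_i + (2-\theta_i)x_i\right], \tag{29}
$$
where $\mu_i$ is the inverse logit of $\boldsymbol{\delta}^{T}\mathbf{t}_i$. Maximizing $l$ in Equation (29) gives $\hat{\boldsymbol{\delta}}$, the MLE of $\boldsymbol{\delta}$. The maximization can be carried out in R, for example with the optim function or the maxLik package; see [31,32].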
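The paper maximizes Equation (29) in R. As an alternative sketch in Python with NumPy (our own choice, not the authors' code), note that under the logit link Equation (23) becomes $\theta_i = \frac{2(1-\tau)}{\tau}\exp(\eta_i)$ with $\eta_i = \boldsymbol{\delta}^{T}\mathbf{t}_i$. A short calculation (ours) gives $\partial\log f_i/\partial\eta_i = 1 - 2\theta_i(1-x_i)/D_i$ and $-\partial^2\log f_i/\partial\eta_i^2 = 4\theta_i x_i(1-x_i)/D_i^2 > 0$, where $D_i = \theta_i + (2-\theta_i)x_i$. The log-likelihood is therefore concave in $\boldsymbol{\delta}$, and Newton–Raphson with step halving is a natural choice. The data below are simulated, and all function names are hypothetical.

```python
import numpy as np


def loglik(delta, T, x, tau):
    """Log-likelihood (29): sum of log(2 theta_i) - 2 log(theta_i + (2 - theta_i) x_i)."""
    with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
        mu = 1.0 / (1.0 + np.exp(-(T @ delta)))              # inverse logit of delta^T t_i
        theta = 2.0 * mu * (tau - 1.0) / (tau * (mu - 1.0))  # Equation (23)
        return float(np.sum(np.log(2.0 * theta) - 2.0 * np.log(theta + (2.0 - theta) * x)))


def score_and_weights(delta, T, x, tau):
    """Per-observation d log f / d eta and -d^2 log f / d eta^2, with theta = 2(1 - tau)/tau exp(eta)."""
    theta = 2.0 * (1.0 - tau) / tau * np.exp(T @ delta)
    D = theta + (2.0 - theta) * x
    return 1.0 - 2.0 * theta * (1.0 - x) / D, 4.0 * theta * x * (1.0 - x) / D ** 2


def fit_uhlg_qr(T, x, tau=0.5, max_iter=100, tol=1e-10):
    """Newton-Raphson with step halving; returns estimates, standard errors and log-likelihood."""
    delta = np.zeros(T.shape[1])
    ll = loglik(delta, T, x, tau)
    for _ in range(max_iter):
        s, w = score_and_weights(delta, T, x, tau)
        step = np.linalg.solve(T.T @ (w[:, None] * T), T.T @ s)
        t = 1.0
        while t > 1e-8:
            cand = delta + t * step
            ll_cand = loglik(cand, T, x, tau)
            if ll_cand >= ll:
                break
            t *= 0.5
        else:
            break  # no ascent step found: keep the current estimate
        converged = ll_cand - ll < tol
        delta, ll = cand, ll_cand
        if converged:
            break
    _, w = score_and_weights(delta, T, x, tau)
    cov = np.linalg.inv(T.T @ (w[:, None] * T))  # inverse observed information at the estimate
    return delta, np.sqrt(np.diag(cov)), ll


rng = np.random.default_rng(7)
n, tau = 10_000, 0.5
T = np.column_stack([np.ones(n), rng.uniform(size=n)])
delta_true = np.array([-0.5, 1.0])
mu = 1.0 / (1.0 + np.exp(-(T @ delta_true)))
theta = 2.0 * mu * (tau - 1.0) / (tau * (mu - 1.0))
u = rng.uniform(size=n)
x = u * theta / (2.0 - 2.0 * u + u * theta)  # Equation (4)
delta_hat, se, ll_hat = fit_uhlg_qr(T, x, tau)
print(delta_hat, se, ll_hat)
```

The standard errors are taken from the inverse of the observed information matrix $\sum_i w_i\mathbf{t}_i\mathbf{t}_i^{T}$ at the estimate.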

5.2. Residual Analysis

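To check the adequacy of the fitted regression model, a residual analysis is needed. For this purpose, the Cox–Snell residuals $\hat e_i$ [33] and the randomized quantile residuals $\hat r_i$ [34] are used; they are given, respectively, by
$$
\hat e_i = -\log\left[\bar G(x_i; \hat\mu_i, \tau)\right],
$$
$$
\hat r_i = \Phi^{-1}\left[G(x_i; \hat\mu_i, \tau)\right],
$$
where $G$ is the cdf of the UHLG regression model, $\bar G = 1 - G$ is its survival function, and $\Phi^{-1}(\cdot)$ is the inverse cumulative distribution function of the standard normal distribution. If the model is correctly specified, the Cox–Snell residuals are approximately standard exponential and the quantile residuals approximately standard normal.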
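Both residuals only need the closed-form cdf. The sketch below (standard library only; names are hypothetical) evaluates them for data simulated from a correctly specified UHLG quantile model, where the Cox–Snell residuals should look standard exponential and the quantile residuals standard normal. The $\mu_i$ here are drawn arbitrarily rather than fitted, an assumption made only to keep the example short.

```python
from __future__ import annotations

import math
import random
from statistics import NormalDist, fmean, stdev


def theta_from_quantile(mu: float, tau: float) -> float:
    """Equation (23): theta = 2 mu (tau - 1) / (tau (mu - 1))."""
    return 2.0 * mu * (tau - 1.0) / (tau * (mu - 1.0))


def survival(x: float, mu: float, tau: float) -> float:
    """1 - G(x; mu, tau) = theta (1 - x) / (theta + (2 - theta) x), computed directly for accuracy."""
    th = theta_from_quantile(mu, tau)
    return th * (1.0 - x) / (th + (2.0 - th) * x)


def cox_snell(x: float, mu: float, tau: float) -> float:
    return -math.log(survival(x, mu, tau))


def quantile_residual(x: float, mu: float, tau: float) -> float:
    # For a continuous response the randomized quantile residual reduces to Phi^{-1}(G(x)).
    return NormalDist().inv_cdf(1.0 - survival(x, mu, tau))


rng = random.Random(42)
tau = 0.5
mus = [0.2 + 0.6 * rng.random() for _ in range(4000)]   # hypothetical fitted quantiles mu_i
xs = []
for mu in mus:
    th, u = theta_from_quantile(mu, tau), rng.random()
    xs.append(u * th / (2.0 - 2.0 * u + u * th))         # draw from UHLG(mu_i; tau) via Equation (4)
e = [cox_snell(x, m, tau) for x, m in zip(xs, mus)]
r = [quantile_residual(x, m, tau) for x, m in zip(xs, mus)]
print(f"mean Cox-Snell: {fmean(e):.3f}, quantile residuals mean/sd: {fmean(r):.3f}/{stdev(r):.3f}")
```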

6. Application

In this section, a real data set is used to show the ability of the UHLG distribution to model bounded data, in comparison with other unit models: the log Bilal (LB), beta (B) and Kumaraswamy (K) regression models. Their pdfs are, respectively,
$$
f_{LB}(x) = \frac{3\tau(\mu-1)}{\mu(\tau-1)}\,x^{\frac{\tau(\mu-1)}{\mu(\tau-1)} - 1}\left[1 - x^{\frac{\tau(\mu-1)}{2\mu(\tau-1)}}\right], \tag{30}
$$
$$
f_{B}(x) = \frac{\Gamma(\alpha)}{\Gamma(\alpha\mu)\,\Gamma(\alpha(1-\mu))}\,x^{\alpha\mu-1}(1-x)^{(1-\mu)\alpha-1}, \tag{31}
$$
and
$$
f_{K}(x) = \frac{\alpha\log(0.5)}{\log(1-\mu^{\alpha})}\,x^{\alpha-1}\left(1 - x^{\alpha}\right)^{\frac{\log(0.5)}{\log(1-\mu^{\alpha})} - 1}, \tag{32}
$$
where $\alpha > 0$, $x \in (0, 1)$ and $\mu \in (0, 1)$; μ represents the mean in Equation (31) and the median in Equations (30) and (32).
To compare these models with the UHLG regression model, the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), the Bayesian information criterion (BIC) and the Kolmogorov–Smirnov (K-S) statistic with its p-value are calculated.
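The comparison statistics are standard. The sketch below assumes the usual definitions $\mathrm{AIC} = -2\hat l + 2k$, $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$ and $\mathrm{BIC} = -2\hat l + k\log n$, where $\hat l$ is the maximized log-likelihood, $k$ the number of estimated parameters and $n$ the sample size; the paper does not state its exact conventions. The K-S p-value uses the asymptotic Kolmogorov distribution. Function names are hypothetical, and `cdf` can be any fitted cdf, for example Equation (3) with θ replaced by its estimate.

```python
import math


def information_criteria(loglik: float, k: int, n: int) -> dict:
    """AIC, AICc and BIC for a model with k estimated parameters fitted to n observations."""
    aic = -2.0 * loglik + 2.0 * k
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "BIC": -2.0 * loglik + k * math.log(n),
    }


def ks_statistic(xs, cdf) -> float:
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|."""
    ys = sorted(xs)
    n = len(ys)
    return max(max(i / n - cdf(y), cdf(y) - (i - 1) / n) for i, y in enumerate(ys, start=1))


def ks_pvalue_asymptotic(d: float, n: int, terms: int = 100) -> float:
    """Asymptotic p-value: P(sqrt(n) D_n > lam) ~ 2 sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2).
    When parameters are estimated from the same data, this p-value is conservative."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series equals 1 to many digits here; avoid slow convergence
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, terms + 1))
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    print(information_criteria(loglik=100.0, k=7, n=73))
    print(ks_statistic([0.1, 0.4, 0.7], lambda x: x))
```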

Risk Survey Data

Insurance can be defined as a contract, represented by a policy, under which those insured by an insurance company receive protection against potential losses. By pooling the risks of a large number of customers, the company can make payments more affordable for the insured. Insurance policies protect against the risk of financial losses, large or small, that may result from damage to the insured or their property, or from civil liability for damage to another party. Some of the most prevalent types of insurance are life, death and property insurance.
Risk management is an important and necessary aspect of insurance. Risk surveys are an effective way to identify, quantify and therefore manage risk by collecting information, perceptions and insights from managers across an organization.
The data set comes from a questionnaire sent to 374 risk managers of large U.S.-based organizations, 73 of whom returned the completed survey. The data were previously used by Mazucheli et al. [13]. The survey solicited information on four topics: captive insurance, decision making, organizational data, and evaluating and identifying exposures. The variables are described as follows:
  • Firm cost (y) is the response variable and measures the firm's cost management effectiveness;
  • Assume ($x_1$) represents the firm's retention strategy;
  • Cap ($x_2$) is an indicator equal to 1 if the firm uses a captive insurer and 0 otherwise;
  • Sizelog ($x_3$) is the logarithm of the firm's size;
  • Indcost ($x_4$) represents the risk in the firm's industry;
  • Central ($x_5$) represents the firm's centralization strategy;
  • Analy ($x_6$) represents the degree of importance of using analytical tools.
First, to test the goodness of fit of the UHLG distribution, it was fitted to the response variable alone (without covariates) and compared with the log Bilal, beta and Kumaraswamy distributions, whose pdfs are, respectively,
$$
f_1(x) = \frac{6}{\theta}\,x^{\frac{2}{\theta} - 1}\left(1 - x^{\frac{1}{\theta}}\right),
$$
$$
f_2(x) = \frac{\Gamma(\theta+\alpha)}{\Gamma(\theta)\,\Gamma(\alpha)}\,x^{\theta-1}(1-x)^{\alpha-1},
$$
and
$$
f_3(x) = \theta\alpha\,x^{\theta-1}\left(1 - x^{\theta}\right)^{\alpha-1},
$$
where $\theta, \alpha > 0$ and $x \in (0, 1)$.
Table 3 compares the ML estimates and goodness-of-fit statistics of these unit distributions for the risk survey data; it indicates that the UHLG distribution gives the best fit among the distributions considered.
Next, a multiple regression model is used to assess the impact of the Assume, Cap, Sizelog, Indcost, Central and Analy covariates on Firm cost.
The logit link function for $\mu_i$ is assumed for all fitted regression models, as it ensures that the fitted $\mu_i$ lies in (0, 1):
$$
\mathrm{logit}(\mu_i) = \delta_0 + \delta_1x_{i1} + \delta_2x_{i2} + \delta_3x_{i3} + \delta_4x_{i4} + \delta_5x_{i5} + \delta_6x_{i6}, \quad i = 1, \ldots, 73.
$$
Tables 4 and 5 give the results of fitting the data: the ML estimates of the parameters θ, α and $\delta_i$, $i = 0, 1, \ldots, 6$, together with their standard errors (SE) and p-values, as well as the AIC and BIC statistics for each regression model.
Based on Tables 4 and 5, we note the following:
  • All covariates have an impact on the firm's cost management effectiveness;
  • The UHLG regression model attains the largest -AIC and -BIC values (-AIC = 192.34 and -BIC = 176.31) while using fewer parameters;
  • The UHLG regression model gives the best fit to the data compared with the other models.
Figure 7 shows the PP plots of the theoretical and empirical probabilities of the Cox–Snell residuals for different regression models fitting the risk survey data.

7. Conclusions

A new unit distribution, the unit half logistic-geometric (UHLG) distribution, is proposed for data lying in the unit interval (0, 1). It has flexible statistical properties and serves as an alternative to other distributions, including the beta, Kumaraswamy and log Bilal distributions. Important statistical properties such as the moments, mean inactivity time, mean residual life, stress-strength reliability and stochastic ordering are derived. In addition, different methods are used to estimate the parameter, and a new quantile regression model based on the UHLG distribution is introduced. Finally, an application to a real data set, collected through a questionnaire sent to large organizations in the United States, illustrates the usefulness of the distribution and its regression model. The UHLG distribution had the largest p-value among the competing distributions, and the UHLG regression model gave the best information criteria while using fewer parameters.
Modelling bounded data sets lying in the (0, 1) interval has recently become very important, and new unit distributions for such data are needed. In the future, further unit distributions should be developed to fit data from the medical, actuarial and financial fields.

Author Contributions

Conceptualization, A.T.R. and A.H.T.; methodology, A.T.R., A.H.T. and B.S.E.-D.; software, A.T.R.; validation, A.H.T.; formal analysis, A.T.R., A.H.T. and B.S.E.-D.; investigation, A.T.R. and A.H.T.; resources, A.T.R., A.H.T. and B.S.E.-D.; data curation, A.T.R. and A.H.T.; writing—original draft preparation, A.H.T., B.S.E.-D. and A.T.R.; writing—review and editing, A.H.T., A.T.R. and B.S.E.-D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available in this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ramadan, A.T.; Tolba, A.H.; El-Desouky, B.S. Generalized power Akshaya distribution and its applications. Open J. Model. Simul. 2021, 4, 323–338.
  2. El-Sagheer, R.M.; Tolba, A.H.; Jawa, T.M.; Sayed-Ahmed, N. Inferences for Stress-Strength Reliability Model in the Presence of Partially Accelerated Life Test to Its Strength Variable. Comput. Intell. Neurosci. 2022, 5, 4710536.
  3. Abd El-Monsef, M.M.E.; El-Awady, M.M.; Seyam, M.M. A new quantile regression model for modelling child mortality. Int. J. Biomath. 2022, 10, 142–149.
  4. Altun, E.; Cordeiro, G.M. The unit-improved second-degree Lindley distribution: Inference and regression modelling. Comput. Stat. 2020, 35, 259–279.
  5. Altun, E.; El-Morshedy, M.; Eliwa, M.S. A new regression model for bounded response variable: An alternative to the beta and unit-Lindley regression models. PLoS ONE 2021, 16, e0245627.
  6. Bayes, C.L.; Bazán, J.L.; De Castro, M. A quantile parametric mixed regression model for bounded response variables. Stat. Its Interface 2017, 10, 483–493.
  7. Cordeiro, G.M.; Silva, R.B.; Nascimento, A.D.C. Recent Advances in Lifetime and Reliability Models; Bentham Science Publishers: Oak Park, IL, USA, 2020; Volume 17, pp. 93–112.
  8. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
  9. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  10. Mitnik, P.A.; Baek, S. The Kumaraswamy distribution: Median-dispersion re-parameterizations for regression modelling and simulation-based estimation. Stat. Pap. 2013, 54, 177–192.
  11. Gómez-Déniz, E.; Sordo, M.A.; Calderín-Ojeda, E. The Log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insur. Math. Econ. 2014, 54, 49–57.
  12. Korkmaz, M.Ç.; Chesneau, C. On the unit Burr-XII distribution with the quantile regression modelling and applications. Comput. Appl. Math. 2021, 40, 1–26.
  13. Mazucheli, J.; Menezes, A.F.B.; Ghitany, M.E. The unit-Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
  14. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; De Oliveira, R.P.; Ghitany, M.E. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modelling of quantiles conditional on covariates. J. Appl. Stat. 2020, 47, 954–974.
  15. Muse, A.H.; Tolba, A.H.; Fayad, E.; Ali, O.A.A.; Nagy, M.; Yusuf, M. Modelling the COVID-19 mortality rate with a new versatile modification of the log-logistic distribution. Comput. Intell. Neurosci. 2021, 26, 203–224.
  16. Mahmood, Z.; Jawa, M.T.; Sayed-Ahmed, N.; Khalil, E.M.; Muse, A.H.; Tolba, A.H. An Extended Cosine Generalized Family of Distributions for Reliability Modelling: Characteristics and Applications with Simulation Study. Math. Probl. Eng. 2022, 3, 112–128.
  17. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J. Appl. Stat. 2019, 46, 700–714.
  18. Mousa, A.M.; El-Sheikh, A.A.; Abdel-Fattah, M.A. A gamma regression for bounded continuous variables. Adv. Appl. Stat. 2016, 49, 305–321.
  19. Sarhan, A.M.; El-Gohary, A.I.; Mustafa, A.; Tolba, A.H. Statistical analysis of regression competing risks model with covariates using Weibull sub-distributions. Int. J. Reliab. Appl. 2019, 2, 73–88.
  20. Sarhan, A.M.; El-Gohary, A.I.; Tolba, A.H. Statistical Analysis of a Competing Risks Model with Weibull Sub-Distributions. Appl. Math. 2017, 11, 1671–1690.
  21. Tadikamalla, P.R. On a family of distributions obtained by the transformation of the gamma distribution. J. Stat. Comput. Simul. 1981, 13, 209–214.
  22. Liu, K.; Balakrishnan, N. Recurrence relations for moments of order statistics from half logistic-geometric distribution and their applications. Commun. Stat.-Simul. Comput. 2020, 17, 1–19.
  23. Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products; Academic Press: Cambridge, MA, USA, 2014; Volume 11.
  24. Mohamed, R.A.H.; Tolba, A.H.; Almetwally, E.M.; Ramadan, D.A. Inference of Reliability Analysis for Type II Half Logistic Weibull Distribution with Application of Bladder Cancer. Axioms 2022, 11, 386. Available online: https://www.mdpi.com/2075-1680/11/8/386 (accessed on 6 August 2022).
  25. Shanmugam, R. The Stress-Strength Model and Its Generalizations: Theory and Applications; Taylor & Francis: Boca Raton, FL, USA, 2004; Volume 16, pp. 84–105.
  26. Olver, F.W.J.; Lozier, D.W.; Boisvert, R.F.; Clark, C.W. (Eds.) NIST Handbook of Mathematical Functions; Cambridge University Press: Cambridge, UK, 2010; Volume 17, pp. 224–251.
  27. Balakrishnan, N. Order statistics from the half logistic distribution. J. Stat. Comput. Simul. 1985, 20, 287–309.
  28. Belzunce, F.; Riquelme, C.M.; Mulero, J. An Introduction to Stochastic Orders; Academic Press: Cambridge, MA, USA, 2015; Volume 12, pp. 165–187.
  29. Abushal, T.A.; Kumar, J.; Muse, A.H.; Tolba, A.H. Estimation for Akshaya Failure Model with Competing Risks under Progressive Censoring Scheme with Analyzing of Thymic Lymphoma of Mice Application. Complexity 2022, 2022, 5151274.
  30. McCool, J.I. Using the Weibull Distribution: Reliability, Modelling and Inference; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 950, pp. 43–66.
  31. Zhang, P.; Qiu, Z.; Shi, C. Simplexreg: An R package for regression analysis of proportional data using the simplex distribution. J. Stat. Softw. 2016, 71, 51–76.
  32. Nurunnabi, A.A.M.; Hadi, A.S.; Imon, A.H.M.R. Procedures for the identification of multiple influential observations in linear regression. J. Appl. Stat. 2014, 41, 1315–1331.
  33. Cox, D.R.; Snell, E.J. A general definition of residuals. J. R. Stat. Soc. Ser. B Methodol. 1968, 30, 248–265.
  34. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
Figure 1. cdf of UHLG distribution at different values of θ.
Figure 2. pdf of UHLG distribution at different values of θ.
Figure 3. Survival function of UHLG distribution at different values of θ.
Figure 4. Hazard rate function of UHLG distribution at different values of θ.
Figure 5. The mean and the variance plots for UHLG distribution.
Figure 6. The skewness and the kurtosis plots for UHLG distribution.
Figure 7. PP plots of the theoretical and empirical probabilities of the Cox–Snell residuals fitting the risk survey data.
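
The residual check in Figure 7 relies on a standard property of Cox–Snell residuals [33]. If a model is adequate, r_i = −log(1 − F̂(y_i)) behaves approximately like a sample from the standard exponential distribution. A PP plot of the Exp(1) cdf of the ordered residuals against empirical probabilities should therefore lie close to the diagonal.

The sketch below computes such PP-plot coordinates from fitted cdf values F̂(y_i). These values are taken as input because the UHLG cdf is not restated in this section. The helper name `cox_snell_pp` and the plotting position (i − 0.5)/n are illustrative assumptions, not necessarily the choices behind Figure 7.

```python
import math


def cox_snell_pp(fitted_cdf: list[float]) -> list[tuple[float, float]]:
    """Return (empirical, theoretical) PP-plot points for Cox-Snell residuals.

    fitted_cdf[i] is the fitted cdf F(y_i) of observation i; for a regression
    model each observation uses its own fitted parameter value.
    """
    if any(not 0.0 <= u < 1.0 for u in fitted_cdf):
        raise ValueError("fitted cdf values must lie in [0, 1)")
    # Cox-Snell residuals r_i = -log(1 - F(y_i)), sorted ascending.
    residuals = sorted(-math.log1p(-u) for u in fitted_cdf)
    n = len(residuals)
    points = []
    for i, r in enumerate(residuals, start=1):
        empirical = (i - 0.5) / n        # assumed plotting position
        theoretical = -math.expm1(-r)    # Exp(1) cdf, 1 - exp(-r)
        points.append((empirical, theoretical))
    return points


# Example with made-up fitted cdf values for three observations.
for emp, theo in cox_snell_pp([0.8, 0.2, 0.5]):
    print(f"{emp:.3f}  {theo:.3f}")
```

The Exp(1) cdf evaluated at r_i returns F̂(y_i) itself. This PP plot is therefore equivalent to plotting the ordered fitted cdf values against uniform plotting positions.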
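
Table 1. Some statistics of θ̂ including ME, AB and MSE using MLE, BE and CVME methods.

| θ | n | MLE ME | MLE AB | MLE MSE | BE ME | BE AB | BE MSE | CVME ME | CVME AB | CVME MSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 20 | 0.115 | 0.015 | 0.002 | 0.109 | 0.009 | 0.004 | 0.148 | 0.049 | 0.058 |
| 0.1 | 70 | 0.103 | 0.003 | 0.001 | 0.106 | 0.006 | 0.001 | 0.150 | 0.050 | 0.048 |
| 0.1 | 100 | 0.102 | 0.002 | 0.001 | 0.101 | 0.001 | 0.000 | 0.147 | 0.047 | 0.046 |
| 0.1 | 150 | 0.101 | 0.001 | 0.001 | 0.102 | 0.002 | 0.000 | 0.162 | 0.062 | 0.042 |
| 0.1 | 200 | 0.101 | 0.001 | 0.001 | 0.103 | 0.003 | 0.000 | 0.150 | 0.050 | 0.036 |
| 0.5 | 20 | 0.573 | 0.073 | 0.052 | 0.539 | 0.039 | 0.065 | 0.535 | 0.035 | 0.075 |
| 0.5 | 70 | 0.516 | 0.016 | 0.011 | 0.532 | 0.032 | 0.020 | 0.549 | 0.049 | 0.065 |
| 0.5 | 100 | 0.509 | 0.009 | 0.008 | 0.507 | 0.007 | 0.009 | 0.525 | 0.025 | 0.051 |
| 0.5 | 150 | 0.503 | 0.003 | 0.005 | 0.509 | 0.009 | 0.007 | 0.531 | 0.031 | 0.044 |
| 0.5 | 200 | 0.504 | 0.004 | 0.004 | 0.515 | 0.015 | 0.006 | 0.543 | 0.043 | 0.034 |
| 0.9 | 20 | 1.031 | 0.131 | 0.168 | 0.967 | 0.067 | 0.215 | 0.911 | 0.011 | 0.096 |
| 0.9 | 70 | 0.929 | 0.029 | 0.037 | 0.951 | 0.051 | 0.067 | 0.932 | 0.032 | 0.087 |
| 0.9 | 100 | 0.916 | 0.016 | 0.026 | 0.911 | 0.011 | 0.035 | 0.927 | 0.027 | 0.074 |
| 0.9 | 150 | 0.905 | 0.005 | 0.016 | 0.914 | 0.014 | 0.027 | 0.942 | 0.042 | 0.062 |
| 0.9 | 200 | 0.907 | 0.007 | 0.013 | 0.924 | 0.024 | 0.019 | 0.922 | 0.022 | 0.055 |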
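
Tables 1 and 2 summarise the replicated estimates θ̂ for each θ and n by ME, AB and MSE. This section does not restate their definitions. The sketch below assumes the usual ones:

- ME is the average of the replicated estimates.
- AB = |ME − θ|.
- MSE is the average of (θ̂ − θ)².

`simulation_summary` is a hypothetical helper, and the estimator that produces each replicate is left to the caller.

```python
from statistics import fmean


def simulation_summary(estimates: list[float], theta: float) -> dict[str, float]:
    """ME, AB and MSE of replicated estimates of a true parameter theta.

    Assumed definitions: ME = mean of the estimates, AB = |ME - theta|,
    MSE = mean of (estimate - theta) ** 2.
    """
    if not estimates:
        raise ValueError("need at least one replicate")
    me = fmean(estimates)
    return {
        "ME": me,
        "AB": abs(me - theta),
        "MSE": fmean((est - theta) ** 2 for est in estimates),
    }


# Made-up replicate estimates for a true theta of 0.1.
print(simulation_summary([0.09, 0.11, 0.12], theta=0.1))
```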
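
Table 2. Some statistics of θ̂ including ME, AB and MSE using LSE, MME and WLSE methods.

| θ | n | LSE ME | LSE AB | LSE MSE | MME ME | MME AB | MME MSE | WLSE ME | WLSE AB | WLSE MSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 20 | 0.149 | 0.049 | 0.107 | 0.294 | 0.194 | 0.352 | 0.144 | 0.044 | 0.116 |
| 0.1 | 70 | 0.155 | 0.055 | 0.088 | 0.295 | 0.195 | 0.327 | 0.146 | 0.046 | 0.089 |
| 0.1 | 100 | 0.163 | 0.063 | 0.076 | 0.256 | 0.156 | 0.297 | 0.161 | 0.061 | 0.089 |
| 0.1 | 150 | 0.151 | 0.051 | 0.069 | 0.245 | 0.145 | 0.237 | 0.161 | 0.061 | 0.066 |
| 0.1 | 200 | 0.150 | 0.050 | 0.053 | 0.271 | 0.171 | 0.211 | 0.161 | 0.061 | 0.042 |
| 0.5 | 20 | 0.524 | 0.024 | 0.074 | 0.979 | 0.479 | 1.724 | 0.625 | 0.125 | 0.312 |
| 0.5 | 70 | 0.538 | 0.038 | 0.067 | 0.890 | 0.390 | 1.702 | 0.616 | 0.116 | 0.302 |
| 0.5 | 100 | 0.537 | 0.037 | 0.061 | 0.938 | 0.438 | 1.664 | 0.625 | 0.125 | 0.291 |
| 0.5 | 150 | 0.540 | 0.040 | 0.059 | 0.859 | 0.359 | 1.568 | 0.621 | 0.121 | 0.285 |
| 0.5 | 200 | 0.542 | 0.042 | 0.056 | 0.915 | 0.415 | 1.497 | 0.614 | 0.114 | 0.264 |
| 0.9 | 20 | 0.909 | 0.009 | 0.083 | 1.225 | 0.325 | 2.049 | 1.065 | 0.165 | 0.608 |
| 0.9 | 70 | 0.930 | 0.030 | 0.082 | 1.131 | 0.231 | 2.049 | 1.084 | 0.184 | 0.582 |
| 0.9 | 100 | 0.920 | 0.020 | 0.079 | 1.164 | 0.264 | 1.962 | 1.064 | 0.164 | 0.571 |
| 0.9 | 150 | 0.935 | 0.035 | 0.074 | 1.098 | 0.198 | 1.924 | 1.080 | 0.180 | 0.554 |
| 0.9 | 200 | 0.900 | 0.000 | 0.063 | 1.085 | 0.185 | 1.852 | 1.084 | 0.184 | 0.521 |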
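
The LSE, WLSE and CVME rows in Tables 1 and 2 come from minimum-distance criteria. The sketch below assumes the textbook forms of these criteria, which are not restated here:

- LSE minimises Σ[F(x_(i); θ) − i/(n + 1)]².
- WLSE adds the weights (n + 1)²(n + 2)/[i(n − i + 1)] to the LSE criterion.
- CVME minimises 1/(12n) + Σ[F(x_(i); θ) − (2i − 1)/(2n)]².

The UHLG cdf is not reproduced in this section, so the demonstration uses a hypothetical stand-in: the power-function cdf F(x; a) = x^a on (0, 1). The minimisation uses a golden-section search, which may differ from whatever optimiser the authors used. Any one-parameter cdf on (0, 1) can be passed in place of `power_cdf`.

```python
import math
import random
from typing import Callable

# cdf(x, theta) -> F(x; theta); power_cdf below is a hypothetical stand-in model.
Cdf = Callable[[float, float], float]


def ls_objective(theta: float, xs: list[float], cdf: Cdf, weighted: bool) -> float:
    """(Weighted) least-squares criterion; xs must be sorted ascending."""
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) if weighted else 1.0
        total += w * (cdf(x, theta) - i / (n + 1)) ** 2
    return total


def cvm_objective(theta: float, xs: list[float], cdf: Cdf) -> float:
    """Cramer-von Mises criterion; xs must be sorted ascending."""
    n = len(xs)
    return 1 / (12 * n) + sum(
        (cdf(x, theta) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def golden_min(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-8) -> float:
    """Minimise a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def estimate(xs: list[float], cdf: Cdf, method: str, lo: float = 1e-3, hi: float = 20.0) -> float:
    """Minimum-distance estimate of a single parameter on the bracket [lo, hi]."""
    xs = sorted(xs)
    objectives = {
        "LSE": lambda t: ls_objective(t, xs, cdf, weighted=False),
        "WLSE": lambda t: ls_objective(t, xs, cdf, weighted=True),
        "CVME": lambda t: cvm_objective(t, xs, cdf),
    }
    return golden_min(objectives[method], lo, hi)


def power_cdf(x: float, a: float) -> float:
    """Hypothetical stand-in cdf F(x; a) = x**a on (0, 1)."""
    return x ** a


rng = random.Random(1)
true_a = 2.0
sample = [rng.random() ** (1 / true_a) for _ in range(500)]  # inverse-cdf sampling
for m in ("LSE", "WLSE", "CVME"):
    print(m, round(estimate(sample, power_cdf, m), 3))
```

Golden-section search needs no derivatives, but it assumes the criterion is unimodal on the chosen bracket. With a real UHLG cdf, the bracket should be checked, or a general-purpose optimiser used instead.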

Table 3. ML estimates and some statistics for the risk survey data.

| Model | θ̂ | α̂ | AIC | AICc | BIC | K-S | p-Value |
|---|---|---|---|---|---|---|---|
| UHLG | 0.132 | - | −177.02 | −177.01 | −174.78 | 0.1191 | 0.2515 |
| log Bilal | 3.464 | - | −149.388 | −149.332 | −147.098 | 0.2241 | 0.0013 |
| beta | 0.613 | 3.799 | −148.24 | −148.06 | −143.65 | 0.1805 | 0.0172 |
| Kumaraswamy | 7.350 | 2.300 | −150.01 | −149.84 | −144.59 | 0.9586 | 0.0000 |
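
Table 3 compares the fitted models by AIC, AICc, BIC and the Kolmogorov–Smirnov (K-S) statistic, with smaller values indicating a better fit. The sketch below uses the standard formulas:

- AIC = 2k − 2ℓ̂
- AICc = AIC + 2k(k + 1)/(n − k − 1)
- BIC = k log n − 2ℓ̂

Here ℓ̂ is the maximised log-likelihood and k the number of parameters. The sketch also computes the one-sample K-S distance. It is an assumption that these are exactly the forms used for Table 3. The log-likelihoods are not listed here, so the example inputs are made up.

```python
import math
from typing import Callable


def information_criteria(loglik: float, k: int, n: int) -> dict[str, float]:
    """AIC, AICc and BIC for a model with k parameters fitted to n observations."""
    if n <= k + 1:
        raise ValueError("AICc needs n > k + 1")
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def ks_statistic(xs: list[float], cdf: Callable[[float], float]) -> float:
    """One-sample K-S distance sup|F_n(x) - F(x)| for a fully specified fitted cdf."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at x_(i).
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Made-up inputs, for illustration only.
print(information_criteria(loglik=90.0, k=1, n=50))
print(ks_statistic([0.1, 0.4, 0.7], lambda x: x))  # uniform(0, 1) cdf
```

The K-S p-value also requires the null distribution of the statistic. In Python, `scipy.stats.kstest` returns both the statistic and the p-value.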

Table 4. ML estimates for the regression model parameters with some other statistics fitting risk survey data (comparison between UHLG, Beta and Kumaraswamy regression models).

| Coefficient | UHLG Est. | UHLG SE | UHLG p-Value | Beta Est. | Beta SE | Beta p-Value | Kumaraswamy Est. | Kumaraswamy SE | Kumaraswamy p-Value |
|---|---|---|---|---|---|---|---|---|---|
| δ₀ | 4.128 | 1.438 | <0.0000 | 1.888 | 0.944 | <0.0000 | −1.866 | 2.55 | <0.0000 |
| δ₁ | −0.012 | 0.149 | <0.0000 | −0.012 | 0.120 | <0.0000 | 0.429 | 0.447 | <0.0000 |
| δ₂ | 0.018 | 0.635 | <0.0000 | 0.178 | 0.472 | <0.0000 | 0.026 | 1.174 | <0.0000 |
| δ₃ | −0.918 | 0.456 | <0.0000 | −0.511 | 0.334 | <0.0000 | −0.090 | 0.788 | <0.0000 |
| δ₄ | 2.145 | 0.953 | <0.0000 | 1.236 | 0.513 | <0.0000 | −1.028 | 1.711 | <0.0000 |
| δ₅ | −0.092 | 0.389 | <0.0000 | −0.012 | 0.204 | <0.0000 | 0.088 | 0.722 | <0.0000 |
| δ₆ | 0.005 | 0.189 | <0.0000 | −0.004 | 0.085 | <0.0000 | −0.056 | 0.356 | <0.0000 |
| α | - | - | - | 6.33 | 0.436 | <0.0000 | 0.241 | 0.204 | <0.0000 |
| AIC | −192.34 | | | −159.4 | | | −190.1 | | |
| BIC | −176.31 | | | −141.1 | | | −171.8 | | |
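
The standard errors and p-values in Tables 4 and 5 are asymptotic quantities from the ML fit. This section does not say which test produced the p-values. A common choice is the two-sided Wald test, which compares z = estimate/SE with the standard normal distribution. The sketch below implements it with only the standard library. The helper name `wald_test` is hypothetical, and the example plugs in the UHLG intercept δ₀ from Table 4.

```python
import math


def wald_test(estimate: float, se: float) -> tuple[float, float]:
    """Two-sided Wald test of H0: coefficient = 0.

    Assumes approximate normality of the ML estimator: z = estimate / se is
    referred to N(0, 1), giving p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# UHLG intercept delta_0 from Table 4: estimate 4.128, SE 1.438.
z, p = wald_test(4.128, 1.438)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Using `math.erfc` avoids the cancellation in 1 − Φ(|z|) when |z| is large, so very small p-values remain accurate.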

Table 5. ML estimates for the regression model parameters with some other statistics fitting risk survey data (comparison between UHLG and log Bilal regression models).

| Coefficient | UHLG Est. | UHLG SE | UHLG p-Value | log Bilal Est. | log Bilal SE | log Bilal p-Value |
|---|---|---|---|---|---|---|
| δ₀ | 4.128 | 1.438 | <0.0000 | −1.704 | 0.963 | <0.0000 |
| δ₁ | −0.012 | 0.149 | <0.0000 | 0.005 | 0.011 | <0.0000 |
| δ₂ | 0.018 | 0.635 | <0.0000 | −0.061 | 0.189 | <0.0000 |
| δ₃ | −0.918 | 0.456 | <0.0000 | 0.298 | 0.100 | <0.0000 |
| δ₄ | 2.145 | 0.953 | <0.0000 | −0.727 | 0.400 | <0.0000 |
| δ₅ | −0.092 | 0.389 | <0.0000 | 0.020 | 0.070 | <0.0000 |
| δ₆ | 0.005 | 0.189 | <0.0000 | −0.001 | 0.017 | <0.0000 |
| AIC | −192.34 | | | −151.46 | | |
| BIC | −176.31 | | | −135.42 | | |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ramadan, A.T.; Tolba, A.H.; El-Desouky, B.S. A Unit Half-Logistic Geometric Distribution and Its Application in Insurance. Axioms 2022, 11, 676. https://doi.org/10.3390/axioms11120676

