The Log Exponential-Power Distribution: Properties, Estimations and Quantile Regression Model

Abstract: Recently, bounded distributions have attracted attention. These distributions are frequently used to model rate and proportion data. In this study, a new alternative model is proposed for bounded data sets. The parameters of the proposed distribution are estimated by the maximum likelihood method. In addition, a new regression model is defined under the proposed distribution and its residual analysis is examined. Empirical studies on real data sets show that the proposed regression model gives better results than the unit-Weibull and Kumaraswamy regression models.


Introduction
The exponential-power (EP) distribution was proposed by [1]. If a random variable T has the EP distribution, then its cumulative distribution function (cdf) and probability density function (pdf) are, respectively,
$$K(t; \alpha, \beta) = 1 - e^{1 - \exp(\alpha t^{\beta})} \tag{1}$$
and
$$k(t; \alpha, \beta) = \alpha \beta t^{\beta - 1} e^{\alpha t^{\beta}} e^{1 - \exp(\alpha t^{\beta})}, \tag{2}$$
where t ≥ 0, α > 0 is the scale parameter and β > 0 is the shape parameter. This distribution is well known in the literature for its bathtub-shaped hazard rate function (hrf), which arises for β ∈ (0, 1). Characterizations of the EP distribution were studied by [2]. Several generalizations of the EP distribution have been proposed. For instance, [3] introduced the transmuted-EP (TEP) distribution with applications to the holes operation on jobs made of iron sheet and to breaking stress data sets. [4] introduced a generalization of the complementary EP distribution and compared it with other transmuted generalized distributions. [5] introduced the four-parameter generalized EP distribution, which has bimodal, right-skewed and nearly symmetric pdf shapes.
In recent years, many lifetime distributions have been transformed to bounded distributions defined on the (0, 1) interval. Two transformations are commonly preferred for this purpose: X = exp(−T) and X = T/(T + 1). When the random variable T is defined on (0, ∞), the transformed random variable satisfies X ∈ (0, 1).
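The present work aims to develop a new statistical model for bounded data on the (0, 1) interval. To this end, we apply the transformation X = exp(−T) to the EP distribution. The cdf and pdf of the resulting log-exponential-power (LEP) distribution are
$$F(x; \alpha, \beta) = e^{1 - \exp\{\alpha (-\log x)^{\beta}\}} \tag{3}$$
and
$$f(x; \alpha, \beta) = \frac{\alpha \beta}{x} (-\log x)^{\beta - 1} e^{\alpha (-\log x)^{\beta}} e^{1 - \exp\{\alpha (-\log x)^{\beta}\}}, \tag{4}$$
respectively, where x ∈ (0, 1) and α > 0 and β > 0 are the model parameters. This new unit model is called the LEP distribution and, hereafter, a random variable X following it is denoted by X ∼ LEP(α, β).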
The associated hrf is given by
$$h(x; \alpha, \beta) = \frac{\alpha \beta x^{-1} (-\log x)^{\beta - 1} e^{\alpha (-\log x)^{\beta}} e^{1 - \exp\{\alpha (-\log x)^{\beta}\}}}{1 - e^{1 - \exp\{\alpha (-\log x)^{\beta}\}}}. \tag{5}$$
If the parameter β is equal to one, we have the simple cdf and pdf $F(x; \alpha, 1) = e^{1 - x^{-\alpha}}$ and $f(x; \alpha, 1) = \alpha x^{-\alpha - 1} e^{1 - x^{-\alpha}}$ for x ∈ (0, 1), respectively. The possible shapes of the pdf and hrf are sketched in Figure 1. According to Figure 1, the pdf can be U-shaped, increasing, decreasing or unimodal, and the hrf can be bathtub-shaped, increasing or N-shaped.
The rest of the study is organized as follows. Statistical properties of the LEP distribution are given in Section 2. The parameter estimation method is presented in Section 3. Section 4 is devoted to the LEP quantile regression model. Section 5 contains two simulation studies, one for the LEP distribution and one for the LEP quantile regression model. Empirical results are given in Section 6. The study is concluded in Section 7.
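As an illustration, the cdf, pdf and hrf can be coded directly. The sketch below assumes the LEP cdf F(x) = exp{1 − exp[α(−log x)^β]} that results from applying X = exp(−T) to (1); the function names and the overflow guard at 700 are choices made here, not part of the original study.

```python
import math

def lep_cdf(x, alpha, beta):
    """Assumed LEP cdf: F(x) = exp{1 - exp[alpha * (-log x)^beta]}, 0 < x < 1."""
    inner = alpha * (-math.log(x)) ** beta
    if inner > 700.0:              # exp(inner) would overflow; F(x) is numerically 0
        return 0.0
    return math.exp(1.0 - math.exp(inner))

def lep_pdf(x, alpha, beta):
    """Derivative of lep_cdf with respect to x."""
    v = -math.log(x)
    inner = alpha * v ** beta
    if inner > 700.0:
        return 0.0
    return (alpha * beta / x) * v ** (beta - 1.0) * math.exp(inner) * math.exp(1.0 - math.exp(inner))

def lep_hrf(x, alpha, beta):
    """Hazard rate h(x) = f(x) / [1 - F(x)]."""
    return lep_pdf(x, alpha, beta) / (1.0 - lep_cdf(x, alpha, beta))

# Special case beta = 1 quoted in the text: F = exp(1 - x^-alpha), f = alpha x^(-alpha-1) exp(1 - x^-alpha)
cdf_special = lep_cdf(0.5, 2.0, 1.0)
pdf_special = lep_pdf(0.5, 2.0, 1.0)
```

With β = 1 the two functions reduce to the simple forms stated above, which gives a quick consistency check of the implementation.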

Some Distributional Properties of the LEP Distribution
The moments, order statistics, entropy and quantile function of the LEP distribution are studied.

Moments
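The n-th non-central moment of the LEP distribution is denoted by E(X^n) and is defined as
$$E(X^{n}) = \int_{0}^{1} x^{n} f(x; \alpha, \beta)\, dx = \alpha \beta \int_{0}^{1} x^{n-1} (-\log x)^{\beta - 1} e^{\alpha (-\log x)^{\beta}} e^{1 - \exp\{\alpha (-\log x)^{\beta}\}}\, dx.$$
Based on the first four non-central moments, we calculate the skewness and kurtosis of the LEP distribution. These measures are plotted in Figure 2 against the parameters α and β.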
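The moments, skewness and kurtosis can be approximated numerically. The sketch below is only one possible approach: it assumes the LEP cdf F(x) = exp{1 − exp[α(−log x)^β]} and uses the equivalent tail formula E(X^n) = n ∫₀¹ x^{n−1}[1 − F(x)] dx, chosen here because its integrand stays bounded on (0, 1); the midpoint rule and the grid size are likewise assumptions.

```python
import math

def lep_cdf(x, alpha, beta):
    """Assumed LEP cdf: F(x) = exp{1 - exp[alpha * (-log x)^beta]}."""
    inner = alpha * (-math.log(x)) ** beta
    if inner > 700.0:              # avoid overflow; F(x) is numerically 0 here
        return 0.0
    return math.exp(1.0 - math.exp(inner))

def lep_moment(n, alpha, beta, m=20000):
    """E(X^n) = n * int_0^1 x^(n-1) [1 - F(x)] dx, midpoint rule with m cells."""
    h = 1.0 / m
    total = 0.0
    for j in range(m):
        x = (j + 0.5) * h          # midpoints avoid the endpoints 0 and 1
        total += x ** (n - 1) * (1.0 - lep_cdf(x, alpha, beta))
    return n * h * total

def skewness_kurtosis(alpha, beta):
    """Skewness and kurtosis from the first four non-central moments."""
    m1, m2, m3, m4 = (lep_moment(n, alpha, beta) for n in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2

mean = lep_moment(1, 1.0, 1.0)
second = lep_moment(2, 1.0, 1.0)
skew, kurt = skewness_kurtosis(1.0, 1.0)
```

Evaluating `skewness_kurtosis` over a grid of (α, β) values would give surfaces of the kind described for Figure 2.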

Order Statistics
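The cdf of the i-th order statistic X_{i:n} of a sample of size n from the LEP distribution is given by
$$F_{i:n}(x) = \sum_{k=i}^{n} \binom{n}{k} [F(x)]^{k} [1 - F(x)]^{n-k}, \qquad F(x) = e^{1 - \exp\{\alpha (-\log x)^{\beta}\}}.$$
Applying the transformation u = −log(x), we obtain
$$F_{i:n}(e^{-u}) = \sum_{k=i}^{n} \binom{n}{k} e^{k (1 - e^{\alpha u^{\beta}})} \left[1 - e^{1 - e^{\alpha u^{\beta}}}\right]^{n-k}.$$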

Residual Entropy and Cumulative Residual Entropy
Entropy is used to measure uncertainty in fields such as engineering and the natural sciences. Two measures are considered here: the residual entropy and the cumulative residual entropy. For LEP(α, β), expressions for both measures follow after some simple algebra, using the transformation u = −log(x) and a Taylor expansion.

Procedure of the Maximum Likelihood for the Parameter Estimation
The maximum likelihood estimators (MLEs) of the LEP distribution are derived for the case in which both α and β are unknown. Let x_1, x_2, . . . , x_n be a random sample of size n from the LEP distribution and let Θ = (α, β)^T be the parameter vector. Then, the log-likelihood function is given by
$$\ell(\Theta) = n \log \alpha + n \log \beta + n - \sum_{i=1}^{n} \log x_i + (\beta - 1) \sum_{i=1}^{n} \log(-\log x_i) + \alpha \sum_{i=1}^{n} (-\log x_i)^{\beta} - \sum_{i=1}^{n} e^{\alpha (-\log x_i)^{\beta}}. \tag{13}$$
Differentiating (13), the normal equations are obtained as
$$\frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n} (-\log x_i)^{\beta} \left[1 - e^{\alpha (-\log x_i)^{\beta}}\right] = 0$$
and
$$\frac{\partial \ell}{\partial \beta} = \frac{n}{\beta} + \sum_{i=1}^{n} \log(-\log x_i) + \alpha \sum_{i=1}^{n} (-\log x_i)^{\beta} \log(-\log x_i) \left[1 - e^{\alpha (-\log x_i)^{\beta}}\right] = 0.$$
This system of equations has no explicit solution, so the estimates $\hat{\alpha}$ and $\hat{\beta}$ must be obtained numerically. The Newton-Raphson and quasi-Newton algorithms can be used for this purpose. Alternatively, Equation (13) can be optimized directly with dedicated functions in well-known software such as R (the constrOptim, optim and maxLik functions), S-Plus and Matlab, which rely on numerical optimization methods. When the log-likelihood is optimized directly, one should choose the initial values carefully and remove the constraints on the parameters [20].
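A minimal sketch of the log-likelihood and its score vector is given below. It assumes the LEP pdf f(x) = αβ x⁻¹(−log x)^{β−1} exp{α(−log x)^β} exp{1 − exp[α(−log x)^β]}; the function names and the toy data are hypothetical. The analytic score is compared with central finite differences as a sanity check on the derivatives.

```python
import math

def lep_loglik(xs, alpha, beta):
    """Log-likelihood of an LEP sample under the assumed pdf."""
    total = len(xs) * (math.log(alpha) + math.log(beta) + 1.0)
    for x in xs:
        v = -math.log(x)                   # v = -log(x) > 0
        w = v ** beta
        total += v + (beta - 1.0) * math.log(v) + alpha * w - math.exp(alpha * w)
    return total

def lep_score(xs, alpha, beta):
    """Partial derivatives of lep_loglik with respect to alpha and beta."""
    n = len(xs)
    d_alpha, d_beta = n / alpha, n / beta
    for x in xs:
        v = -math.log(x)
        w = v ** beta
        g = 1.0 - math.exp(alpha * w)
        d_alpha += w * g
        d_beta += math.log(v) + alpha * w * math.log(v) * g
    return d_alpha, d_beta

# Toy data in (0, 1), used only to check the derivatives.
data = [0.12, 0.35, 0.50, 0.64, 0.71, 0.83, 0.90]
a0, b0, h = 1.2, 0.8, 1e-6
score = lep_score(data, a0, b0)
fd_alpha = (lep_loglik(data, a0 + h, b0) - lep_loglik(data, a0 - h, b0)) / (2 * h)
fd_beta = (lep_loglik(data, a0, b0 + h) - lep_loglik(data, a0, b0 - h)) / (2 * h)
```

Setting both components of `lep_score` to zero corresponds to the normal equations; a general-purpose optimizer can instead be applied to `lep_loglik` directly.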
The observed information matrix plays an important role in obtaining the standard errors and asymptotic confidence intervals of the MLEs. Under the usual regularity conditions, the MLE $\hat{\Theta}$ is approximately bivariate normal with mean (α, β)^T and covariance matrix I^{-1}, where I is the observed information matrix
$$I = \begin{pmatrix} I_{\alpha\alpha} & I_{\alpha\beta} \\ I_{\beta\alpha} & I_{\beta\beta} \end{pmatrix}.$$
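Writing $v_i = -\log x_i$ and $w_i = v_i^{\beta}$, the components of I are the negative second derivatives of the log-likelihood:
$$I_{\alpha\alpha} = -\frac{\partial^2 \ell}{\partial \alpha^2} = \frac{n}{\alpha^2} + \sum_{i=1}^{n} w_i^{2} e^{\alpha w_i},$$
$$I_{\alpha\beta} = I_{\beta\alpha} = -\frac{\partial^2 \ell}{\partial \alpha\, \partial \beta} = -\sum_{i=1}^{n} w_i \log v_i \left[1 - e^{\alpha w_i} (1 + \alpha w_i)\right],$$
$$I_{\beta\beta} = -\frac{\partial^2 \ell}{\partial \beta^2} = \frac{n}{\beta^2} - \alpha \sum_{i=1}^{n} w_i (\log v_i)^{2} \left[1 - e^{\alpha w_i} (1 + \alpha w_i)\right],$$
all evaluated at $\hat{\Theta}$. The standard errors of the estimated parameters are obtained from the inverse of I, say I^{-1}. The asymptotic 100(1 − ζ)% confidence intervals of the parameters are then
$$\hat{\alpha} \pm z_{\zeta/2}\, s_{\hat{\alpha}} \quad \text{and} \quad \hat{\beta} \pm z_{\zeta/2}\, s_{\hat{\beta}},$$
where $s_{\hat{\alpha}} = \sqrt{I^{-1}_{11}}$, $s_{\hat{\beta}} = \sqrt{I^{-1}_{22}}$ and $z_{\zeta/2}$ is the upper ζ/2 quantile of the N(0, 1) distribution.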
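The estimation procedure can be sketched end to end as follows. This is an illustrative Newton-Raphson implementation with step halving, not the routine used in the study: it assumes the LEP pdf f(x) = αβ x⁻¹(−log x)^{β−1} exp{α(−log x)^β} exp{1 − exp[α(−log x)^β]}, generates data by inverting the corresponding cdf, and starts the iteration at the true values. The seed, sample size and function names are hypothetical.

```python
import math
import random

def lep_sample(n, alpha, beta, rng):
    """Inverse-transform draws: x = exp{-[log(1 - log u) / alpha]^(1/beta)}."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()                         # u in (0, 1]
        out.append(math.exp(-((math.log(1.0 - math.log(u)) / alpha) ** (1.0 / beta))))
    return out

def loglik(xs, alpha, beta):
    total = len(xs) * (math.log(alpha) + math.log(beta) + 1.0)
    for x in xs:
        v = -math.log(x)
        t = alpha * v ** beta
        if t > 700.0:                                  # exp(t) would overflow
            return -math.inf
        total += v + (beta - 1.0) * math.log(v) + t - math.exp(t)
    return total

def derivatives(xs, alpha, beta):
    """Score (ga, gb) and Hessian entries (haa, hab, hbb) of the log-likelihood."""
    n = len(xs)
    ga, gb = n / alpha, n / beta
    haa, hab, hbb = -n / alpha ** 2, 0.0, -n / beta ** 2
    for x in xs:
        v = -math.log(x)
        lv, w = math.log(v), v ** beta
        e = math.exp(alpha * w)
        c = 1.0 - e * (1.0 + alpha * w)
        ga += w * (1.0 - e)
        gb += lv + alpha * w * lv * (1.0 - e)
        haa -= w * w * e
        hab += w * lv * c
        hbb += alpha * w * lv * lv * c
    return ga, gb, haa, hab, hbb

def newton_mle(xs, alpha, beta, tol=1e-8, max_iter=100):
    """Newton-Raphson with step halving to keep parameters positive."""
    for _ in range(max_iter):
        ga, gb, haa, hab, hbb = derivatives(xs, alpha, beta)
        det = haa * hbb - hab * hab
        da = -(hbb * ga - hab * gb) / det              # Newton direction -H^{-1} g
        db = -(haa * gb - hab * ga) / det
        old, step = loglik(xs, alpha, beta), 1.0
        while step > 1e-10:
            a_new, b_new = alpha + step * da, beta + step * db
            if a_new > 0 and b_new > 0 and loglik(xs, a_new, b_new) >= old - 1e-10:
                break
            step /= 2.0
        alpha, beta = a_new, b_new
        if abs(step * da) + abs(step * db) < tol:
            break
    return alpha, beta

def wald_ci(xs, alpha_hat, beta_hat, z=1.959964):
    """Standard errors from the inverse observed information and 95% Wald intervals."""
    _, _, haa, hab, hbb = derivatives(xs, alpha_hat, beta_hat)
    iaa, iab, ibb = -haa, -hab, -hbb
    det = iaa * ibb - iab * iab
    se_a, se_b = math.sqrt(ibb / det), math.sqrt(iaa / det)
    return ((alpha_hat - z * se_a, alpha_hat + z * se_a),
            (beta_hat - z * se_b, beta_hat + z * se_b), (se_a, se_b))

rng = random.Random(2021)
xs = lep_sample(1000, 1.0, 2.0, rng)
a_hat, b_hat = newton_mle(xs, 1.0, 2.0)
ci_alpha, ci_beta, ses = wald_ci(xs, a_hat, b_hat)
```

In practice the true values are unknown, so the starting point has to come from elsewhere; the step-halving loop is a simple safeguard and does not guarantee convergence from arbitrary starting values.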

The New Quantile Regression Model Based on the QLEP Distribution for the Unit Response
Regression models are used to explain the relation between dependent and independent variables. When the dependent variable lies in the (0, 1) interval, the beta regression model of [21] is usually used. The beta regression model is based on a re-parameterization in terms of the mean and variance functions. A similar idea has been used by several authors, such as [6,8,17,22,23]. These studies are built on conditional mean modeling.
Another possible idea is to model the dependent variable through its conditional quantile function, as in the quantile regression model pioneered by [24]. Ref. [25] introduced the Kumaraswamy quantile regression model for dependent variables on the (0, 1) interval. The idea presented in [25] has been applied to different probability distributions by [9,11,12,14,26–29].
Following the idea of [25], we propose the QLEP quantile regression model. Let Y_1, Y_2, . . . , Y_n be n random variables from the QLEP distribution, denoted by Y_i ∼ QLEP(β, µ_i, τ), where µ_i and β are unknown parameters and τ is known. Assume that y_1, y_2, . . . , y_n are the realizations of the random variables Y_1, Y_2, . . . , Y_n. Then, the QLEP regression model is given by
$$g(\mu_i) = \mathbf{x}_i^{T} \boldsymbol{\beta}, \tag{14}$$
where $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)^{T}$ is the unknown regression parameter vector, $\mathbf{x}_i = (1, x_{i1}, x_{i2}, \ldots, x_{ip})^{T}$ is the known i-th vector of covariates, and g(·) is the link function, which connects the covariates with the conditional quantile of the response variable. The logit link function is preferred because the response variable lies in the (0, 1) interval. The logit link function is defined as
$$g(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right).$$
If τ = 0.5, the median of the dependent variable is modeled given the independent variables, which is called the QLEP median regression model.
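The link between covariates and the conditional quantile can be illustrated with a short sketch. The helper names and the coefficient values below are hypothetical; only the logit link itself is taken from the text.

```python
import math

def logit(mu):
    """Logit link: g(mu) = log(mu / (1 - mu)) for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit: maps a real-valued linear predictor back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def conditional_quantile(coefs, covariates):
    """mu_i = g^{-1}(beta_0 + beta_1 x_i1 + ... + beta_p x_ip)."""
    eta = coefs[0] + sum(b * x for b, x in zip(coefs[1:], covariates))
    return inv_logit(eta)

# Hypothetical coefficients (intercept 0.5, slope 1.0) and a single covariate value 0.
mu_example = conditional_quantile([0.5, 1.0], [0.0])
round_trip = inv_logit(logit(0.3))
```

Whatever the value of the linear predictor, the inverse logit keeps the modeled τ-th quantile inside the unit interval.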

The MLEs of the Model Parameters
Here, $\mu_i = \exp(\mathbf{x}_i^{T} \boldsymbol{\beta}) / [1 + \exp(\mathbf{x}_i^{T} \boldsymbol{\beta})]$ is the inverse of the logit link function and τ is known. Let $\Phi = (\beta, \boldsymbol{\beta}^{T})^{T}$ be the unknown parameter vector with p + 2 elements. Substituting (14) into (8) yields the log-likelihood function (15) of the QLEP regression model. The score equations of (15) are nonlinear in the model parameters, so they must be solved numerically; the solution can be found by directly maximizing (15) in software such as R. The asymptotic distribution of $(\hat{\Phi} - \Phi)$ is multivariate normal $N_{p+2}(0, I^{-1}(\Phi))$, where I(Φ) is the expected information matrix. The (p + 2) × (p + 2) observed information matrix can be used in place of I(Φ).
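A sketch of the regression log-likelihood is given below. The expressions for the QLEP cdf and pdf are not reproduced in this text, so the code relies on an assumption: it re-parameterizes the LEP cdf F(x) = exp{1 − exp[α(−log x)^β]} by setting α = log(1 − log τ)/(−log µ)^β, which makes µ the τ-th quantile and gives G(y) = exp{1 − (1 − log τ)^{(log y / log µ)^β}}. All function names are hypothetical.

```python
import math

def qlep_cdf(y, beta, mu, tau):
    """Assumed QLEP cdf; by construction G(mu) = tau."""
    t = (math.log(y) / math.log(mu)) ** beta * math.log(1.0 - math.log(tau))
    if t > 700.0:                      # exp(t) would overflow; G(y) is numerically 0
        return 0.0
    return math.exp(1.0 - math.exp(t))

def qlep_logpdf(y, beta, mu, tau):
    """Log of the derivative of qlep_cdf with respect to y."""
    log_a = math.log(1.0 - math.log(tau))
    r = math.log(y) / math.log(mu)     # positive for y, mu in (0, 1)
    t = r ** beta * log_a
    if t > 700.0:
        return -math.inf
    return (math.log(beta) + math.log(log_a) - math.log(y) - math.log(-math.log(mu))
            + (beta - 1.0) * math.log(r) + t + 1.0 - math.exp(t))

def qlep_loglik(params, ys, xs, tau):
    """params = [beta, beta_0, ..., beta_p]; xs[i] holds the p covariates of unit i."""
    beta, coefs = params[0], params[1:]
    total = 0.0
    for y, x in zip(ys, xs):
        eta = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
        mu = 1.0 / (1.0 + math.exp(-eta))   # inverse logit link
        total += qlep_logpdf(y, beta, mu, tau)
    return total

quantile_check = qlep_cdf(0.3, 1.7, 0.3, 0.25)
single_obs = qlep_loglik([1.7, 0.0], [0.2], [[]], 0.25)
```

Passing the negative of `qlep_loglik` to a numerical minimizer is one way to carry out the direct maximization mentioned above.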

Model Validity for the Fitting
The model validity is examined via residual analysis. The randomized quantile residual (rqr) was proposed by [30]. The i-th rqr is calculated by
$$\hat{r}_i = \Phi^{-1}\left[G(y_i; \hat{\beta}, \hat{\mu}_i, \tau)\right],$$
where G(y; β, µ, τ) is the cdf of the QLEP distribution given by (7), Φ^{-1}(x) is the qf of N(0, 1), and $\hat{\mu}_i$ is defined by (14). If the model fits the data well, the rqrs are distributed as N(0, 1). The second residual is the Cox-Snell residual [31], which is calculated by
$$\hat{e}_i = -\log\left[1 - G(y_i; \hat{\beta}, \hat{\mu}_i, \tau)\right].$$
As with the rqrs, the model is valid when the $\hat{e}_i$ follow the standard exponential distribution.
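Both residuals are simple transformations of the fitted cdf, as the following sketch shows. The QLEP cdf used here is an assumption (the LEP cdf re-parameterized so that µ is the τ-th quantile), and the function names are hypothetical. For a continuous response no randomization step is needed, so the rqr reduces to the normal quantile of the fitted cdf value.

```python
import math
from statistics import NormalDist

def qlep_cdf(y, beta, mu, tau):
    """Assumed QLEP cdf: G(y) = exp{1 - (1 - log tau)^[(log y / log mu)^beta]}."""
    t = (math.log(y) / math.log(mu)) ** beta * math.log(1.0 - math.log(tau))
    if t > 700.0:                      # exp(t) would overflow; G(y) is numerically 0
        return 0.0
    return math.exp(1.0 - math.exp(t))

def quantile_residual(y, beta_hat, mu_hat, tau):
    """r_i = Phi^{-1}(G(y_i)); approximately N(0, 1) under a correct model."""
    return NormalDist().inv_cdf(qlep_cdf(y, beta_hat, mu_hat, tau))

def cox_snell_residual(y, beta_hat, mu_hat, tau):
    """e_i = -log(1 - G(y_i)); approximately Exp(1) under a correct model."""
    return -math.log(1.0 - qlep_cdf(y, beta_hat, mu_hat, tau))

# An observation equal to its fitted median (tau = 0.5) has G = 0.5.
r_at_median = quantile_residual(0.3, 2.0, 0.3, 0.5)
e_at_median = cox_snell_residual(0.3, 2.0, 0.3, 0.5)
```

The resulting residual vectors can then be inspected with QQ plots or a Kolmogorov-Smirnov test against the reference distribution.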

Simulation Studies
In this section, we present simulation studies that assess the efficiency of the MLEs of the LEP distribution under different scenarios. We also perform a second simulation study for the proposed regression model to examine the asymptotic behaviour of the MLEs of the QLEP quantile regression model. The R software is used to implement the simulation studies.

Simulation Results for the MLEs of the Proposed Distribution
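We focus on the behaviour of the MLEs of the LEP distribution. For this aim, four simulation studies are conducted and their results are summarized graphically. The settings are as follows. N = 1000 samples of size n = 20, 25, . . . , 1000 are generated from the LEP distribution under four scenarios. The random numbers are generated using (6). Setting ε = α or β, the bias and mean square error (MSE) are calculated by
$$\text{Bias}_{\varepsilon}(n) = \frac{1}{N} \sum_{i=1}^{N} (\hat{\varepsilon}_i - \varepsilon) \quad \text{and} \quad \text{MSE}_{\varepsilon}(n) = \frac{1}{N} \sum_{i=1}^{N} (\hat{\varepsilon}_i - \varepsilon)^{2},$$
respectively. In addition, to assess the 95% confidence intervals of the MLEs, we calculate the empirical coverage length (CL) and coverage probability (CP), defined by
$$\text{CL}_{\varepsilon}(n) = \frac{2 z_{0.025}}{N} \sum_{i=1}^{N} s_{\hat{\varepsilon}_i} \quad \text{and} \quad \text{CP}_{\varepsilon}(n) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(\hat{\varepsilon}_i - z_{0.025}\, s_{\hat{\varepsilon}_i} < \varepsilon < \hat{\varepsilon}_i + z_{0.025}\, s_{\hat{\varepsilon}_i}\right),$$
where $s_{\hat{\varepsilon}_i}$ is the standard error of the estimate in the i-th replication, $z_{0.025} \approx 1.96$ and $\mathbb{1}(\cdot)$ is the indicator function.
The simulation results for the first scenario are displayed in Figure 4. The estimated CLs and CPs are displayed in Figure 5. From these results, we conclude that the estimated biases and MSEs approach zero and the estimated means approach the true parameter values as the sample size increases. As expected, the CPs are near the desired value of 0.95 for all sample sizes, and the CLs decrease as the sample size increases. These results support the consistency of the MLEs for the LEP distribution.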
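The two building blocks of such a study, random number generation and the summary measures, can be sketched as follows. The quantile function is an assumption derived here by inverting the LEP cdf F(x) = exp{1 − exp[α(−log x)^β]}; the function names, the seed and the toy inputs are hypothetical. The MLE step is omitted, so `summarize` simply takes lists of estimates and standard errors.

```python
import math
import random

def lep_quantile(u, alpha, beta):
    """Assumed LEP quantile function: x = exp{-[log(1 - log u) / alpha]^(1/beta)}."""
    return math.exp(-((math.log(1.0 - math.log(u)) / alpha) ** (1.0 / beta)))

def lep_sample(n, alpha, beta, rng):
    """Inverse-transform sampling; 1 - rng.random() lies in (0, 1]."""
    return [lep_quantile(1.0 - rng.random(), alpha, beta) for _ in range(n)]

def summarize(estimates, ses, true_value, z=1.959964):
    """Empirical bias, MSE, coverage probability and coverage length over replications."""
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    cp = sum(1 for e, s in zip(estimates, ses)
             if e - z * s < true_value < e + z * s) / n_rep
    cl = 2.0 * z * sum(ses) / n_rep
    return bias, mse, cp, cl

rng = random.Random(123)
sample = lep_sample(5000, 2.0, 0.5, rng)
median = lep_quantile(0.5, 2.0, 0.5)
share_below_median = sum(x <= median for x in sample) / len(sample)

# Toy replication results, used only to show the summary measures.
bias, mse, cp, cl = summarize([0.9, 1.1], [0.1, 0.1], 1.0)
```

In a full study, `estimates` and `ses` would hold the N = 1000 MLEs and their standard errors for each sample size n.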

Scenario II
The true parameters are set to α = 2 and β = 2 for scenario-II, and the results are summarized in Figures 6 and 7. Since the results of scenario-II are similar to those of scenario-I, their interpretation is omitted. These results also verify the suitability of the MLE method for the LEP distribution.

Scenario III
The true parameter values for scenario-III are set to α = 0.5 and β = 2. The simulation results are displayed in Figures 8 and 9. The results are similar to those of scenario-I, so their interpretation is omitted. As in the previous simulation studies, these results verify the suitability of the MLE method for the proposed distribution.

Scenario IV
For the last scenario, the true parameter values are set to α = 2 and β = 0.5. Figures 10 and 11 display the results of the simulation study. The general conclusion of these simulation studies is that the MLE method works well for estimating the unknown parameters of the LEP distribution.

Comparison of SD and SE
Here, we compare the averages of the standard errors (SEs) with the standard deviations (SDs) of the estimated parameters to evaluate the unbiasedness of the SEs. For this aim, the SDs and SEs are calculated for the four scenarios and the results are summarized graphically in Figures 12-15. If the SEs are unbiased, the SDs and SEs should be close to each other. As seen from Figures 12-15, this is the case, so the simulation results support the unbiasedness of the SEs.

Simulation Studies for the Proposed Regression Model
We carry out a simulation study to evaluate the MLEs of the parameters of the QLEP regression model based on bias and MSE calculations. For a given sample size n, known τ, true parameter values and generated covariate values, the values of the unit response variable are generated from the QLEP distribution. In this simulation study, the replication number is N = 1000, the sample sizes are n = 25, 50, 100, 250, 500, the quantile levels are τ = 0.25, 0.50, 0.75, and β = 5, with the regression structure logit(µ_i) = β_0 + β_1 z_i1, i = 1, 2, . . . , n, where β_0 = 0.5, β_1 = 1 and z_i1 ∼ N(0, 1). Table 1 shows the simulation results for the QLEP regression model. As seen from these results, the empirical CLs decrease as the sample size increases and the empirical CPs are around 0.95. All biases are close to zero and all MSEs tend to zero as the sample size grows.
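The data-generating step of this design can be sketched with inverse-transform sampling. The quantile function below is an assumption: it follows from re-parameterizing the LEP cdf so that µ is the τ-th quantile, which gives y = µ^{[log(1 − log u)/log(1 − log τ)]^{1/β}} for u uniform on (0, 1). The function names and the seed are hypothetical; the parameter values are those of the design above.

```python
import math
import random

def qlep_quantile(u, beta, mu, tau):
    """Assumed QLEP quantile function; returns mu when u = tau."""
    ratio = math.log(1.0 - math.log(u)) / math.log(1.0 - math.log(tau))
    return mu ** (ratio ** (1.0 / beta))

def simulate_qlep_regression(n, beta, coefs, tau, rng):
    """Draw z_i ~ N(0, 1), set logit(mu_i) = b0 + b1 * z_i, then draw y_i given mu_i."""
    z, mu, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        mui = 1.0 / (1.0 + math.exp(-(coefs[0] + coefs[1] * zi)))
        ui = 1.0 - rng.random()                    # uniform draw in (0, 1]
        z.append(zi)
        mu.append(mui)
        y.append(qlep_quantile(ui, beta, mui, tau))
    return z, mu, y

rng = random.Random(7)
z, mu, y = simulate_qlep_regression(20000, 5.0, (0.5, 1.0), 0.25, rng)
# Under this construction, about a share tau of the responses should fall below mu_i.
share_below = sum(yi <= mi for yi, mi in zip(y, mu)) / len(y)
```

Since µ_i is the τ-th conditional quantile, the share of responses below µ_i offers a quick check that the generator matches the intended quantile level.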

Applications
This section illustrates the modeling ability of both the LEP distribution and the QLEP regression model on real data sets. Recently, this data set has been analyzed by [32,33].
Well-known unit distributions from the literature are compared with our distribution under the MLE method. Their densities are as follows.
• Beta distribution:
$$f_{Beta}(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1,$$
and f_Beta(x; α, β) = 0 for x ∉ (0, 1), where α > 0, β > 0, and B(α, β) is the standard beta function.
• Kumaraswamy (Kw) distribution [34]:
$$f_{Kw}(x; \alpha, \beta) = \alpha \beta x^{\alpha - 1} (1 - x^{\alpha})^{\beta - 1}, \quad 0 < x < 1,$$
and f_Kw(x; α, β) = 0 for x ∉ (0, 1), where α > 0 and β > 0.
• Johnson S_B distribution [35]:
$$f_{S_B}(x; \alpha, \beta) = \frac{\beta}{x (1 - x)}\, \phi\left[\alpha + \beta \log\left(\frac{x}{1 - x}\right)\right], \quad 0 < x < 1,$$
and f_{S_B}(x; α, β) = 0 for x ∉ (0, 1), where α ∈ R, β > 0, and φ(x) is the pdf of the standard normal distribution.
• Unit Birnbaum-Saunders (UBS) distribution [36]:
$$f_{UBS}(x; \alpha, \beta) = \frac{1}{2 x \alpha \beta \sqrt{2\pi}} \left[\left(-\frac{\beta}{\log x}\right)^{1/2} + \left(-\frac{\beta}{\log x}\right)^{3/2}\right] \exp\left[\frac{1}{2\alpha^{2}} \left(\frac{\log x}{\beta} + \frac{\beta}{\log x} + 2\right)\right], \quad 0 < x < 1,$$
and f_UBS(x; α, β) = 0 for x ∉ (0, 1), where α > 0 and β > 0.
Table 2 shows the results of the data analysis. The standard errors are given in (·) and the p-values of the KS tests in [·]. From this table, we can say that the LEP distribution is the best model based on the above comparison criteria. Figure 16 displays the estimated pdfs and cdfs of all fitted distributions. The probability-probability (PP) and quantile-quantile (QQ) plots of the fitted LEP distribution are given in Figure 17. All these plots indicate an acceptable fit of the LEP distribution to the data set. The profile log-likelihood (PLL) functions are plotted in Figure 18 for the parameters of the LEP distribution. According to Figure 18, the estimated parameters are maximizers of the function in (13).

Quantile Regression Application
We use the Better Life Index (BLI) data set measured in 2017, available at https://stats.oecd.org/index.aspx?DataSetCode=BLI (accessed on 7 June 2021), to demonstrate the practicality of the QLEP regression model. We compare the QLEP model with two quantile regression models: Kumaraswamy and unit-Weibull. The homicide rate, y_i, is the dependent variable. The covariates are the employment rate x_i1, personal earnings x_i2 and labour market insecurity x_i3. The goal of this application is to explain the variability of the homicide rate with these covariates.
The estimated parameters and the model selection criteria of the quantile regression models for τ = 0.5 are given in Table 3. The AIC and BIC values show that the QLEP regression model is better than the Kumaraswamy and unit-Weibull regression models for this data set, since the proposed model has the lowest values of these statistics. According to the estimated regression parameters, β_2 is statistically significant at the 5% level for all three regression models. This means that the homicide rate decreases when personal earnings increase. This is an expected outcome, because countries with high earnings provide better living conditions, which lowers the homicide rate. It is widely documented that income inequality leads to an increased homicide rate [37]. The accuracy of the fitted regression models is evaluated by means of the rqrs. As mentioned before, when the fitted model is adequate, the rqrs should be distributed as N(0, 1). Figure 19 shows the QQ plots of the rqrs for all fitted regression models. From these figures, one can conclude that the QLEP model is more appropriate than the others, because its plotted points are closer to the diagonal line. Also, Table 4 lists the KS test results used to check whether the randomized quantile residuals follow the standard normal distribution. All p-values are higher than 0.05, so normality of the randomized quantile residuals is not rejected for any of the regression models. However, the p-value of the QLEP regression model is higher than those of the other models, which is further evidence in favour of the QLEP regression model over the two other models.

Conclusions
In this study, a quantile regression model is defined under the proposed distribution. The parameter estimates of the proposed regression model are obtained by the maximum likelihood method, and the efficiency of the estimation method is examined via a simulation study. The homicide rates of the OECD countries are analyzed with the proposed approach as well as with the unit-Weibull and Kumaraswamy regression models. The residual analysis of the fitted regression models is performed with the randomized quantile residuals. Based on the residual analysis and the model selection criteria, the proposed approach is selected as the best model among those considered. Additionally, distance-based model selection criteria are used in the study. However, likelihood-based model selection methods are more effective than distance-based methods [38]. Therefore, the comparison of the LEP model with existing bounded distributions can also be performed under likelihood-based model selection approaches, which we leave as future work. Moreover, we plan to develop an extension of the proposed model for time-dependent response variables, such as the monthly incidence ratio of the coronavirus disease or the weekly ratio of car accidents involving death.