The Odds Exponential-Pareto IV Distribution: Regression Model and Application

This article introduces the odds exponential-Pareto IV distribution, which belongs to the odds family of distributions. We study the statistical properties of this new distribution. The odds exponential-Pareto IV distribution provides decreasing, increasing, and upside-down hazard functions. We employ the maximum likelihood method to estimate the distribution parameters and assess the performance of the estimators through simulation studies. A new log location-scale regression model based on the odds exponential-Pareto IV distribution is also introduced. Parameter estimates of the proposed model are obtained using both maximum likelihood and jackknife methods for right-censored data. Real data sets are analyzed under the odds exponential-Pareto IV distribution and the log odds exponential-Pareto IV regression model to show their flexibility and potential.

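This article uses the transformed-transformer (T-X) family of Alzaatreh et al. [27] to introduce the odds exponential-Pareto IV distribution. The cumulative distribution function (CDF) of the T-X family is defined by
$$G(x) = \int_{a}^{W(F(x))} r(t)\,dt,$$
where $r(t)$ is the probability density function (PDF) of a random variable $T \in [a, b]$, with $-\infty \le a < b \le \infty$, and $W(F(x))$ is a function of any CDF $F(x)$ that can take different forms; see Alzaatreh et al. [27].
In this study, we consider the odds function form, $W(F(x)) = F(x)/(1 - F(x))$. That is, the CDF will be
$$G(x) = \int_{0}^{F(x)/(1-F(x))} r(t)\,dt,$$
where we take the exponential density $r(t) = \lambda e^{-\lambda t}$, $t \ge 0$, and
$$F(x) = 1 - \left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{-\alpha}, \quad x > 0, \qquad (2)$$
is the CDF of the Pareto IV distribution with parameters $(a, \theta, \alpha)$. The resulting generated distribution provides more flexibility in accommodating different types of hazard function, and it is therefore more suitable for modeling and fitting different real-life data. We now define the odds exponential-Pareto IV (OEPIV) distribution with CDF given by
$$G(x; \lambda, a, \theta, \alpha) = 1 - \exp\left\{\lambda\left(1 - \left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right)\right\}, \quad x > 0. \qquad (3)$$
The PDF of the OEPIV distribution is
$$g(x; \lambda, a, \theta, \alpha) = \frac{\lambda\alpha}{a\theta}\exp(\lambda)\left(\frac{x}{\theta}\right)^{1/a - 1}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha - 1}\exp\left\{-\lambda\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right\}, \quad x > 0, \qquad (4)$$
where $\lambda > 0$ and $\alpha > 0$ are the shape parameters, $\theta > 0$ is the scale parameter, and $a > 0$ is the inequality parameter.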
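As an illustration, the CDF in Equation (3) and the PDF in Equation (4) can be evaluated directly. The following is a minimal Python sketch rather than the authors' code; the function names `oepiv_cdf` and `oepiv_pdf` are hypothetical, and the formulas assume the Pareto IV form $F(x) = 1 - [1 + (x/\theta)^{1/a}]^{-\alpha}$.

```python
import math


def oepiv_cdf(x, lam, a, theta, alpha):
    """OEPIV CDF: 1 - exp{lam * (1 - [1 + (x/theta)^(1/a)]^alpha)} for x > 0."""
    if x <= 0:
        return 0.0
    h = 1.0 + (x / theta) ** (1.0 / a)
    # -expm1(v) equals 1 - exp(v) and stays accurate when v is near zero.
    return -math.expm1(lam * (1.0 - h ** alpha))


def oepiv_pdf(x, lam, a, theta, alpha):
    """OEPIV PDF, the derivative of oepiv_cdf with respect to x."""
    if x <= 0:
        return 0.0
    u = x / theta
    h = 1.0 + u ** (1.0 / a)
    return (lam * alpha / (a * theta)) * u ** (1.0 / a - 1.0) \
        * h ** (alpha - 1.0) * math.exp(lam * (1.0 - h ** alpha))
```

With $a = 1$ and $\alpha = 1$ the sketch reduces to an exponential distribution with rate $\lambda/\theta$, which gives a quick sanity check on the formulas.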
Recently, there has been a great deal of interest in the literature in the relationship between survival time and covariates such as sex, weight, blood pressure, and many others. In a number of applications, different parametric regression models have been used to estimate the effect of covariates on the survival time, including the log-location-scale regression model. The log-location-scale regression model stands out because it is commonly used in clinical trials and in many other fields of application. It is also widely used in engineering models where failure is accelerated by voltage, temperature, or other stress factors [28]. Several studies in the literature applied the log-location-scale regression model based on different distributions, such as the log-modified Weibull [29], the log-Weibull extended [30], the log-exponentiated Weibull [31], the log-Burr XII [32], the log-beta Weibull [33], the log-beta log-logistic [34], the log-Fréchet [35], the log-exponentiated Fréchet [36], and the log-gamma-logistic [37]. Recent studies built the log-location-scale regression model from the logarithm of odd-type distributions, for instance, the odd log-logistic-Weibull [38], the odd log-logistic generalized half normal [39], and the odd Weibull [40].
This article is organized as follows. In Section 2, we define the survival and hazard functions of the OEPIV distribution with some graphical representations. We derive some of the OEPIV properties in Section 3. In Section 4, we explain the maximum likelihood estimation of the parameters of the odds exponential-Pareto IV distribution. Simulation studies that illustrate the performance of the estimators are provided in Section 5. In Section 6, we address the log odds exponential-Pareto IV (LOEPIV) distribution along with some of its statistical properties, introduce a log-location regression model based on the LOEPIV distribution, and discuss its parameter estimates via the maximum likelihood and jackknife methods. In Section 7, three applications are analyzed to demonstrate the performance of the new distribution and its regression model. Finally, we report our conclusions in Section 8.
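The survival function (SF) and hazard function (HF) of the OEPIV distribution follow from Equations (3) and (4):
$$S(x) = \exp\left\{\lambda\left(1 - \left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right)\right\}, \qquad h(x) = \frac{g(x)}{S(x)} = \frac{\lambda\alpha}{a\theta}\left(\frac{x}{\theta}\right)^{1/a - 1}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha - 1}. \qquad (6)$$
Graphical representations of the PDF in Equation (4) and the HF in Equation (6) are shown in Figures 1 and 2, respectively. From Figure 1, we note that the OEPIV distribution has different shapes at different parameter values, which indicates its great flexibility. Based on Figure 2, the OEPIV takes the following HF shapes: increasing, decreasing, and upside-down.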
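The three hazard shapes mentioned above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the closed forms are the ratio of the OEPIV PDF to its survival function as derived from Equations (3) and (4), and the parameter values in the usage note are arbitrary choices rather than the ones plotted in Figure 2.

```python
import math


def oepiv_sf(x, lam, a, theta, alpha):
    """Survival function S(x) = exp{lam * (1 - h^alpha)}, h = 1 + (x/theta)^(1/a)."""
    h = 1.0 + (x / theta) ** (1.0 / a)
    return math.exp(lam * (1.0 - h ** alpha))


def oepiv_hazard(x, lam, a, theta, alpha):
    """Hazard g(x)/S(x); the exponential factors cancel, leaving a closed form."""
    u = x / theta
    h = 1.0 + u ** (1.0 / a)
    return (lam * alpha / (a * theta)) * u ** (1.0 / a - 1.0) * h ** (alpha - 1.0)
```

For example, with $\theta = \lambda = 1$, the choice $(a, \alpha) = (1, 2)$ gives an increasing hazard, $(2, 1)$ a decreasing one, and $(0.5, 0.2)$ a hazard that first rises and then falls.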

Statistical Properties
We discuss in this section some statistical properties of the OEPIV distribution.

The Quantile and Median
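The quantile of the OEPIV distribution is obtained by solving $G(x_p) = p$ for $x_p$, which gives
$$x_p = \theta\left\{\left[1 - \frac{\log(1 - p)}{\lambda}\right]^{1/\alpha} - 1\right\}^{a}, \quad 0 < p < 1. \qquad (7)$$
Then, the median of the OEPIV distribution is obtained by setting $p = 0.5$ in Equation (7),
$$x_{0.5} = \theta\left\{\left[1 + \frac{\log 2}{\lambda}\right]^{1/\alpha} - 1\right\}^{a}. \qquad (8)$$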
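The quantile function has a closed form, so it is straightforward to code. The sketch below is a hedged illustration with hypothetical function names; it inverts the OEPIV CDF $1 - \exp\{\lambda(1 - [1 + (x/\theta)^{1/a}]^{\alpha})\}$ and includes the CDF only so that the round trip can be checked.

```python
import math


def oepiv_cdf(x, lam, a, theta, alpha):
    """OEPIV CDF for x > 0."""
    h = 1.0 + (x / theta) ** (1.0 / a)
    return -math.expm1(lam * (1.0 - h ** alpha))


def oepiv_quantile(p, lam, a, theta, alpha):
    """Solve CDF(x) = p for x, with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    base = 1.0 - math.log1p(-p) / lam      # log1p(-p) is log(1 - p)
    return theta * (base ** (1.0 / alpha) - 1.0) ** a


def oepiv_median(lam, a, theta, alpha):
    """Median: the quantile at p = 0.5."""
    return oepiv_quantile(0.5, lam, a, theta, alpha)
```

Applying `oepiv_cdf` to the output of `oepiv_quantile` should return the original probability up to rounding error.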

The Mode
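The mode of the OEPIV distribution can be obtained by differentiating the log PDF in Equation (4) with respect to $x$ and equating the derivative to zero:
$$\frac{1/a - 1}{x} + \left[\frac{\alpha - 1}{h(x)} - \lambda\alpha\, h(x)^{\alpha - 1}\right]\frac{1}{a\theta}\left(\frac{x}{\theta}\right)^{1/a - 1} = 0, \qquad (9)$$
where $h(x) = 1 + (x/\theta)^{1/a}$. Equation (9) has no closed-form solution, so the mode is obtained by solving it numerically.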
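One simple way to locate the mode numerically is to maximize the log PDF directly instead of solving the score-type equation. The sketch below uses a golden-section search, which is a swapped-in generic technique and not necessarily the solver used by the authors; it assumes the density is unimodal on the bracketing interval `[lo, hi]`, and all names are hypothetical.

```python
import math


def oepiv_logpdf(x, lam, a, theta, alpha):
    """Logarithm of the OEPIV PDF for x > 0."""
    u = x / theta
    h = 1.0 + u ** (1.0 / a)
    return (math.log(lam * alpha / (a * theta)) + lam
            + (1.0 / a - 1.0) * math.log(u)
            + (alpha - 1.0) * math.log(h) - lam * h ** alpha)


def oepiv_mode(lam, a, theta, alpha, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of the log PDF on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc = oepiv_logpdf(c, lam, a, theta, alpha)
    fd = oepiv_logpdf(d, lam, a, theta, alpha)
    while hi - lo > tol:
        if fc > fd:                      # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = oepiv_logpdf(c, lam, a, theta, alpha)
        else:                            # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = oepiv_logpdf(d, lam, a, theta, alpha)
    return 0.5 * (lo + hi)
```

For $\lambda = \theta = \alpha = 1$ and $a = 0.5$ the PDF simplifies to $2x\,e^{-x^2}$, whose mode is $1/\sqrt{2}$, which provides an analytic check of the search.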

The r-th Order Moment and Moment Generating Function
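The r-th order raw moment is defined as
$$\mu'_r = E(X^r) = \int_0^\infty x^r g(x)\,dx.$$
Let $u = \lambda[1 + (x/\theta)^{1/a}]^{\alpha}$. Thus, $x = \theta[(u/\lambda)^{1/\alpha} - 1]^{a}$ and $g(x)\,dx = \exp(\lambda)\,e^{-u}\,du$ with $u \in (\lambda, \infty)$. Putting these formulas in the integral, we have
$$\mu'_r = \theta^r \exp(\lambda)\int_\lambda^\infty \left[\left(\frac{u}{\lambda}\right)^{1/\alpha} - 1\right]^{ar} e^{-u}\,du.$$
Using the binomial expansion of $[(u/\lambda)^{1/\alpha} - 1]^{ar}$, we obtain
$$\left[\left(\frac{u}{\lambda}\right)^{1/\alpha} - 1\right]^{ar} = \sum_{j=0}^{\infty} (-1)^j \binom{ar}{j}\left(\frac{u}{\lambda}\right)^{(ar - j)/\alpha}.$$
Using the definition of the upper incomplete gamma function, $\Gamma(s, \lambda) = \int_\lambda^\infty u^{s-1}e^{-u}\,du$, the r-th moment can be written as
$$\mu'_r = \theta^r \exp(\lambda)\sum_{j=0}^{\infty} (-1)^j \binom{ar}{j}\lambda^{-(ar - j)/\alpha}\,\Gamma\!\left(\frac{ar - j}{\alpha} + 1, \lambda\right). \qquad (10)$$
The moment generating function (mgf) can be obtained from the r-th moments of the OEPIV distribution as
$$M_X(t) = \sum_{r=0}^{\infty}\frac{t^r}{r!}\,\mu'_r. \qquad (11)$$
Substituting Equation (10) into Equation (11), we find
$$M_X(t) = \exp(\lambda)\sum_{r=0}^{\infty}\sum_{j=0}^{\infty}\frac{(-1)^j (t\theta)^r}{r!}\binom{ar}{j}\lambda^{-(ar - j)/\alpha}\,\Gamma\!\left(\frac{ar - j}{\alpha} + 1, \lambda\right).$$
The mean of the OEPIV distribution is obtained by setting $r = 1$ in Equation (10). The mean, variance, skewness, and kurtosis of the OEPIV distribution for different values of $\lambda$, $a$, $\theta$, and $\alpha$ are reported in Table 1 to illustrate the effect of the parameters on these measures.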
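The moments can also be cross-checked numerically without the series expansion. The sketch below is an assumption-laden alternative to the analytic route: it uses the identity $E(X^r) = \int_0^1 Q(p)^r\,dp$, where $Q$ is the OEPIV quantile function, and a plain midpoint rule; the function names and the grid size are illustrative choices.

```python
import math


def oepiv_quantile(p, lam, a, theta, alpha):
    """OEPIV quantile function for 0 < p < 1."""
    base = 1.0 - math.log1p(-p) / lam
    return theta * (base ** (1.0 / alpha) - 1.0) ** a


def oepiv_raw_moment(r, lam, a, theta, alpha, n_grid=100_000):
    """Approximate E[X^r] by the midpoint rule applied to Q(p)^r on (0, 1)."""
    total = 0.0
    for i in range(n_grid):
        p = (i + 0.5) / n_grid           # midpoints avoid p = 0 and p = 1
        total += oepiv_quantile(p, lam, a, theta, alpha) ** r
    return total / n_grid


def oepiv_mean_and_variance(lam, a, theta, alpha):
    """Mean and variance from the first two raw moments."""
    m1 = oepiv_raw_moment(1, lam, a, theta, alpha)
    m2 = oepiv_raw_moment(2, lam, a, theta, alpha)
    return m1, m2 - m1 ** 2
```

When $a = \alpha = 1$ the distribution is exponential with mean $\theta/\lambda$ and variance $(\theta/\lambda)^2$, so the numerical values can be compared with exact ones.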

Order Statistics
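Suppose $X_1, X_2, X_3, \ldots, X_n$ is a random sample from the PDF in Equation (4). Let $X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}$ denote the corresponding order statistics. The PDF and CDF of the k-th order statistic, say $Y = X_{(k)}$, are given by
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}\, f(y)\,[F(y)]^{k-1}\,[1 - F(y)]^{n-k}, \qquad (12)$$
$$F_Y(y) = \sum_{j=k}^{n}\binom{n}{j}[F(y)]^{j}[1 - F(y)]^{n-j},$$
where $f(y)$ and $F(y)$ are the PDF and CDF of the OEPIV distribution given by Equations (4) and (3), respectively. The binomial expansion of $[1 - F(y)]^{n-k}$ is
$$[1 - F(y)]^{n-k} = \sum_{j=0}^{n-k}(-1)^j\binom{n-k}{j}[F(y)]^{j}. \qquad (13)$$
Substituting Equation (13) into (12), we have
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}\sum_{j=0}^{n-k}(-1)^j\binom{n-k}{j}\, f(y)\,[F(y)]^{k+j-1}. \qquad (14)$$
After substituting Equations (3) and (4) into (14), the remaining power of the CDF, $\left[1 - \exp\{-\lambda([1 + (y/\theta)^{1/a}]^{\alpha} - 1)\}\right]^{k+j-1}$, can be expanded binomially in the same way.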

Rényi Entropy
The Rényi entropy of a random variable $X$ is a measure of the variation of the uncertainty. It is given by
$$I_R(X) = \frac{1}{1 - R}\log\int_0^\infty [g(x)]^{R}\,dx, \quad R > 0,\ R \neq 1.$$
The Rényi entropy of the OEPIV distribution is obtained by inserting the PDF in Equation (4) into this definition, changing the variable of integration, and applying a binomial expansion to the resulting integrand.

Estimation of the OEPIV Parameters
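We assume that $x_1, x_2, \ldots, x_n$ is a random sample from the OEPIV distribution. Then, the log-likelihood $\ell$ for $\varphi = (\lambda, a, \theta, \alpha)$ is
$$\ell(\varphi) = n\log\left(\frac{\lambda\alpha}{a\theta}\right) + n\lambda + \left(\frac{1}{a} - 1\right)\sum_{i=1}^{n}\log\left(\frac{x_i}{\theta}\right) + (\alpha - 1)\sum_{i=1}^{n}\log h_i - \lambda\sum_{i=1}^{n}h_i^{\alpha}, \qquad (17)$$
where $h_i = 1 + (x_i/\theta)^{1/a}$. The likelihood equations, Equations (18)-(21), are obtained by setting the partial derivatives of $\ell$ with respect to $\lambda$, $a$, $\theta$, and $\alpha$ equal to zero. We can obtain the maximum likelihood (ML) estimates of the parameters by directly maximizing Equation (17) using the nlm or optim functions in R, or by solving Equations (18)-(21). Under standard regularity conditions, approximate confidence intervals for the parameters can be obtained from the asymptotic multivariate normal $N_4(0, J(\hat{\varphi})^{-1})$ distribution of $\hat{\varphi} - \varphi$, by numerically evaluating the elements of the $4 \times 4$ observed information matrix at $\hat{\varphi}$, $J(\varphi) = -\{\partial^2\ell/\partial\varphi_j\,\partial\varphi_k\}$. In addition, the likelihood ratio (LR) test can be applied to discriminate between nested models.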
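The log-likelihood is the quantity handed to the optimizer. The text uses R for this step; the sketch below writes the same objective in Python purely for illustration, with a hypothetical function name and the parameter order $(\lambda, a, \theta, \alpha)$ taken from the definition of $\varphi$. It could be passed, with the sign flipped, to a general-purpose minimizer such as `scipy.optimize.minimize`, although that step is not shown here.

```python
import math


def oepiv_loglik(params, data):
    """OEPIV log-likelihood for params = (lam, a, theta, alpha) and positive data."""
    lam, a, theta, alpha = params
    if min(lam, a, theta, alpha) <= 0:
        return float("-inf")             # outside the parameter space
    n = len(data)
    ll = n * (math.log(lam * alpha / (a * theta)) + lam)
    for x in data:
        u = x / theta
        h = 1.0 + u ** (1.0 / a)         # h_i = 1 + (x_i / theta)^(1/a)
        ll += ((1.0 / a - 1.0) * math.log(u)
               + (alpha - 1.0) * math.log(h)
               - lam * h ** alpha)
    return ll
```

Returning negative infinity outside the parameter space is a design choice that keeps unconstrained optimizers away from invalid values; a log-reparameterization of the four positive parameters would be an alternative.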

Simulation Studies
We conducted a Monte Carlo simulation to illustrate the performance of the ML parameter estimates of the OEPIV distribution. That is, we randomly generated $N = 10{,}000$ samples of sizes 30, 50, 100, 200, and 500 from the OEPIV distribution for two different sets of parameter values. The estimates of the parameters were obtained along with their bias and mean square error (MSE), given by
$$\mathrm{Bias}(\hat{\varphi}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{\varphi}_i - \varphi), \qquad \mathrm{MSE}(\hat{\varphi}) = \frac{1}{N}\sum_{i=1}^{N}(\hat{\varphi}_i - \varphi)^2,$$
where $\hat{\varphi}_i$ is the estimate from the i-th sample and $\varphi$ is the true value. The results of the simulation are displayed in Table 2. We concluded from these results that the empirical means tend to the true values of the parameters as the sample size increases. In addition, the MSEs and biases decreased as the sample size increased.
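The structure of such a simulation can be sketched in a few lines. To keep the example self-contained, it makes a strong simplifying assumption that is not part of the study above: $a$, $\theta$, and $\alpha$ are treated as known, so that the ML estimate of $\lambda$ has the closed form $\hat{\lambda} = n / \sum_i (h_i^{\alpha} - 1)$ with $h_i = 1 + (x_i/\theta)^{1/a}$. Samples are drawn by inverting the OEPIV CDF; all function names are hypothetical.

```python
import math
import random


def oepiv_sample(n, lam, a, theta, alpha, rng):
    """Draw n OEPIV variates by inverse-transform sampling."""
    out = []
    for _ in range(n):
        p = rng.random()
        base = 1.0 - math.log1p(-p) / lam
        out.append(theta * (base ** (1.0 / alpha) - 1.0) ** a)
    return out


def lambda_mle(data, a, theta, alpha):
    """Closed-form ML estimate of lam when a, theta and alpha are known."""
    s = sum((1.0 + (x / theta) ** (1.0 / a)) ** alpha - 1.0 for x in data)
    return len(data) / s


def bias_and_mse(n, reps, lam, a, theta, alpha, seed=0):
    """Monte Carlo bias and MSE of lambda_mle over `reps` samples of size n."""
    rng = random.Random(seed)
    errors = [lambda_mle(oepiv_sample(n, lam, a, theta, alpha, rng), a, theta, alpha) - lam
              for _ in range(reps)]
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse
```

Calling `bias_and_mse` for increasing `n` reproduces, in this reduced one-parameter setting, the qualitative pattern of shrinking MSE with sample size; the full four-parameter study would replace `lambda_mle` with a numerical optimizer.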

The Log Odds Exponential-Pareto IV Regression Model
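If $X$ is a random variable from the OEPIV distribution, as given in Equation (4), then $Y = \log(X)$ has the LOEPIV distribution under the reparameterization $\sigma = a$ and $\mu = \log(\theta)$. Writing $z = (y - \mu)/\sigma$, the PDF and CDF of the LOEPIV distribution are
$$f(y) = \frac{\lambda\alpha}{\sigma}\exp(\lambda)\,e^{z}\left(1 + e^{z}\right)^{\alpha - 1}\exp\left\{-\lambda\left(1 + e^{z}\right)^{\alpha}\right\}, \quad -\infty < y < \infty, \qquad (22)$$
$$F(y) = 1 - \exp\left\{\lambda\left[1 - \left(1 + e^{z}\right)^{\alpha}\right]\right\},$$
where $\sigma > 0$ is the scale parameter, $\lambda > 0$ and $\alpha > 0$ are the shape parameters, and $-\infty < \mu < \infty$ is the location parameter. The LOEPIV model becomes the log exponential-Pareto (LEP) distribution when $\alpha = 1$. The SF and HF are given by
$$S(y) = \exp\left\{\lambda\left[1 - \left(1 + e^{z}\right)^{\alpha}\right]\right\}, \qquad h(y) = \frac{\lambda\alpha}{\sigma}\,e^{z}\left(1 + e^{z}\right)^{\alpha - 1}.$$
The following are properties of the LOEPIV distribution.
The quantile of the LOEPIV distribution is
$$y_p = \mu + \sigma\log\left\{\left[1 - \frac{\log(1 - p)}{\lambda}\right]^{1/\alpha} - 1\right\}, \quad 0 < p < 1.$$
The mode of the LOEPIV distribution satisfies
$$1 + (\alpha - 1)\frac{e^{z}}{1 + e^{z}} - \lambda\alpha\,e^{z}\left(1 + e^{z}\right)^{\alpha - 1} = 0, \qquad (27)$$
so the mode can be obtained by solving Equation (27) numerically.
The median of the LOEPIV distribution is
$$y_{0.5} = \mu + \sigma\log\left\{\left[1 + \frac{\log 2}{\lambda}\right]^{1/\alpha} - 1\right\}.$$
The mgf of the LOEPIV distribution, $M_Y(t) = E(e^{tY})$, is obtained by a change of variable that reduces the integral, followed by the binomial expansion and the gamma function.
The standardized random variable for $y$ in Equation (22) is defined as $z = (y - \mu)/\sigma$; then $z$ has the PDF
$$f(z) = \lambda\alpha\exp(\lambda)\,e^{z}\left(1 + e^{z}\right)^{\alpha - 1}\exp\left\{-\lambda\left(1 + e^{z}\right)^{\alpha}\right\}, \quad -\infty < z < \infty,$$
with SF given as
$$S(z) = \exp\left\{\lambda\left[1 - \left(1 + e^{z}\right)^{\alpha}\right]\right\}.$$
Hence, a linear location-scale regression model with response variable $y_i$ and explanatory vector $x_i = (x_{i1}, \ldots, x_{ip})^T$ can be defined as
$$y_i = \beta^T x_i + \sigma z_i, \quad i = 1, \ldots, n, \qquad (31)$$
where $z_i$ is the random error with PDF in Equation (24), $\beta = (\beta_1, \ldots, \beta_p)^T$, and $\sigma > 0$, $\lambda > 0$, and $\alpha > 0$ are the unknown parameters. The location of $y_i$ is $\mu_i = \beta^T x_i$, and the location vector $\mu = (\mu_1, \ldots, \mu_n)^T$ can be represented as a linear model $\mu = X\beta$, in which $X = (x_1, \ldots, x_n)^T$ is the known model matrix. Therefore, the SF of $Y_i \mid x$ is expressed as
$$S(y \mid x) = \exp\left\{\lambda\left[1 - \left(1 + \exp\left(\frac{y - \beta^T x}{\sigma}\right)\right)^{\alpha}\right]\right\}.$$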
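To make the log-scale model concrete, the sketch below codes the LOEPIV density and survival function obtained from the transformation $Y = \log X$ with $\sigma = a$ and $\mu = \log\theta$, together with the conditional survival function for a covariate vector. It is an illustration under that stated transformation; the function names and the convention that the first covariate equals 1 for an intercept are assumptions.

```python
import math


def loepiv_sf(y, lam, alpha, sigma, mu):
    """LOEPIV survival function with z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    return math.exp(lam * (1.0 - (1.0 + math.exp(z)) ** alpha))


def loepiv_pdf(y, lam, alpha, sigma, mu):
    """LOEPIV density, minus the derivative of loepiv_sf with respect to y."""
    z = (y - mu) / sigma
    w = 1.0 + math.exp(z)
    return (lam * alpha / sigma) * math.exp(z) * w ** (alpha - 1.0) \
        * math.exp(lam * (1.0 - w ** alpha))


def regression_sf(y, x, beta, lam, alpha, sigma):
    """Survival function of Y given covariates x, with location mu = beta' x."""
    mu = sum(b * v for b, v in zip(beta, x))
    return loepiv_sf(y, lam, alpha, sigma, mu)
```

For instance, `regression_sf(1.0, [1.0, 0.5], [-0.6, 1.0], 0.3, 0.36, 0.6)` evaluates the survival probability at $y = 1$ for a subject with a single covariate equal to 0.5, using the parameter values of the simulation design described later in the text.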

ML Method
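For right-censored lifetime data, we have $t_i = \min(f_i, c_i)$, where $f_i$ is the lifetime and $c_i$ is the censoring time, and $y_i = \log(t_i)$ for the i-th individual, $i = 1, \ldots, n$. Suppose we have a random sample of $n$ observations $(y_1, \tau_1, x_1), \ldots, (y_n, \tau_n, x_n)$, where $\tau_i = 1$ if the i-th observation is uncensored and $\tau_i = 0$ if it is censored, and assume that the censoring times and lifetimes are independent and random. Then, the likelihood function for the regression model in (31) with $\theta = (\lambda, \alpha, \sigma, \beta^T)^T$ under right censoring is
$$L(\theta) = \prod_{i=1}^{n}\left[f(y_i)\right]^{\tau_i}\left[SF(y_i)\right]^{1 - \tau_i},$$
where $f(y_i)$ and $SF(y_i)$ are the PDF and SF of $Y_i$ given by Equations (17) and (19), respectively. The log-likelihood for $\theta$ reduces to
$$\ell(\theta) = r\left[\log\left(\frac{\lambda\alpha}{\sigma}\right) + \lambda\right] + \sum_{i=1}^{n}\tau_i\left[z_i + (\alpha - 1)\log\left(1 + e^{z_i}\right) - \lambda\left(1 + e^{z_i}\right)^{\alpha}\right] + \lambda\sum_{i=1}^{n}(1 - \tau_i)\left[1 - \left(1 + e^{z_i}\right)^{\alpha}\right], \qquad (32)$$
where $r = \sum_{i=1}^{n}\tau_i$ is the number of uncensored observations and $z_i = (y_i - \beta^T x_i)/\sigma$. The ML estimate of the parameter vector $\theta$ can be obtained with an optimization algorithm that maximizes Equation (32).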
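The censored log-likelihood is a sum of log-density terms for uncensored cases and log-survival terms for censored ones. The sketch below is a hedged illustration of that structure for the LOEPIV regression model; the function name, the argument layout, and the convention that the indicator equals 1 for an uncensored observation are assumptions made for the example.

```python
import math


def loepiv_censored_loglik(lam, alpha, sigma, beta, y, tau, X):
    """Right-censored LOEPIV regression log-likelihood.

    y   : log survival or censoring times
    tau : indicators, 1 = uncensored, 0 = censored
    X   : covariate rows, one list per observation
    """
    ll = 0.0
    for yi, ti, xi in zip(y, tau, X):
        mu = sum(b * v for b, v in zip(beta, xi))
        z = (yi - mu) / sigma
        w = 1.0 + math.exp(z)
        log_sf = lam * (1.0 - w ** alpha)            # log S(y_i)
        if ti == 1:
            # log f(y_i) = log(lam*alpha/sigma) + z + (alpha-1) log w + log S(y_i)
            ll += math.log(lam * alpha / sigma) + z + (alpha - 1.0) * math.log(w) + log_sf
        else:
            ll += log_sf
    return ll
```

As in the uncensored case, the negative of this function is what a numerical optimizer would minimize over $(\lambda, \alpha, \sigma, \beta)$.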

Jackknife Method
The jackknife technique was developed by Quenouille (1949) to estimate the bias of an estimator. It is an alternative method for estimating the LOEPIV parameters based on "leaving one out".
Suppose that $\hat{\theta}$ is the parameter estimate from the whole sample and $\hat{\theta}_{-i}$ is the parameter estimate when the i-th observation is dropped from the data. The pseudo-value of the i-th observation is then obtained as
$$\tilde{\theta}_i = n\hat{\theta} - (n - 1)\hat{\theta}_{-i}, \quad i = 1, \ldots, n.$$
The jackknife estimate of $\theta$, denoted $\hat{\theta}^{*}$, is the mean of the pseudo-values,
$$\hat{\theta}^{*} = \frac{1}{n}\sum_{i=1}^{n}\tilde{\theta}_i.$$
For more details, see [42][43][44].
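The leave-one-out recipe above can be sketched generically. The helper below works for any scalar estimator supplied as a function of a list; for the regression model it would be applied to each component of the parameter vector, with the estimator being a full ML fit. The name `jackknife` is hypothetical, and the standard error uses the usual pseudo-value formula, which is an assumption since it is not spelled out in the text.

```python
import math


def jackknife(data, estimator):
    """Jackknife estimate and standard error from leave-one-out pseudo-values."""
    n = len(data)
    full = estimator(data)
    pseudo = []
    for i in range(n):
        leave_one_out = estimator(data[:i] + data[i + 1:])
        pseudo.append(n * full - (n - 1) * leave_one_out)
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, math.sqrt(variance)


def biased_variance(values):
    """Plug-in variance (divides by n), used here only as a test estimator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)
```

A classical check is that jackknifing `biased_variance` recovers the unbiased sample variance exactly, which illustrates the bias-removal property that motivated the method.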

Sensitivity Analysis: Global Influence
Global influence, introduced by [45], is a sensitivity analysis based on case deletion. Case deletion measures the impact on the parameter estimates of dropping the i-th observation from the data set. That is, the method compares $\hat{\theta}$ with $\hat{\theta}_{-i}$, where $\hat{\theta}_{-i}$ is the estimate obtained when the i-th observation is dropped from the data. If $\hat{\theta}_{-i}$ is distant from $\hat{\theta}$, then the i-th case is considered influential. The case deletion model for the LOEPIV regression model (31) is
$$y_l = \beta^T x_l + \sigma z_l, \quad l = 1, \ldots, n,\ l \neq i.$$
We denote the ML estimate of $\theta$ when the i-th observation is dropped by $\hat{\theta}_{-i} = (\hat{\lambda}_{(i)}, \hat{\alpha}_{(i)}, \hat{\sigma}_{(i)}, \hat{\beta}_{(i)}^T)^T$. We describe two measures of global influence below.

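Generalized Cook Distance
The generalized Cook distance (GD) is the first measure of global influence and is defined as
$$GD_i(\theta) = (\hat{\theta}_{-i} - \hat{\theta})^T\, M(\hat{\theta})\, (\hat{\theta}_{-i} - \hat{\theta}),$$
where $M(\hat{\theta})$ denotes the observed information matrix.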

Likelihood Distance
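The likelihood distance (LD) measures the difference between $\hat{\theta}$ and $\hat{\theta}_{-i}$ on the log-likelihood scale and is given by
$$LD_i(\theta) = 2\left[\ell(\hat{\theta}) - \ell(\hat{\theta}_{-i})\right],$$
where $\ell(\hat{\theta}_{-i})$ is the log-likelihood function evaluated at the estimate of $\theta$ obtained when the i-th observation is dropped from the data.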
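The two case-deletion measures can be illustrated on a deliberately reduced problem. The sketch below is a toy example and not the regression analysis of this article: it assumes an uncensored OEPIV sample with $a$, $\theta$, and $\alpha$ known, so that only $\lambda$ is estimated, the ML estimate is $\hat{\lambda} = n/\sum_i e_i$ with $e_i = [1 + (x_i/\theta)^{1/a}]^{\alpha} - 1$, and the observed information is the scalar $n/\hat{\lambda}^2$. All names are hypothetical.

```python
import math


def case_deletion_measures(data, a, theta, alpha):
    """Generalized Cook distance and likelihood distance for each observation."""
    e = [(1.0 + (x / theta) ** (1.0 / a)) ** alpha - 1.0 for x in data]
    n, total = len(e), sum(e)
    lam_hat = n / total

    def loglik(lam):
        # Part of the full-data log-likelihood that depends on lam.
        return n * math.log(lam) - lam * total

    info = n / lam_hat ** 2                      # observed information at lam_hat
    gd, ld = [], []
    for ei in e:
        lam_del = (n - 1) / (total - ei)         # estimate without this case
        gd.append((lam_del - lam_hat) ** 2 * info)
        ld.append(2.0 * (loglik(lam_hat) - loglik(lam_del)))
    return gd, ld
```

Plotting `gd` and `ld` against the observation index gives index plots of the kind used to flag influential cases; in the full model each deleted-case estimate requires refitting by numerical optimization.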

Residual Analysis
In the regression model, checking the assumptions and appropriateness of the fitted model is an essential step. Therefore, we used residual analysis to check the assumptions and detect outlier observations. In this study, we consider the following types.

Martingale Residual
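Barlow and Prentice [46] proposed the martingale residual as
$$r_{M_i} = \delta_i + \log\left[SF(y_i; \hat{\theta})\right],$$
where $\delta_i$ denotes the censoring indicator, with $\delta_i = 0$ if the i-th observation is censored and $\delta_i = 1$ if it is not censored, and $SF(y_i; \hat{\theta})$ denotes the SF of the regression model. Therefore, the martingale residual of the LOEPIV regression model is
$$r_{M_i} = \delta_i + \hat{\lambda}\left[1 - \left(1 + e^{\hat{z}_i}\right)^{\hat{\alpha}}\right], \qquad \hat{z}_i = \frac{y_i - \hat{\beta}^T x_i}{\hat{\sigma}},$$
where $r_{M_i}$ ranges between $-\infty$ and 1 and is skewed. Thus, a transformation of $r_{M_i}$ is used to reduce the skewness.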

Deviance Residual
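The deviance residual is a further improvement of the martingale residual that reduces the skewness and makes the distribution more symmetric around zero. It can be expressed as
$$r_{D_i} = \operatorname{sign}(r_{M_i})\left\{-2\left[r_{M_i} + \delta_i\log(\delta_i - r_{M_i})\right]\right\}^{1/2},$$
where $r_{M_i}$ is defined in Equation (36). The deviance residual for the LOEPIV regression model is obtained by substituting $r_{M_i} = \delta_i + \hat{\lambda}[1 - (1 + e^{\hat{z}_i})^{\hat{\alpha}}]$, with $\hat{z}_i = (y_i - \hat{\beta}^T x_i)/\hat{\sigma}$, into this expression.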
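Both residuals are simple functions of the fitted survival probability, as the following sketch shows. It assumes the standard definitions $r_M = \delta + \log S$ and $r_D = \operatorname{sign}(r_M)\{-2[r_M + \delta\log(\delta - r_M)]\}^{1/2}$ and the LOEPIV survival function $S(y) = \exp\{\lambda[1 - (1 + e^{z})^{\alpha}]\}$; the function names are hypothetical, and `mu` stands for the fitted location $\hat{\beta}^T x_i$.

```python
import math


def martingale_residual(y, delta, mu, lam, alpha, sigma):
    """Martingale residual: delta + log S(y) under the LOEPIV model."""
    z = (y - mu) / sigma
    return delta + lam * (1.0 - (1.0 + math.exp(z)) ** alpha)


def deviance_residual(r_m, delta):
    """Deviance residual computed from a martingale residual r_m."""
    inner = r_m
    if delta == 1:
        # For censored cases (delta = 0) the logarithmic term vanishes.
        inner += math.log(delta - r_m)
    # max() guards against tiny positive rounding errors inside the root.
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), r_m)
```

For a censored observation the deviance residual is always negative, since its martingale residual equals the log survival probability.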

Simulation Study for the Log Odds Exponential-Pareto IV Regression Model
We performed a Monte Carlo simulation to explore the empirical distributions of $r_{M_i}$ and $r_{D_i}$ for different values of $n$ and different censoring levels. The lifetimes $t_1, \ldots, t_n$ were generated from the OEPIV distribution in Equation (4), and $x_i$ was generated from uniform(0, 1). We sampled the censoring times $c_1, \ldots, c_n$ from uniform(0, $\rho$), where $\rho$ was adjusted until we obtained the required censoring level. For each fit, the log lifetimes were obtained as $y_i = \min\{\log(t_i), \log(c_i)\}$. We generated 1000 samples for each combination of $n$, $\lambda$, $\alpha$, $\sigma$, $\beta_0$, $\beta_1$, and censoring level. The simulation was conducted for $n = 30$, 50, and 100 with $\lambda = 0.3$, $\alpha = 0.36$, $\sigma = 0.6$, $\beta_0 = -0.6$, and $\beta_1 = 1$, and the censoring levels 0.1, 0.3, and 0.5. Figures 3 and 4 present normal probability plots (NPP) for the residuals. These figures show that the empirical distribution of $r_{D_i}$ agrees more closely with the standard normal distribution (SND) than that of $r_{M_i}$. The distribution of $r_{D_i}$ also approached the SND as the sample size increased or the censoring level decreased.
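The data-generating step of this design can be sketched as follows. The example assumes that the log lifetime follows the regression model $\log t_i = \beta_0 + \beta_1 x_i + \sigma z_i$, with $z_i$ drawn by inverting the standardized LOEPIV survival function; the function names are hypothetical, and the tuning of $\rho$ to reach a target censoring level is left to the caller.

```python
import math
import random


def _open_unit(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def simulate_censored(n, lam, alpha, sigma, beta0, beta1, rho, seed=0):
    """Return a list of (y, delta, x) with delta = 1 for uncensored cases."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                         # covariate from uniform(0, 1)
        p = _open_unit(rng)
        c = rho * _open_unit(rng)                # censoring time from uniform(0, rho)
        # w = [1 - log(1 - p)/lam]^(1/alpha) - 1, computed stably.
        w = math.expm1(math.log1p(-math.log1p(-p) / lam) / alpha)
        log_t = beta0 + beta1 * x + sigma * math.log(w)
        log_c = math.log(c)
        delta = 1 if log_t <= log_c else 0
        rows.append((min(log_t, log_c), delta, x))
    return rows


def censoring_level(rows):
    """Fraction of censored observations."""
    return 1.0 - sum(d for _, d, _ in rows) / len(rows)
```

In practice one would evaluate `censoring_level` for several values of `rho` and keep the value that gives approximately 0.1, 0.3, or 0.5, mirroring the adjustment described above.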

Applications
We analyzed three real data sets to investigate the flexibility of the OEPIV distribution and the LOEPIV regression model.

The Strength of Glass Fibers Data
These data were analyzed by [47] and represent the strength of glass fibers of length 1.5 cm. The data set consists of 63 observations. We compare the fit of the OEPIV with those of the Pareto IV, Weibull Burr XII (WBXII) in [48], Weibull Fréchet (WFr) in [49], Weibull Lomax (WL) in [50], odd exponential-Weibull (OE-W), odd exponential-normal (OE-N) in [51], and gamma distributions.
We considered the following criteria to compare these distributions: the value of the negative log-likelihood function ($-\hat{\ell}$), the Akaike information criterion (AIC), and the corrected Akaike information criterion (CAIC). The smaller the values of these statistics, the better the fit to the data.
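The model-selection criteria are simple functions of the maximized log-likelihood. The sketch below uses the common definitions $\mathrm{AIC} = 2k - 2\hat{\ell}$, $\mathrm{CAIC} = \mathrm{AIC} + 2k(k+1)/(n - k - 1)$, and $\mathrm{BIC} = k\log n - 2\hat{\ell}$, where $k$ is the number of parameters and $n$ the sample size. These formulas are assumptions, since the article names the criteria without stating them, and the function name is hypothetical.

```python
import math


def information_criteria(neg_loglik, k, n):
    """Return (AIC, CAIC, BIC) from the minimized negative log-likelihood."""
    aic = 2.0 * k + 2.0 * neg_loglik
    caic = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = k * math.log(n) + 2.0 * neg_loglik
    return aic, caic, bic
```

For the four-parameter OEPIV fitted to the 63 glass-fiber observations, the call would be `information_criteria(neg_loglik, 4, 63)` with `neg_loglik` taken from the fit.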
The ML estimates, standard errors (SE), $-\hat{\ell}$, AIC, and CAIC statistics for the OEPIV, WBXII, WL, WFr, Pareto IV, OE-W, OE-N, and gamma distributions are reported in Table 3. From the results in Table 3, the OEPIV distribution provides the best fit to the data, with the lowest AIC and CAIC values, and could be selected as a more appropriate model than the others. Figure 5 displays the QQ-plot of the OEPIV distribution and the estimated PDFs of the fitted distributions. These plots show that the OEPIV captures the skewness of the glass fibers data better than the competing fitted distributions.

Sum of Skin Folds Data
The authors of [52] discussed this data set, which covers 102 male and 100 female athletes and was collected at the Australian Institute of Sport, provided by Richard Telford and Ross Cunningham.
We compare the ML estimates and their corresponding SE, the values of $-\hat{\ell}$, and the AIC and CAIC statistics for the fitted OEPIV distribution with the results of the Kumaraswamy Pareto IV (KwPIV) in [53], gamma-Pareto IV (GPIV) [10], Pareto IV (PIV) in [53], and exponentiated Pareto (EP) distributions provided in [54], and the Weibull distribution. These results are reported in Table 4. From the results in Table 4, the OEPIV distribution has the lowest AIC and CAIC values among the fitted distributions. Therefore, the OEPIV could be selected as the best model for these data. Figure 6 displays the QQ-plot of the OEPIV distribution and the estimated PDFs of the fitted distributions. These plots show that the OEPIV provides a good fit to the data.

Stanford Heart Transplant Data
These data were obtained from Kalbfleisch and Prentice [55] and contain information on n = 103 patients. The patient's survival time was specified as the number of days from acceptance into a heart transplant program to death. The following variables are associated with each patient: $y_i$: log survival time (days); $\text{status}_i$: censoring indicator (1 = dead, 0 = censored); $x_{i1}$: age (in years); $x_{i2}$: prior surgery (0 = No, 1 = Yes); and $x_{i3}$: transplant (0 = No, 1 = Yes). This data set was used by [38], [35], and [36] to illustrate the log-odd log-logistic Weibull (LOLLW), log-Fréchet (LF), and log-exponentiated Fréchet (LEF) regression models. The LOEPIV regression model will be compared with the log-Weibull (LW), LEP, LOLLW, LF, and LEF regression models.
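That is, we present the results from fitting the model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \sigma z_i, \quad i = 1, \ldots, 103,$$
where $y_i$ follows the LOEPIV distribution in Equation (22).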
To examine the suitability of the proposed model, the empirical SF estimated by the Kaplan-Meier (KM) method and the SF from the fitted OEPIV model are plotted in Figure 7. From this comparison, we concluded that the logarithms of the times to event follow the LOEPIV distribution.
The results for the fitted regression models are given in Table 5. They show that the LOEPIV regression model had the lowest AIC, CAIC, and BIC, which indicates the superiority of the LOEPIV model over the other models. The LR test can be used to discriminate between the LOEPIV and LEP regression models since they are nested. The LR statistic for testing the hypotheses $H_0: \alpha = 1$ versus $H_1$: $H_0$ is not true is given in Table 6, and it rejects the LEP model in favor of the LOEPIV model.
Table 7 lists the jackknife parameter estimates of the LOEPIV model with their corresponding SE and 95% confidence intervals. Based on the results in Tables 5 and 7, we observed that the explanatory variables $x_1$, $x_2$, and $x_3$ are significant for the fitted model and that both methods gave similar estimates.
The plots of the SF corresponding to the explanatory variables for the fitted LOEPIV regression model are presented in Figure 8. From Figure 8a, we observed that $\hat{S}(1 \mid \text{age} = 8) = 0.96808$, which means that approximately 97% of the patients who are 8 years old are still alive at $y = 1$ (≈3 days). However, for patients aged 44 and 64 years, $\hat{S}(1 \mid \text{age} = 44) = 0.34676$ and $\hat{S}(1 \mid \text{age} = 64) = 0.00064$, so the percentages of living patients at $y = 1$ decreased to approximately 35% and 0.06%, respectively. These results indicate that survival decreased as the age of the patients increased. Similarly, Figure 8b,c indicated that approximately 58% of the patients who did not have surgery or receive a transplant were alive at $y = 3$ (≈20 days). Furthermore, approximately 98% of the patients who underwent surgery were alive at $y = 3$, while for patients who received a transplant, $\hat{S}(3 \mid \text{transplant} = 1) = 0.9943$, that is, the survival percentage increased to 99% at $y = 3$. Therefore, it can be stated that receiving a heart transplant increased the survival time, as did undergoing surgery.
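The likelihood ratio comparison of the nested LEP and LOEPIV regression models described above can be sketched as follows. Because the null hypothesis $\alpha = 1$ fixes a single parameter, the statistic is referred to a chi-square distribution with one degree of freedom, whose tail probability equals $\operatorname{erfc}(\sqrt{s/2})$. The function name is hypothetical, and the maximized log-likelihoods must come from the fitted models; no values from Table 6 are reproduced here.

```python
import math


def lr_test_one_df(loglik_full, loglik_null):
    """LR statistic and p-value for a null hypothesis fixing one parameter."""
    stat = 2.0 * (loglik_full - loglik_null)
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value
```

A small p-value leads to rejecting the restricted LEP model in favor of the LOEPIV model.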

Global Influence Analysis
The case deletion measures $GD_i(\theta)$ and $LD_i(\theta)$ were computed numerically, and Figure 9 shows the index plots of these influence measures. It is clear that case 99 could be an influential observation in the LOEPIV regression model.
Figure 9. The index plot of (a) $GD_i(\theta)$ and (b) $LD_i(\theta)$ for the LOEPIV regression model.

Residual Analysis
In order to detect possible outlying observations, a plot of $r_{D_i}$ versus the observation index is shown in Figure 10a. It shows that almost all of the observations fall within (−3, 3), except for observation 8. Therefore, observation 8 is a possible outlier. Figure 10b shows the NPP for the deviance residuals with a generated envelope. Nearly all of the observations fell inside the envelope, which indicates that the proposed model is appropriate for the heart transplant data.

Concluding Remarks
In this article, we introduced the odds exponential-Pareto IV distribution. We derived some of its statistical and mathematical properties. The model parameters were estimated using the ML method, and simulation studies were carried out to examine the performance of the ML estimators based on biases and mean squared errors. Moreover, a new log-location regression model for censored data based on the OEPIV distribution was introduced. The ML and jackknife estimation methods for right-censored data were used to estimate the unknown parameters of the new regression model. The model assumptions were checked using martingale and deviance residuals. Furthermore, generalized Cook and likelihood distance measures were defined to detect influential observations for the regression model. Finally, we analyzed three real data sets to examine the usefulness of the OEPIV distribution and the LOEPIV regression model. The results demonstrated that the OEPIV distribution outperformed the competing distributions in terms of goodness of fit. In addition, the LOEPIV regression model provided a good fit to the Stanford heart transplant data.

Conflicts of Interest:
The authors declare no conflict of interest.