On Odd Perks-G Class of Distributions: Properties, Regression Model, Discretization, Bayesian and Non-Bayesian Estimation, and Applications

Abstract: In this paper, we present a new univariate flexible generator of distributions, namely, the odd Perks-G class. Some special models in this class are introduced. The quantile function (QFUN), ordinary and incomplete moments (MOMs), generating function (GFUN), moments of residual and reversed residual lifetimes (RLT), and four different types of entropy are all structural aspects of the proposed family that hold for any baseline model. Maximum likelihood (ML) and maximum product spacing (MPS) estimates of the model parameters are given. Bayesian estimates of the model parameters are obtained. We also present a novel log-location-scale regression model based on the odd Perks–Weibull distribution. Due to the significance of the odd Perks-G family and the survival discretization method, both are used to introduce the discrete odd Perks-G family, a novel discrete distribution class. Real-world data sets are used to emphasize the importance and applicability of the proposed models.


Introduction
Over the past two decades, a number of generalized classes of statistical models have been developed and explored for the modeling of data in a variety of applications, including in the medical sciences, engineering, environmental and biological studies, life-testing challenges, demographics, actuarial science, and economics. A number of researchers have presented novel distribution classes that broaden well-known statistical models while also providing a high degree of adaptability for the analysis of data. As a result, various classes have been proposed in the statistical literature for generating new distributions by adding one or more parameters. A few famous examples are as follows: the exponentiated Weibull family presented by Mudholkar et al. [1]; the novel approach offered by Marshall and Olkin [2], involving the embedding of a parameter into a class of statistical models; the exponentiated T-X family of distributions reported by Alzaghal et al. [3]; Type II half Logistic-G by Hassan et al. [4]; the Weibull-G family by Bourguignon et al. [5]; the beta-generated family by Eugene et al. [6]; the gamma-generated family by Zografos et al. [7]; the additive Weibull-G family by Hassan et al. [8]; the odd Lindley-G family by Silva et al. [9]; odd inverse power generalized Weibull-G by Al-Moisheer et al. [10]; Marshall-Olkin odd Burr III-G by Afify et al. [11]; Topp-Leone odd Fréchet-G by Al-Marzouki et al. [12]; the transmuted odd Fréchet-G family of distributions by Badr et al. [13]; odd generalized N-H-G by Ahmad et al. [14]; generalized odd Burr III-G by Haq et al. [15]; and the odd Fréchet-G family by [16], among others. The Weibull-power Cauchy distribution, presented by Tahir et al. [17], is also worthy of mention. Cordeiro et al. [18] created a new family of generalized distributions. Several authors have introduced new distributions to fit COVID-19 data; see [19][20][21].
Perks [22] presented a four-parameter extension of the Gompertz-Makeham distribution with the following hazard rate function:
$$h(x) = \frac{R + S e^{\mu x}}{1 + M e^{-\mu x} + N e^{\mu x}}.$$
When M = N = 0, the Gompertz-Makeham hazard rate function is obtained. Perks appears to have intended the parameters to be non-negative, and Marshall and Olkin [23] have demonstrated that N = 0 cannot be used. However, by setting M = 0 and letting N → 0, the Gompertz-Makeham distribution is obtained as a limit. Richards [24] has recently introduced a modified version of the Perks distribution, which takes the hazard function of the Perks distribution into account. The Perks distribution has several applications in the field of actuarial science. Haberman et al. [25] and Richards [24] have shown that this distribution is a good fit for pensioner mortality data. The parametric mortality projection is well-described by the Perks distribution, according to Haberman et al. [25]. The cumulative distribution function (cdf) and probability density function (pdf) of the Perks distribution are given as follows:
$$\Pi(t; \theta, \beta) = 1 - \frac{1+\beta}{1+\beta e^{\theta t}}, \quad t > 0,$$
and
$$\pi(t; \theta, \beta) = \frac{\beta\theta(1+\beta)\, e^{\theta t}}{\left(1+\beta e^{\theta t}\right)^{2}}. \quad (2)$$
The authors in [26] defined a new idea for the generation of larger families, making use of any pdf as a generator. The resulting generator is known as the T-X distribution family, and its cdf is specified by
$$F(x) = \int_{a}^{\Phi[G(x,\delta)]} c(t)\, dt,$$
where c(t) is the pdf of a random variable (RV) T ∈ [a, b] for −∞ ≤ a < b ≤ ∞, G(x, δ) is the cdf of a random variable X, and Φ[G(x, δ)] is a function of G(x, δ) which satisfies the following conditions: (i) Φ[G(x, δ)] ∈ [a, b]; (ii) Φ[G(x, δ)] is differentiable and monotonically non-decreasing; (iii) Φ[G(x, δ)] → a as x → −∞ and Φ[G(x, δ)] → b as x → ∞. The relevant pdf can be obtained as follows:
$$f(x) = \left\{\frac{d}{dx}\Phi[G(x,\delta)]\right\} c\{\Phi[G(x,\delta)]\}.$$
Inspired by the T-X concept, we develop a new, broader, and more flexible class of distributions, called the odd Perks-G (OP) class, by taking $\Phi[G(x,\delta)] = G(x;\delta)/\bar{G}(x;\delta)$ and replacing c(t) by $\beta\theta(1+\beta)e^{\theta t}/(1+\beta e^{\theta t})^{2}$, where t > 0, θ > 0, and β > 0; G(x; δ) is the baseline cdf, which depends on a parameter vector δ; and $\bar{G}(x;\delta) = 1 - G(x;\delta)$ is the baseline reliability function.
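For each baseline G(x; δ), the OP cdf is provided as follows:
$$F(x; \beta, \theta, \delta) = \beta\theta(1+\beta)\int_{0}^{G(x;\delta)/\bar{G}(x;\delta)} \frac{e^{\theta t}}{\left(1+\beta e^{\theta t}\right)^{2}}\, dt = 1 - \frac{1+\beta}{1+\beta\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}}. \quad (5)$$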
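The associated pdf may be obtained as follows:
$$f(x; \beta, \theta, \delta) = \frac{\beta\theta(1+\beta)\, g(x;\delta)\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}}{\bar{G}(x;\delta)^{2}\left[1+\beta\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}\right]^{2}}, \quad (6)$$
where g(x; δ) is the pdf of the baseline model. We write X ∼ OP-G(β, θ, δ) for an RV X with density function (6). The survival function (SF) of the OP-G family is
$$S(x; \beta, \theta, \delta) = \frac{1+\beta}{1+\beta\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}},$$
and the hazard rate function (HRF), the ratio of (6) to the SF, is
$$h(x; \beta, \theta, \delta) = \frac{\beta\theta\, g(x;\delta)\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}}{\bar{G}(x;\delta)^{2}\left[1+\beta\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}\right]}.$$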
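The cdf, pdf, SF, and HRF above can be checked numerically. The following is a minimal sketch, assuming the closed forms implied by the integral in (5); the function names and the exponential baseline used in the usage note are illustrative choices, not part of the paper. Each expression is rewritten with exp(−t), where t = θG/(1 − G), to avoid overflow for large odds.

```python
import math

def _odds_term(x, theta, G):
    """Return t = theta * G(x) / (1 - G(x)), or None when G(x) has reached 1."""
    g_val = G(x)
    if g_val >= 1.0:
        return None
    return theta * g_val / (1.0 - g_val)

def op_g_sf(x, beta, theta, G):
    """SF (1 + beta) / (1 + beta * exp(t)), evaluated via exp(-t)."""
    t = _odds_term(x, theta, G)
    if t is None:
        return 0.0
    e = math.exp(-t)
    return (1.0 + beta) * e / (e + beta)

def op_g_cdf(x, beta, theta, G):
    """cdf as one minus the survival function."""
    return 1.0 - op_g_sf(x, beta, theta, G)

def op_g_pdf(x, beta, theta, G, g):
    """pdf: beta*theta*(1+beta)*g*exp(t) / (Gbar^2 * (1 + beta*exp(t))^2)."""
    t = _odds_term(x, theta, G)
    if t is None:
        return 0.0
    e = math.exp(-t)
    gbar = 1.0 - G(x)
    return beta * theta * (1.0 + beta) * g(x) * e / (gbar ** 2 * (e + beta) ** 2)

def op_g_hrf(x, beta, theta, G, g):
    """HRF: pdf / SF = beta*theta*g / (Gbar^2 * (exp(-t) + beta))."""
    t = _odds_term(x, theta, G)
    if t is None:
        return math.inf
    gbar = 1.0 - G(x)
    return beta * theta * g(x) / (gbar ** 2 * (math.exp(-t) + beta))
```

For instance, with an exponential baseline `G = lambda x: 1 - math.exp(-0.6 * x)` and `g = lambda x: 0.6 * math.exp(-0.6 * x)`, a central finite difference of `op_g_cdf` should agree with `op_g_pdf` to several decimal places.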
The OP-G family can be explained in the following way. Assume Y is the lifetime of a stochastic system, with a specified continuous cdf G, so that $G(x;\delta)/\bar{G}(x;\delta)$ is the odds ratio that an individual (or item) may not be active (failure or death) at time x following a lifetime Y. If the variability of this chance of failure is represented by the RV X, which follows the extended exponential model with parameters β and θ, then
$$\Pr(Y \le x) = \Pr\left(X \le \frac{G(x;\delta)}{\bar{G}(x;\delta)}\right) = F(x; \beta, \theta, \delta),$$
which is the cdf given in (5). The primary motives for employing the OP-G family in practice are: (i) to realize special models for all sorts of HRFs; (ii) under the same baseline distribution, to regularly provide better fits than alternative generated models; (iii) compared to the baseline model, to increase the adjustability of the kurtosis; and (iv) to construct symmetric, left- and right-skewed, and inverted J-shaped distributions.
The association between survival time and numerous factors, such as sex, weight, blood pressure, and many more, has recently attracted significant attention in the relevant literature. Different parametric regression models, including the log-location-scale regression model, have been employed in a number of applications to quantify the effects of covariates on survival time. The log-location-scale regression model stands out, as it has been extensively utilized in clinical trials and many other domains of application. In many real-world applications involving lifetime data, determining the link between survival time and independent (explanatory) variables is critical. In this context, the regression model method can be applied. The linear log-location-scale regression odd Perks-X model can be stated as follows:
$$y_i = B^{T} X_i + \sigma z_i, \quad i = 1, \ldots, n,$$
where $z_i$ is the random error, with density function $f\left(\frac{y - B^{T}X}{\sigma}\right)$; $B = (B_1, B_2, \ldots, B_k)$ is a vector of unknown parameters of the explanatory variables; σ > 0 is the scale parameter of the regression model; and $X_i = (X_{i1}, X_{i2}, \ldots, X_{ik})$ is the explanatory variable vector, where k is the number of explanatory variables. For more information about linear location-scale regression models, see, for example, [27][28][29][30][31][32].
The remainder of this paper is structured as follows. In Section 2, a useful expansion of OP-G is derived and some special models are obtained by means of the OP-G generator. Several mathematical and statistical properties, including MOMs, probability-weighted moments (PRWMOMs), residual life (RL) and reversed residual life (RRL) FUNs, and entropy (EN), are investigated in Section 3. In Section 4, non-Bayesian estimates of the model's parameters are obtained. In Section 5, Bayesian estimates of the model's parameters are obtained. In Section 6, bootstrap confidence intervals for the model's parameters are obtained. In Section 7, the log-odd Perks-Weibull regression model is introduced. Simulation studies are described in Section 8. Section 9 details the discretization of the OP-G family. In Section 10, real-world data sets are used to demonstrate the adaptability of the proposed family. Finally, we present our conclusions.

Density of the OP-G Class: Useful Expansions
In this section, we derive a useful linear representation of the OP-G density function. The starting point is the generalised binomial expansion applied to the density (6); the resulting expression is referred to below as Equation (11).
For the exponential function, we can use the power series
$$e^{z} = \sum_{k=0}^{\infty} \frac{z^{k}}{k!}.$$
Substituting this expansion into Equation (11) and then applying the generalised binomial power series (12), which holds for |h| < 1 and a real non-integer f > 0, to the term $G(x;\delta)^{i+2}$, the OP-G density function can be expressed as an infinite mixture of exp-G density functions, where $h_{\zeta}(x) = \zeta\, g(x)\, G^{\zeta-1}(x)$ is the exp-G pdf with power parameter ζ. The cdf of the OP-G class can also be expressed as a mixture of exp-G cdfs, where $H_{(j+k+1)}(x)$ is the cdf of the exp-G function with power parameter (j + k + 1). Thus, several mathematical and statistical properties of the OP-G family can be obtained directly from those of the exp-G family.

Special Models
In this section, we examine four different OP-G special models.

Odd Perks Exponential
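The exponential cdf and pdf with parameter λ are $G(x;\lambda) = 1 - e^{-\lambda x}$ and $g(x;\lambda) = \lambda e^{-\lambda x}$, so that $G(x;\lambda)/\bar{G}(x;\lambda) = e^{\lambda x} - 1$. The cdf and pdf of the odd Perks exponential (OPE) are respectively given by
$$F(x; \beta, \theta, \lambda) = 1 - \frac{1+\beta}{1+\beta\exp\left\{\theta\left(e^{\lambda x}-1\right)\right\}}, \quad x > 0,$$
and
$$f(x; \beta, \theta, \lambda) = \frac{\beta\theta\lambda(1+\beta)\, e^{\lambda x}\exp\left\{\theta\left(e^{\lambda x}-1\right)\right\}}{\left[1+\beta\exp\left\{\theta\left(e^{\lambda x}-1\right)\right\}\right]^{2}}.$$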
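As a sanity check on the OPE special case, the sketch below evaluates the cdf and pdf obtained by substituting the exponential baseline into (5) and (6), and verifies numerically that the density integrates to one. The parameter values are the first combination used in the simulation study; the Simpson-rule helper and its grid size are assumptions made for illustration.

```python
import math

def ope_cdf(x, beta, theta, lam):
    """OPE cdf: 1 - (1+beta) / (1 + beta*exp(theta*(exp(lam*x) - 1)))."""
    if x <= 0.0:
        return 0.0
    t = theta * math.expm1(lam * x)      # theta * G / (1 - G)
    e = math.exp(-t)
    return 1.0 - (1.0 + beta) * e / (e + beta)

def ope_pdf(x, beta, theta, lam):
    """OPE pdf, written with exp(-t) so that large t cannot overflow."""
    if x < 0.0:
        return 0.0
    t = theta * math.expm1(lam * x)
    e = math.exp(-t)
    return beta * theta * lam * (1.0 + beta) * math.exp(lam * x) * e / (e + beta) ** 2

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Illustrative parameter values (beta, theta, lambda) = (0.4, 0.5, 0.6).
total_mass = simpson(lambda x: ope_pdf(x, 0.4, 0.5, 0.6), 0.0, 15.0)
```

Here `total_mass` should be very close to 1, and integrating the pdf up to any point x should reproduce `ope_cdf(x, ...)`.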

Odd Perks-Weibull
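Let us consider the Weibull distribution with shape parameter δ and scale parameter λ, with cdf and pdf given by $G(x;\delta,\lambda) = 1 - e^{-(x/\lambda)^{\delta}}$ and $g(x;\delta,\lambda) = \frac{\delta}{\lambda}\left(\frac{x}{\lambda}\right)^{\delta-1} e^{-(x/\lambda)^{\delta}}$. The odd Perks-Weibull (OPW) has cdf given by
$$F(x; \beta, \theta, \delta, \lambda) = 1 - \frac{1+\beta}{1+\beta\exp\left\{\theta\left[e^{(x/\lambda)^{\delta}}-1\right]\right\}}, \quad x > 0,$$
and the associated pdf is given by
$$f(x; \beta, \theta, \delta, \lambda) = \frac{\beta\theta(1+\beta)\,\frac{\delta}{\lambda}\left(\frac{x}{\lambda}\right)^{\delta-1}\exp\left\{(x/\lambda)^{\delta} + \theta\left[e^{(x/\lambda)^{\delta}}-1\right]\right\}}{\left[1+\beta\exp\left\{\theta\left[e^{(x/\lambda)^{\delta}}-1\right]\right\}\right]^{2}}.$$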

Statistical Features
The statistical features of the OP-G family are investigated in this section; specifically, the QFUN, MOMs, incomplete MOMs, PRWMOMs, and RL and RRL FUNs.

Quantiles
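The OP-G quantiles (e.g., x = Q(u)) may be derived by inverting (5), as shown below:
$$Q(u) = Q_{G}\left(\frac{\Lambda(u)}{\theta + \Lambda(u)}\right), \quad \Lambda(u) = \log\left[\frac{\beta+u}{\beta(1-u)}\right], \quad 0 < u < 1,$$
where $Q_{G}(\cdot)$ denotes the QFUN of the baseline distribution.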
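Inverting (5) also gives a direct way to simulate from the family by the inverse-transform method. The sketch below solves (5) for the odds G/(1 − G) and then applies the baseline quantile function; the OPE baseline, the function names, and the seed are illustrative assumptions.

```python
import math
import random

def op_g_quantile(u, beta, theta, baseline_quantile):
    """Invert the OP-G cdf: first recover the odds w = G/(1-G), then map back through Q_G."""
    w = math.log((beta + u) / (beta * (1.0 - u))) / theta
    return baseline_quantile(w / (1.0 + w))

def ope_quantile(u, beta, theta, lam):
    """OPE quantile, using the exponential baseline quantile -log(1-p)/lam."""
    return op_g_quantile(u, beta, theta, lambda p: -math.log1p(-p) / lam)

def ope_cdf(x, beta, theta, lam):
    """OPE cdf, used here only to verify the inversion."""
    t = theta * math.expm1(lam * x)
    e = math.exp(-t)
    return 1.0 - (1.0 + beta) * e / (e + beta)

def ope_sample(n, beta, theta, lam, seed=0):
    """Draw n OPE variates by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [ope_quantile(rng.random(), beta, theta, lam) for _ in range(n)]
```

Applying `ope_cdf` to `ope_quantile(u, ...)` should return u up to rounding error, which is a convenient check that the inversion is correct.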

Moments
In this sub-section, the ordinary MOM and MOM GFUNs of the OP-G class are derived. Most of the necessary characteristics and features of a distribution can be studied through its MOMs.
where $M_{(j+k+1)}(t)$ is the MOM-GFUN of $Z_{(j+k+1)}$. As a result, the exp-G GFUN may readily be used to determine $M_X(t)$. A second formula for $M_X(t)$ can be derived from (13) in terms of $M_{\kappa}(t)$, the mgf of the RV $Z_{\kappa}$. For each real s > 0, the sth incomplete MOM of X, denoted by $\chi_s(t)$, can be written in terms of quantities that can be evaluated numerically.
The (s, r)th PRWMOMs of the OP-G class are defined, based on (5) and (6), by
$$\rho_{s,r} = E\left[X^{s} F(X)^{r}\right] = \int_{-\infty}^{\infty} x^{s}\, F(x)^{r} f(x)\, dx.$$
After some calculation, the (s, r)th PRWMOMs of X can be expressed as an infinite linear combination of exp-G MOMs.

Residual Lifetimes
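The rth-order MOM of the RL is given as
$$m_r(t) = E\left[(X-t)^{r} \mid X > t\right] = \frac{1}{S(t)}\int_{t}^{\infty} (x-t)^{r} f(x)\, dx,$$
where S(t) is the SF of the OP-G family. A similar procedure may be used to calculate the rth-order MOM of the RRL.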

Four Different Types of Entropy
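The Rényi EN (REN) (see [33]) is characterized, for ρ > 0 and ρ ≠ 1, by
$$I_{R}(\rho) = \frac{1}{1-\rho}\log\left[\int_{-\infty}^{\infty} f(x)^{\rho}\, dx\right].$$
Again using the binomial expansion (13) in (6), the integral of $f(x)^{\rho}$ can be expanded term by term, which gives the REN of the OP-G class. The Tsallis EN (TEN) measure (see [34]) is defined as
$$I_{T}(\rho) = \frac{1}{\rho-1}\left[1 - \int_{-\infty}^{\infty} f(x)^{\rho}\, dx\right].$$
The Havrda and Charvat EN (HCEN) measure (see [35]) is defined as
$$I_{HC}(\rho) = \frac{1}{2^{1-\rho}-1}\left[\int_{-\infty}^{\infty} f(x)^{\rho}\, dx - 1\right].$$
The Arimoto EN (AEN) measure (see [36]) of OP-G is defined as
$$I_{A}(\rho) = \frac{\rho}{1-\rho}\left[\left(\int_{-\infty}^{\infty} f(x)^{\rho}\, dx\right)^{1/\rho} - 1\right].$$
Numerical values of the REN, TEN, HCEN, and AEN under various parameter values in the OPE model are provided in Table 2.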
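All four measures depend on the density only through the integral of f(x)^ρ, so they can be evaluated together by numerical quadrature. The sketch below assumes the standard textbook definitions of the Rényi, Tsallis, Havrda-Charvat, and Arimoto entropies; the Simpson rule, the truncation of the integration range, and the OPE parameter values are illustrative choices and are not taken from Table 2.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def entropies(pdf, rho, lower, upper):
    """Return the four entropy measures computed from the integral of pdf(x)**rho."""
    if rho <= 0.0 or rho == 1.0:
        raise ValueError("rho must be positive and different from 1")
    integral = simpson(lambda x: pdf(x) ** rho, lower, upper)
    return {
        "renyi": math.log(integral) / (1.0 - rho),
        "tsallis": (1.0 - integral) / (rho - 1.0),
        "havrda_charvat": (integral - 1.0) / (2.0 ** (1.0 - rho) - 1.0),
        "arimoto": rho / (1.0 - rho) * (integral ** (1.0 / rho) - 1.0),
    }

def ope_pdf(x, beta, theta, lam):
    """OPE density, written with exp(-t) for numerical stability."""
    t = theta * math.expm1(lam * x)
    e = math.exp(-t)
    return beta * theta * lam * (1.0 + beta) * math.exp(lam * x) * e / (e + beta) ** 2

# Illustrative call: OPE with (beta, theta, lambda) = (0.4, 0.5, 0.6) and rho = 0.5.
ope_entropies = entropies(lambda x: ope_pdf(x, 0.4, 0.5, 0.6), 0.5, 0.0, 15.0)
```

The upper integration limit must be large enough that the density is negligible beyond it; for other parameter values it would need to be adjusted.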

Non-Bayesian Estimation
In this section, we examine two different non-Bayesian estimation approaches for the OP-G family parameters: the maximum likelihood and maximum product of spacings methods.

Likelihood Method
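Various parameter estimation strategies have been introduced in the literature, the most prominent of which is the maximum likelihood (ML) method, which can also be used to construct confidence intervals for model parameters and test statistics. Using complete samples, we can calculate the ML estimates (MLEs) of the parameters for the proposed class. Let $x_1, \ldots, x_n$ be a random sample of size n from the OP-G class with parameters β, θ, and δ. Writing $w_i = G(x_i;\delta)/\bar{G}(x_i;\delta)$, the log-likelihood (LL) FUN is given as
$$\ell = n\log\left[\beta\theta(1+\beta)\right] + \sum_{i=1}^{n}\log g(x_i;\delta) + \theta\sum_{i=1}^{n} w_i - 2\sum_{i=1}^{n}\log\bar{G}(x_i;\delta) - 2\sum_{i=1}^{n}\log\left[1+\beta e^{\theta w_i}\right]. \quad (23)$$
The MLEs are obtained by setting the components of the score vector to zero, where the components are given by
$$\frac{\partial\ell}{\partial\beta} = \frac{n}{\beta} + \frac{n}{1+\beta} - 2\sum_{i=1}^{n}\frac{e^{\theta w_i}}{1+\beta e^{\theta w_i}}, \qquad \frac{\partial\ell}{\partial\theta} = \frac{n}{\theta} + \sum_{i=1}^{n} w_i - 2\sum_{i=1}^{n}\frac{\beta w_i e^{\theta w_i}}{1+\beta e^{\theta w_i}},$$
and
$$\frac{\partial\ell}{\partial\delta_k} = \sum_{i=1}^{n}\frac{g'_k(x_i;\delta)}{g(x_i;\delta)} + \sum_{i=1}^{n}\left[\frac{\theta}{\bar{G}(x_i;\delta)^{2}} + \frac{2}{\bar{G}(x_i;\delta)} - \frac{2\beta\theta e^{\theta w_i}}{\bar{G}(x_i;\delta)^{2}\left(1+\beta e^{\theta w_i}\right)}\right] G'_k(x_i;\delta),$$
where $g'_k(x_i;\delta) = \partial g(x_i;\delta)/\partial\delta_k$, $G'_k(x_i;\delta) = \partial G(x_i;\delta)/\partial\delta_k$, and $\delta_k$ is the kth element of the vector of parameters δ.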
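Because the score equations have no closed-form solution, the log-likelihood is maximized numerically in practice. The following is a minimal sketch for the OPE special case; the derivative-free coordinate pattern search on the log-parameter scale is a stand-in chosen to keep the example dependency-free and is not the optimizer used in the paper. The start values, tolerances, and function names are assumptions.

```python
import math
import random

def ope_loglik(params, data):
    """OPE log-likelihood; each term is log f(x) written with exp(-t) for stability."""
    beta, theta, lam = params
    if min(params) <= 0.0 or lam * max(data) > 700.0:
        return -math.inf
    total = len(data) * math.log(beta * theta * lam * (1.0 + beta))
    for x in data:
        t = theta * math.expm1(lam * x)
        total += lam * x - t - 2.0 * math.log(math.exp(-t) + beta)
    return total

def fit_ope(data, start=(1.0, 1.0, 1.0), step=0.5, tol=1e-4, max_iter=2000):
    """Crude maximizer: try +/- step on each log-parameter, halve the step when stuck."""
    def objective(logp):
        if max(abs(v) for v in logp) > 50.0:
            return -math.inf
        return ope_loglik([math.exp(v) for v in logp], data)

    x = [math.log(v) for v in start]
    fx = objective(x)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = x[:]
                trial[i] += d
                ft = objective(trial)
                if ft > fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step /= 2.0
    return [math.exp(v) for v in x], fx

def ope_sample(n, beta, theta, lam, seed=0):
    """Inverse-transform sampler for the OPE distribution (used to build test data)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        w = math.log((beta + u) / (beta * (1.0 - u))) / theta
        out.append(math.log1p(w) / lam)
    return out
```

A search of this kind only finds a local maximum, and with three parameters the OPE likelihood surface can be flat, so several start values are advisable; a Newton-type or quasi-Newton routine would normally be preferred.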

Maximum Product of Spacings (MPS) Estimation
The authors in [37] developed the MPS methodology as an alternative to the MLE method for estimating the parameters of continuous univariate distributions. They argued that, by replacing the likelihood function with a product of spacings, the MPS approach possesses most of the properties of ML. The authors in [38] also considered the MPS technique as an independent approximation of the Kullback-Leibler information measure.
Let $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ be an ordered random sample from the OP-G family with cdf (5) and parameters β, θ, and δ. Then, the uniform spacings of this random sample are defined as
$$D_i = F(x_{(i)}; \beta, \theta, \delta) - F(x_{(i-1)}; \beta, \theta, \delta), \quad i = 1, \ldots, n+1,$$
where $F(x_{(0)}; \beta, \theta, \delta) = 0$ and $F(x_{(n+1)}; \beta, \theta, \delta) = 1$. The MPSEs can be obtained by maximizing the geometric mean of the spacings, as follows:
$$\left[\prod_{i=1}^{n+1} D_i\right]^{1/(n+1)}. \quad (28)$$
To calculate the MPSEs of β, θ, and δ, the logarithm of the product of spacings in Equation (28) is differentiated with respect to each parameter, and the resulting non-linear equations are set to zero. As these equations are difficult to solve analytically, non-linear optimization algorithms (e.g., the Newton-Raphson method) are used to solve them numerically. The asymptotic variance-covariance matrix is then computed and used to construct normal-approximation asymptotic confidence intervals (ACIs).
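The MPS criterion is straightforward to evaluate once the cdf is available. The sketch below computes the spacings and the mean log-spacing (the logarithm of the geometric mean) for the OPE model; the tiny floor applied to tied observations and the sample values in the usage note are assumptions made for illustration.

```python
import math

def ope_cdf(x, beta, theta, lam):
    """OPE cdf, written with exp(-t) for numerical stability."""
    t = theta * math.expm1(lam * x)
    e = math.exp(-t)
    return 1.0 - (1.0 + beta) * e / (e + beta)

def spacings(data, cdf):
    """D_i = F(x_(i)) - F(x_(i-1)) with F(x_(0)) = 0 and F(x_(n+1)) = 1."""
    u = [0.0] + [cdf(x) for x in sorted(data)] + [1.0]
    return [u[i] - u[i - 1] for i in range(1, len(u))]

def mps_objective(params, data):
    """Mean log-spacing; maximizing it over the parameters gives the MPS estimates."""
    beta, theta, lam = params
    d = spacings(data, lambda x: ope_cdf(x, beta, theta, lam))
    # Ties give zero spacings; a tiny floor keeps the logarithm finite.
    return sum(math.log(max(s, 1e-300)) for s in d) / len(d)
```

For a hypothetical sample such as `[0.3, 0.9, 1.4, 2.2, 3.1]`, the six spacings sum to one, and the objective can never exceed log(1/6), the value attained when all spacings are equal.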

Bayesian Estimation
In this section, we consider the Bayesian estimation of the model parameters under the squared error loss function (SELF), defined by
$$L(\tilde{\Omega}, \Omega) = (\tilde{\Omega} - \Omega)^{2},$$
where $\tilde{\Omega}$ is an estimator of Ω. We denote the prior and posterior distributions of Ω by π(Ω) and π*(Ω | x), respectively. Under the SELF, the Bayesian estimate of any FUN B(Ω) of Ω is its posterior mean,
$$\tilde{B}(\Omega) = E\left[B(\Omega) \mid x\right] = \int B(\Omega)\, \pi^{*}(\Omega \mid x)\, d\Omega.$$
A prior distribution is important for the development of Bayes estimators. We investigate this estimation problem under the assumption of gamma prior distributions. It is assumed that β, θ, and δ follow independent gamma distributions, with β ∼ Γ(η 1 , ζ 1 ), θ ∼ Γ(η 2 , ζ 2 ), and δ ∼ Γ(η 3 , ζ 3 ) (the latter if δ > 0 and δ is an individual parameter); the corresponding gamma pdfs form the informative prior (30). Using the informative prior (30) and the likelihood FUN based on the density (6), the joint posterior density is proportional to the product of the likelihood and the prior:
$$\pi^{*}(\beta, \theta, \delta \mid x) \propto L(x \mid \beta, \theta, \delta)\, \pi(\beta)\, \pi(\theta)\, \pi(\delta).$$
As the marginal posterior densities of β, θ, and δ in (32), (33) and (34) are not well-known distributions, we utilize the Metropolis-Hastings sampler to generate values for β, θ, and δ, using a normal proposal distribution.
Furthermore, the approach of Chen and Shao [39] has been widely used to construct highest posterior density (HPD) intervals for the unknown parameters of a distribution. For example, a 95% HPD interval can be obtained from the 2.5% and 97.5% percentiles of the MCMC sample output. The Bayesian credible intervals for the parameters β, θ, and δ are calculated in this manner.
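The sampling scheme can be sketched for the OPE model as follows. This is a minimal random-walk Metropolis-Hastings example under stated assumptions: the gamma priors are taken in the shape-rate parameterization, all three parameters are updated jointly with independent normal increments, and the hyperparameters, proposal scale, burn-in, and seed are hypothetical values, not those used in the paper.

```python
import math
import random

def ope_loglik(params, data):
    """OPE log-likelihood (sum of log densities)."""
    beta, theta, lam = params
    if min(params) <= 0.0 or lam * max(data) > 700.0:
        return -math.inf
    total = len(data) * math.log(beta * theta * lam * (1.0 + beta))
    for x in data:
        t = theta * math.expm1(lam * x)
        total += lam * x - t - 2.0 * math.log(math.exp(-t) + beta)
    return total

def log_posterior(params, data, hyper):
    """Log-likelihood plus independent gamma(shape, rate) log-priors, up to a constant."""
    if min(params) <= 0.0:
        return -math.inf
    lp = ope_loglik(params, data)
    for p, (shape, rate) in zip(params, hyper):
        lp += (shape - 1.0) * math.log(p) - rate * p
    return lp

def metropolis_hastings(data, hyper, start, n_iter=3000, scale=0.1, seed=1):
    """Random-walk MH with a symmetric normal proposal; returns the full chain."""
    rng = random.Random(seed)
    current = list(start)
    lp_cur = log_posterior(current, data, hyper)
    chain = []
    for _ in range(n_iter):
        proposal = [p + rng.gauss(0.0, scale) for p in current]
        lp_new = log_posterior(proposal, data, hyper)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp_cur)):
            current, lp_cur = proposal, lp_new
        chain.append(tuple(current))
    return chain

def summarize(chain, burn_in=500):
    """Posterior mean (Bayes estimate under SELF) and 2.5%/97.5% percentiles per parameter."""
    kept = chain[burn_in:]
    out = []
    for j in range(len(kept[0])):
        vals = sorted(s[j] for s in kept)
        m = len(vals)
        out.append((sum(vals) / m, vals[int(0.025 * m)], vals[int(0.975 * m)]))
    return out
```

Proposals with a non-positive component have zero posterior density and are always rejected, which keeps the chain inside the parameter space. In practice the proposal scale would be tuned and convergence checked before the summaries are used.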

Bootstrap CI
We propose bootstrap confidence intervals as an alternative to the asymptotic confidence interval for the parameters of the model. For this purpose, we generated parametric bootstrap samples and constructed two different bootstrap confidence intervals. First, we employed the Efron [40] percentile bootstrap method (boot-p). Second, we used the bootstrap-t technique (boot-t), based on the concept of Hall [41]. For further information on how these bootstrap confidence intervals work, see [42][43][44].
(i) Boot-p method
Step 1: Generate a parametric bootstrap sample from the fitted model.
Step 2: Calculate the MLEs of all parameters according to the bootstrap samples, denoted by $\hat{\beta}$, $\hat{\theta}$, and $\hat{\delta}$.
Step 3: Repeat Step 2 B times, as needed, in order to obtain a set of bootstrap estimates for $\hat{\beta}$, $\hat{\theta}$, and $\hat{\delta}$.
(ii) Boot-t method
Step 1: The approach is the same as that in the boot-p approach.
Step 2: Compute the bootstrap estimate of R F by replacing the parameters in Equation (24) with their bootstrap estimates, denoting them by R * F and the following statistics
Step 3: Step 2 should be repeated B times, as needed.
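The boot-p procedure can be sketched generically as follows. The example is a parametric percentile bootstrap that takes the estimator and the simulator as arguments; to keep it short and self-contained, it is demonstrated with a one-parameter exponential model as a stand-in (for the OP-G family, `estimate` would be a numerical MLE routine and `simulate` a sampler from the fitted OP-G model). All function names, the number of replicates, and the seed are assumptions.

```python
import math
import random

def boot_p(data, estimate, simulate, B=1000, level=0.95, seed=0):
    """Parametric percentile bootstrap CI for a scalar parameter.

    estimate(data) -> parameter estimate
    simulate(param, n, rng) -> sample of size n from the model with that parameter
    """
    rng = random.Random(seed)
    theta_hat = estimate(data)
    boots = sorted(estimate(simulate(theta_hat, len(data), rng)) for _ in range(B))
    alpha = 1.0 - level
    lo = boots[int(math.floor(alpha / 2.0 * (B - 1)))]
    hi = boots[int(math.ceil((1.0 - alpha / 2.0) * (B - 1)))]
    return theta_hat, (lo, hi)

# Stand-in model (NOT the OP-G family): exponential with rate parameter.
def exp_mle(data):
    """Closed-form MLE of the exponential rate."""
    return len(data) / sum(data)

def exp_simulate(rate, n, rng):
    """Draw n exponential variates with the given rate."""
    return [rng.expovariate(rate) for _ in range(n)]
```

The interval endpoints are simply the empirical α/2 and 1 − α/2 quantiles of the B bootstrap estimates. The boot-t variant would additionally studentize each bootstrap estimate by its estimated standard error.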

The Log-Odd Perks-Weibull Regression Model
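If X is an RV with an odd Perks-Weibull (OPW) distribution, then Y = log(X) is an RV with a log-OPW (LOPW) distribution under the re-parameterization δ = 1/σ and µ = log(λ). Writing $z = (y-\mu)/\sigma$, the pdf and cdf of the LOPW distribution are as follows:
$$f(y; \beta, \theta, \mu, \sigma) = \frac{\beta\theta(1+\beta)}{\sigma}\,\frac{\exp\left\{z + e^{z} + \theta\left[\exp(e^{z})-1\right]\right\}}{\left[1+\beta\exp\left\{\theta\left[\exp(e^{z})-1\right]\right\}\right]^{2}}, \quad -\infty < y < \infty,$$
and
$$F(y; \beta, \theta, \mu, \sigma) = 1 - \frac{1+\beta}{1+\beta\exp\left\{\theta\left[\exp(e^{z})-1\right]\right\}},$$
where −∞ < µ < ∞ is the location parameter, β, θ > 0 are the shape parameters, and σ > 0 is the scale parameter. The SF and HRF are provided by
$$S(y) = \frac{1+\beta}{1+\beta\exp\left\{\theta\left[\exp(e^{z})-1\right]\right\}}, \qquad h(y) = \frac{\beta\theta}{\sigma}\,\frac{\exp\left\{z + e^{z} + \theta\left[\exp(e^{z})-1\right]\right\}}{1+\beta\exp\left\{\theta\left[\exp(e^{z})-1\right]\right\}}.$$
If $z = (y-\mu)/\sigma$ is the standardized RV for y in Equation (36), then z has the following pdf:
$$f(z; \beta, \theta) = \frac{\beta\theta(1+\beta)\exp\left\{z + e^{z} + \theta\left[\exp(e^{z})-1\right]\right\}}{\left[1+\beta\exp\left\{\theta\left[\exp(e^{z})-1\right]\right\}\right]^{2}},$$
with SF $S(z) = (1+\beta)/\left[1+\beta\exp\left\{\theta\left[\exp(e^{z})-1\right]\right\}\right]$. Using the linear location-scale regression model in Equation (1), where µ = B T X, the SF of y i |X can thus be written as
$$S(y_i \mid X_i) = \frac{1+\beta}{1+\beta\exp\left\{\theta\left[\exp(e^{z_i})-1\right]\right\}}, \quad z_i = \frac{y_i - B^{T}X_i}{\sigma}.$$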
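The LOPW density and survival function follow from the OPW ones by the change of variable y = log(x), and can be checked numerically. The sketch below assumes the Weibull baseline has shape δ = 1/σ and scale λ = exp(µ); the overflow cut-off at z = 6.5, the parameter values, and the integration grid are illustrative choices.

```python
import math

def lopw_sf(y, beta, theta, mu, sigma):
    """LOPW survival function, with z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    if z > 6.5:
        # exp(exp(z)) would overflow; the survival is zero for any practical theta.
        return 0.0
    t = theta * math.expm1(math.exp(z))
    e = math.exp(-t)
    return (1.0 + beta) * e / (e + beta)

def lopw_pdf(y, beta, theta, mu, sigma):
    """LOPW density: minus the derivative of the survival function."""
    z = (y - mu) / sigma
    if z > 6.5:
        return 0.0
    ez = math.exp(z)
    t = theta * math.expm1(ez)
    e = math.exp(-t)
    return beta * theta * (1.0 + beta) * math.exp(z + ez) * e / (sigma * (e + beta) ** 2)

def simpson(f, a, b, n=8000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Illustrative values: beta = 0.4, theta = 0.5, mu = 1.0, sigma = 0.8.
mass = simpson(lambda y: lopw_pdf(y, 0.4, 0.5, 1.0, 0.8), 1.0 - 30 * 0.8, 1.0 + 4 * 0.8)
```

With these values `mass` should be close to 1, and a finite-difference derivative of `lopw_sf` should match the negative of `lopw_pdf`.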

MLE Method for Parameters of the Regression Model
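The likelihood FUN of the regression model is expressed in terms of $z_i = (y_i - B^{T}X_i)/\sigma$. By maximizing the log-likelihood function (42), the MLEs $\hat{\beta}$, $\hat{\theta}$, $\hat{\sigma}$, and $\hat{B}^{T}$ of β, θ, σ, and B T can be obtained. The survival function for y i can be computed using the fitted model (1):
$$\hat{S}(y_i \mid X_i) = \frac{1+\hat{\beta}}{1+\hat{\beta}\exp\left\{\hat{\theta}\left[\exp(e^{\hat{z}_i})-1\right]\right\}}, \quad \hat{z}_i = \frac{y_i - \hat{B}^{T}X_i}{\hat{\sigma}}.$$
The survival function for t = e y is derived, using the invariance property of the MLE, as follows:
$$\hat{S}(t \mid X) = \frac{1+\hat{\beta}}{1+\hat{\beta}\exp\left\{\hat{\theta}\left[\exp\left((t/\hat{\lambda})^{\hat{\delta}}\right)-1\right]\right\}},$$
where $\hat{\delta} = 1/\hat{\sigma}$ and $\hat{\lambda} = e^{\hat{B}^{T}X}$. When the usual regularity conditions are met and the parameter vector Θ = (β, θ, σ, B T ) lies in the interior of the parameter space rather than on its boundary, the asymptotic distribution of $\sqrt{n}(\hat{\Theta} - \Theta)$ is multivariate normal $N(0, I^{-1}(\Theta))$, where I(Θ) is the information matrix. The approximate multivariate normal distribution can be used to build approximate confidence regions for particular parameters in Θ in the traditional manner.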

Simulation for OPE Distribution
To assess the performance of the MLE, MPS, and Bayesian estimation methods for the OPE distribution parameters, we ran a Monte Carlo simulation: for two separate sets of parameter values, we randomly generated 10,000 samples of sizes 30, 70, and 150 from the OPE distribution. In Table 3, β = 0.4, with (θ, λ) = (0.5, 0.6), (0.5, 3), (2, 0.6), and (2, 3). In Table 4, β = 2, with the same four (θ, λ) combinations. The estimates were assessed through their bias and mean square error (MSE), as well as the length of the confidence interval (L.CI): the asymptotic CI for MLE and MPS, the bootstrap CI for MLE, and the credible interval determined using the HPD interval for Bayesian estimation. The simulation outcomes are shown in Tables 3 and 4. From these findings, we concluded that, as the sample size increased, the empirical means tended to approach the true values of the parameters. Furthermore, as the sample size grew larger, the MSEs and biases decreased.

Simulation of the LOPW Regression Model
Next, we conducted a Monte Carlo simulation to examine the performance of the ML parameter estimates of the LOPW regression model. The lifetimes were obtained from the OPW distribution, and independent variables x i1 , x i2 were generated using the uniform distribution in the range (0, 1). A total of 1000 samples were created, using the parameters detailed below.
The simulation was conducted using n = 30, 70, and 150.
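One way to generate data for such a simulation is to draw the standardized LOPW error by inverse transform and plug it into the location-scale model. The sketch below is an illustration only: the parameter values, the presence of an intercept, and the seed are hypothetical, since the values used in the study are not reproduced here.

```python
import math
import random

def lopw_error(u, beta, theta):
    """Quantile of the standardized LOPW error z at probability u in (0, 1)."""
    # w is the odds G/(1-G); written with log1p so that w > 0 even for tiny u.
    w = (math.log1p(u / beta) - math.log1p(-u)) / theta
    return math.log(math.log1p(w))

def simulate_lopw_regression(n, beta, theta, sigma, coefs, seed=0):
    """Return rows (y, x1, x2) with y = b0 + b1*x1 + b2*x2 + sigma*z and x1, x2 ~ U(0, 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        u = rng.random()
        while u == 0.0:                 # avoid log(0) in the error quantile
            u = rng.random()
        mu = coefs[0] + coefs[1] * x1 + coefs[2] * x2
        rows.append((mu + sigma * lopw_error(u, beta, theta), x1, x2))
    return rows

# Hypothetical configuration for one replicate of size n = 30.
sample = simulate_lopw_regression(30, beta=0.4, theta=0.5, sigma=0.8,
                                  coefs=(1.0, 0.5, -0.5), seed=42)
```

Exponentiating the simulated responses gives lifetimes on the original OPW scale, which can then be passed to the fitting routine for each replicate.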

Discretization
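There are a variety of approaches in the statistical literature for converting a continuous distribution to a discrete one. The survival discretization method is the most often-used methodology for generating discrete distributions; for further information, see Roy [45]. It requires the existence of a cdf, a continuous and non-negative survival function, and times separated into unit intervals. The discrete distribution PMF is defined as follows:
$$P(X = x) = S(x) - S(x+1), \quad x = 0, 1, 2, \ldots,$$
where S(·) is the survival function of the continuous distribution. Then, the PMF of the discrete OP-G family can be expressed as
$$P(X = x) = (1+\beta)\left[\frac{1}{1+\beta\exp\left\{\theta\, G(x;\delta)/\bar{G}(x;\delta)\right\}} - \frac{1}{1+\beta\exp\left\{\theta\, G(x+1;\delta)/\bar{G}(x+1;\delta)\right\}}\right].$$
The cdf and the HRF of the discrete OP-G family follow from this PMF and the survival function. Regarding the OPE distribution, the PMF of the discrete OPE (DOPE) distribution is
$$P(X = x) = (1+\beta)\left[\frac{1}{1+\beta\exp\left\{\theta\left(e^{\lambda x}-1\right)\right\}} - \frac{1}{1+\beta\exp\left\{\theta\left(e^{\lambda (x+1)}-1\right)\right\}}\right], \quad x = 0, 1, 2, \ldots,$$
and the HRF of the DOPE distribution is obtained in the same way.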
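The survival discretization is simple to implement once the continuous SF is available. The sketch below does so for the DOPE distribution; the cdf is taken as P(X ≤ k) = 1 − S(k + 1), and the hazard is taken as P(X = k | X ≥ k), which is one common convention and is an assumption here. Function names and parameter values are illustrative.

```python
import math

def ope_sf(x, beta, theta, lam):
    """Continuous OPE survival function, written with exp(-t) for stability."""
    if lam * x > 700.0:
        return 0.0                       # exp(lam*x) would overflow; survival is zero
    t = theta * math.expm1(lam * x)
    e = math.exp(-t)
    return (1.0 + beta) * e / (e + beta)

def dope_pmf(k, beta, theta, lam):
    """P(X = k) = S(k) - S(k + 1) for k = 0, 1, 2, ..."""
    return ope_sf(k, beta, theta, lam) - ope_sf(k + 1, beta, theta, lam)

def dope_cdf(k, beta, theta, lam):
    """P(X <= k) = 1 - S(k + 1)."""
    return 1.0 - ope_sf(k + 1, beta, theta, lam)

def dope_hrf(k, beta, theta, lam):
    """Assumed convention: P(X = k | X >= k) = pmf(k) / S(k)."""
    s = ope_sf(k, beta, theta, lam)
    return dope_pmf(k, beta, theta, lam) / s if s > 0.0 else float("nan")

# Illustrative values: the probabilities telescope, so they sum to S(0) = 1.
probs = [dope_pmf(k, 0.4, 0.5, 0.6) for k in range(61)]
```

Because the PMF is a difference of consecutive survival values, partial sums telescope and the cdf requires no summation.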

Applications
We utilized three real data sets to test the superiority of the continuous distribution, COVID-19 data from Saudi Arabia to test the superiority of the discrete distribution, and Stanford heart transplant data to test the superiority of the regression model.

Radiation Failure Mice
We first examined the real data set of radiation failure mice (RFM) reported by Hoel [9], obtained from a laboratory experiment in male mice aged 5-6 weeks that had been exposed to a 300 roentgen radiation dosage. Our goal was to look at other causes of death that were not related to the two main causes of death: reticulum cell sarcoma and thymic lymphoma. The data were 40, 42, 51, 62, 163, 179, 206, 222, 228, 252, 249, 282, 324, 333, 341, 366, 385, 407, 420, 431, 441, 461, 462, 482, 517, 517, 524, 564, 567, 586, 619, 620, 621, 622, 647, 651, 686, 761, and 763. Table 7 presents the MLE with SE and different measures for the RFM data, together with a comparison between our model and different distributions: the Marshall-Olkin alpha power exponential (MOAPEx) introduced by [46], the Marshall-Olkin alpha power Weibull (MOAPW) introduced by [47], the Weibull-Lomax (WL) model introduced by [48], the Kumaraswamy Weibull (KWW) introduced by [49], the alpha power inverse Weibull (APIW) introduced by [50], and the generalized inverse Weibull (GIW) introduced by [51]. The measures reported are the Akaike information criterion (AKINC), the Bayesian information criterion (BINC), and the Kolmogorov-Smirnov (KOS), Cramér-von Mises (CVOM), and Anderson-Darling (AND) statistics, together with the p-value (PV) of the KOS test. The OPE distribution had the smallest values of these measures and the largest PV. Based on the results presented in Table 7, we note that OPE can be considered as the best model to fit the RFM data. Figure 6 confirms the results shown in Table 7. Figure 7 shows the PP-plot and QQ-plot for the OPE distribution on the RFM data set.
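Goodness-of-fit measures of this kind can be computed from the maximized log-likelihood and the fitted cdf. The sketch below assumes the usual definitions of AIC, BIC, corrected AIC, and HQIC, and the two-sided Kolmogorov-Smirnov distance; whether the paper's CAKINC is the corrected or the consistent AIC is not stated, so the corrected form used here is an assumption. The fit shown is a one-parameter exponential stand-in, not the OPE fit of Table 7.

```python
import math

def information_criteria(loglik, k, n):
    """Information criteria for a model with k parameters fitted to n observations."""
    aic = 2.0 * k - 2.0 * loglik
    return {
        "AIC": aic,
        "BIC": k * math.log(n) - 2.0 * loglik,
        "CAIC": aic + 2.0 * k * (k + 1.0) / (n - k - 1.0),   # corrected AIC (assumed form)
        "HQIC": 2.0 * k * math.log(math.log(n)) - 2.0 * loglik,
    }

def ks_statistic(data, cdf):
    """Two-sided Kolmogorov-Smirnov distance between the empirical and fitted cdfs."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

# RFM data as listed in the text.
rfm = [40, 42, 51, 62, 163, 179, 206, 222, 228, 252, 249, 282, 324, 333, 341,
       366, 385, 407, 420, 431, 441, 461, 462, 482, 517, 517, 524, 564, 567,
       586, 619, 620, 621, 622, 647, 651, 686, 761, 763]

# Stand-in fit (NOT the OPE model): exponential with closed-form MLE of the rate.
rate = len(rfm) / sum(rfm)
exp_loglik = len(rfm) * math.log(rate) - rate * sum(rfm)
exp_measures = information_criteria(exp_loglik, k=1, n=len(rfm))
exp_ks = ks_statistic(rfm, lambda x: 1.0 - math.exp(-rate * x))
```

To reproduce a row of the comparison table, the same two functions would be called with the maximized OPE log-likelihood, k = 3, and the fitted OPE cdf.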

Failure Times of a Certain Product
The second data set, that of Gacula and Kubala [52], contains 26 observations of failure times for a specific product. These data have also been utilized by Nassar et al. [46]. The data are 24, 24, 26, 26, 32, 32, 33, 33, 33, 35, 41, 42, 43, 47, 48, 48, 48, 50, 52, 54, 55, 57, 57, 57, 57, and 61. Table 8 presents the MLE with SE and different measures for the failure time data, together with a comparison between our model and different distributions, including the MOAPEx, MOAPW, WL, KWW, APIW, and GIW distributions. Based on these results, we found that the values of AKINC, BINC, KOS, CVOM, and AND were the smallest for the OPE distribution, which also had the largest PV. Based on the results in Table 8, we note that the OPE represented the best model to fit the failure time data. Figure 8 confirms the results shown in Table 8. Figure 9 shows the PP-plot and QQ-plot for the OPE distribution on the failure time data set.

Mechanical Data
The third data set comprises the times measured between failures for repairable mechanical equipment items, obtained from the work of Seber et al. [53]. The data are 0. 11 [54]. The results shown in Table 9 indicate that the smallest values of the AKINC, BINC, KOS, CAKINC, and HQINC were obtained by OPE, which also had the largest PV for the KOS test, when comparing our results with those discussed by Elshahhat et al. [54]. Thus, the OPE distribution performed better than the inverse-Weibull, APIW, inverse gamma, generalized inverse Weibull (GIW), exponentiated inverted-Weibull, generalized inverted half-logistic, inverted Kumaraswamy, inverted Nadarajah-Haghighi, alpha-power inverse-Weibull, and extended inverse Gompertz (EIGo) distributions, which were discussed in [54]. Figure 10 shows the estimated cdf, estimated pdf, PP-plot, and QQ-plot for the OPE distribution, which confirm the good fit of the OPE distribution to these data.

Stanford Heart Transplant Data
Data for n = 103 patients were acquired from the work of Kalbfleisch and Prentice [55]. The patient survival time was taken as the number of days between admission to a heart transplant program and death. Each patient was linked to the following data: y i , log survival follow-up time (days); x i1 , age (in years); and x i2 , prior surgery (coded as 0 = No, 1 = Yes). We fitted the regression model in which y i follows the LOPW distribution. Table 10 shows MLE, SE, and Z-values, as well as PVs, for the LOPW regression model, while Table 11 provides different measures obtained for the LOPW regression model. Figure 11 shows the correlation values. Figure 12.

Concluding Remarks
In this study, we explored the novel odd Perks-G family, and several of its statistical and mathematical features were established. We obtained some of its special models, including the OPU, OPE, OPW, and OPL distributions. The associated model parameters were estimated using the ML technique, the MPS method, and the Bayesian estimation approach, and simulation tests were conducted to evaluate the effectiveness of the OPE estimators using various estimation methods based on biases, MSE, and the CI length. In addition, the OPW distribution was used to develop a new log-location regression model. The unknown parameters of the new regression model were estimated using ML estimation methods. Furthermore, we introduced the discrete odd Perks-G family using the survival discretization method and obtained the DOPE distribution as a special model. Finally, we examined the utility of OP-G family distributions using three real data sets, analyzed Stanford heart transplant data using the LOPW regression model, and analyzed COVID-19 data using the discrete model. The OPE distribution outperformed other state-of-the-art distributions in terms of goodness of fit, according to our findings. Furthermore, the LOPW regression model fit the Stanford heart transplant data well. Additionally, the DOPE distribution provided a good fit to the COVID-19 data. In our future research, the new suggested family will be used to generate more new distributions, the statistical properties of which will be explored. We also intend to study the statistical inferences of new models generated using the odd Perks-G family.

Data Availability Statement: Data sets are available in the application section.

Conflicts of Interest:
The authors declare no conflict of interest.