A New Extended Geometric Distribution: Properties, Regression Model, and Actuarial Applications

Abstract: In this paper, a new modified version of the geometric distribution is proposed. The newly introduced model is called the transmuted record type geometric (TRTG) distribution. The TRTG distribution is a good alternative to the negative binomial, Poisson, and geometric distributions in modeling real data encountered in several applied fields. The main statistical properties of the new distribution were obtained. We determined the value at risk and tail value at risk measures for the TRTG distribution. These measures are important quantities in actuarial sciences for portfolio optimization under uncertainty. The TRTG parameters were estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods, and simulation results were used to explore their performance. Furthermore, a new count regression model based on the TRTG distribution was proposed. Four real data applications were adopted to illustrate the applicability of the TRTG distribution and its count regression model. These applications showed empirically that the TRTG distribution outperforms some important discrete models such as the negative binomial, transmuted geometric, discrete Burr, discrete Chen, geometric, and Poisson distributions.

Author Contributions: Conceptualization, T.E., Y.A. and M.M.A.A.; methodology, A.Z.A.; software, Y.A.; validation, T.E., Y.A., M.M.A.S. and A.Z.A.; formal analysis, Y.A. and M.M.A.S.; writing (original draft preparation), Y.A., M.M.A.S. and A.Z.A.; writing (review and editing), T.E., Y.A. and A.Z.A.; visualization, M.M.A.A.; supervision, A.Z.A.; funding acquisition, M.M.A.A.


Introduction
Discrete models are very important in handling count data encountered in several theoretical and applied sciences such as medicine, insurance, life testing, biology, and agriculture. Recently, there has been increased interest among statisticians in constructing new flexible discrete distributions. Chakraborty and Chakravarty [1] mentioned that almost all observed values are actually discrete because they are measured to only a finite number of decimal places and cannot really constitute all points in a continuum.
On the other hand, in some life testing and survival analysis studies, lifetimes can be treated as discrete random variables, and hence the reliability function is a function of a discrete random variable. For example, the reliability of a switching device is a function of the number of times the switch is operated, and the reliability of a computer is a function of the number of times the computer has broken down. Recently, many continuous lifetime distributions have been discretized for modeling discrete lifetime data, for example, the discrete Weibull distribution by [2] and the discrete Burr and discrete Pareto distributions by [3].
Furthermore, some discrete distributions have been proposed by compounding two discrete distributions, for example, the uniform-Poisson distribution by [4], the uniform-geometric distribution by [5], and the binomial discrete-Lindley distribution by [6]. Recently, Al-Babtain et al. [7] proposed a natural discrete analog of the continuous Lindley distribution as a mixture of negative binomial and geometric distributions.
In recent decades, several works have been introduced in the statistical literature to discretize continuous distributions. However, there is still a clear need to introduce more flexible discrete lifetime distributions to model several types of count data in many applied areas including insurance, social sciences, reliability studies, and economics. It is worth noting that the probability mass functions (pmfs) of most recently introduced discrete distributions are developed by discretizing the survival functions of continuous distributions and have quite a complex structure in terms of their parameter estimation, for example, the two discrete Lindley models by [8,9].
Apart from discretization techniques, we were motivated to propose a more flexible extension of the geometric distribution using the transmuted record type (TRT) method due to [10]. The TRT approach can be summarized as follows. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having cdf $G(\cdot)$. Let $X_{U(1)}$ and $X_{U(2)}$ be the first and second upper records, respectively, based on this sample. Consider a random variable $X$ that is defined as follows: $X \overset{d}{=} X_{U(1)}$ with probability $1-\theta$, and $X \overset{d}{=} X_{U(2)}$ with probability $\theta$. Hence, the cdf of $X$ follows as
$$F_X(x) = (1-\theta)\,F_{X_{U(1)}}(x) + \theta\,F_{X_{U(2)}}(x), \qquad (1)$$
where $\theta \in (0, 1)$. According to Equation (3.1) of [11], the cdf of the first upper record, $X_{U(1)}$, reduces to:
$$F_{X_{U(1)}}(x) = G(x). \qquad (2)$$
The cdf of the second upper record, $X_{U(2)}$, takes the form:
$$F_{X_{U(2)}}(x) = G(x) + [1-G(x)]\log[1-G(x)]. \qquad (3)$$
By inserting Equations (2) and (3) in (1), we obtain $F_X(x) = (1-\theta)\,G(x) + \theta\{G(x) + [1-G(x)]\log[1-G(x)]\}$. After some algebra, the cdf of $X$ reduces to:
$$F_X(x) = G(x) + \theta\,[1-G(x)]\log[1-G(x)], \qquad (4)$$
where $G(x)$ is any baseline cdf and Equation (4) is referred to as the cdf of the TRT method. Tanış and Saraçoglu [12] constructed the TRT-Weibull distribution using the TRT approach. Further information about the TRT approach can be found in [10].
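As an illustrative sketch (not part of the original derivation), the TRT construction can be checked numerically for a baseline cdf. The snippet assumes the TRT cdf has the form F(x) = G(x) + θ[1 − G(x)] log[1 − G(x)], i.e., the (1 − θ, θ) mixture of the first and second upper record cdfs. The function names and the exponential baseline are hypothetical choices made only for illustration.

```python
import math

def second_record_cdf(g):
    """cdf of the second upper record, written in terms of g = G(x)."""
    if g >= 1.0:
        return 1.0
    return g + (1.0 - g) * math.log(1.0 - g)

def trt_cdf(g, theta):
    """Assumed TRT cdf in terms of g = G(x): G + theta * (1 - G) * log(1 - G)."""
    if g >= 1.0:
        return 1.0
    return g + theta * (1.0 - g) * math.log(1.0 - g)

def exp_cdf(x, rate=1.0):
    """Illustrative baseline cdf (exponential); an assumption, not from the paper."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

theta = 0.3
xs = [0.1 * k for k in range(1, 60)]

# The mixture of the two record cdfs should agree with the closed form.
max_gap = max(
    abs((1.0 - theta) * exp_cdf(x)
        + theta * second_record_cdf(exp_cdf(x))
        - trt_cdf(exp_cdf(x), theta))
    for x in xs
)
print("largest gap between mixture and closed form:", max_gap)
```

Because log[1 − G(x)] is non-positive, the resulting cdf never exceeds the baseline cdf, so the TRT variable is stochastically larger than the baseline variable.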
To the best of our knowledge, this is the first article that applies the TRT method to construct an extended form of the geometric distribution. The proposed discrete model is called the transmuted record type geometric (TRTG) distribution. It is suitable for overdispersed data and hence can be applied in collective risk models. It can also be considered a competitive distribution to the negative binomial and Poisson distributions for fitting automobile claim frequency data.
Additionally, we derive explicit expressions for its basic distributional properties including the moment generating and probability generating functions, mean, variance, skewness, kurtosis, quantile function, stochastic orders, mean deviation, and mean residual life. In addition, we derived two important risk measures, namely the value at risk and tail value at risk, for the TRTG model. The TRTG parameters were estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods. Simulation results were used to explore their performance in estimating the TRTG parameters θ and q. The applicability of the TRTG distribution was studied using three data sets from the actuarial sciences, showing its superiority as compared with competing models, namely the transmuted geometric [13], discrete Burr [3], discrete Chen [14], negative binomial, geometric, and Poisson distributions. We were also motivated to propose a count regression model based on the TRTG distribution. The new TRTG regression model outperformed the Poisson, geometric, and Poisson-Lindley (PL) [15] regression models.
The rest of the paper is organized as follows. We define the TRTG distribution in Section 2. Some of its distributional properties are provided in Section 3. In Section 4, we derive two important risk measures of the TRTG model and provide some numerical computations for them. The TRTG parameters, θ and q, are estimated via four estimation methods in Section 5. In Section 6, a Monte Carlo simulation study is conducted to investigate the efficiency of different proposed estimates. In Section 7, we analyze three insurance data sets to illustrate the flexibility of the TRTG model. The TRTG count regression model is discussed in modeling real life count data in Section 8. Finally, the paper is concluded in Section 9.

The TRTG Distribution
First, we applied the TRT methodology to propose the two-parameter TRTG distribution as an extended version of the geometric distribution with pmf $p(x; q) = (1-q)\,q^{x}$, $x \in \mathbb{N} = \{0, 1, 2, \ldots\}$, $q \in (0, 1)$, and cumulative distribution function (cdf) $G(x; q) = 1 - q^{x+1}$.
By inserting the cdf of the geometric distribution in Equation (4), we obtain the cdf of the TRTG distribution as follows:
$$F(x; q, \theta) = 1 - q^{x+1}\,[1 - \theta\,(x+1)\log q], \quad x \in \mathbb{N}. \qquad (5)$$
The TRTG distribution is specified by the following pmf:
$$f(x; q, \theta) = F(x; q, \theta) - F(x-1; q, \theta) = q^{x}\,\{1 - q + \theta\log(q)\,[q\,(x+1) - x]\}, \quad x \in \mathbb{N}, \qquad (6)$$
where $q, \theta \in (0, 1)$. If $X$ has the pmf (6), then it is denoted by $X \sim \mathrm{TRTG}(q, \theta)$. The survival function of the TRTG model is specified by $S(x; q, \theta) = 1 - F(x; q, \theta) = q^{x+1}\,[1 - \theta\,(x+1)\log q]$. It is clear that $\lim_{\theta \to 0} F(x; q, \theta) = 1 - q^{x+1}$. In other words, the TRTG distribution behaves like a geometric model when θ lies around zero. Consequently, the hazard function (hf) of $X$ is obtained from the pmf (6) and the survival function. Figure 1 presents the plots of the pmf of the TRTG model for some choices of q and θ. Figure 1 shows that the pmf can only be decreasing or increasing-decreasing (unimodal) in shape. Furthermore, it is observed that as θ increases, in most diagrams, the mode moves to the right, showing that the TRTG model is versatile and that even small values of θ have a substantial effect on the TRTG distribution. Figure 2 displays the hf plots of the TRTG model for some choices of q and θ and it reveals that the TRTG model has a decreasing discrete hazard rate.
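The following minimal sketch evaluates the TRTG cdf and pmf numerically. It assumes the cdf F(x) = 1 − q^(x+1)[1 − θ(x+1) log q], obtained by substituting the geometric cdf into the TRT cdf, and the pmf obtained as F(x) − F(x−1). The function names are hypothetical. The check against q(θ log q − 1) + 1 uses the expression for the probability of zero stated in the method of proportions.

```python
import math

def trtg_cdf(x, q, theta):
    """Assumed TRTG cdf for x = 0, 1, 2, ...; zero for negative x."""
    if x < 0:
        return 0.0
    return 1.0 - q ** (x + 1) * (1.0 - theta * (x + 1) * math.log(q))

def trtg_pmf(x, q, theta):
    """Assumed TRTG pmf: F(x) - F(x - 1), simplified algebraically."""
    return q ** x * (1.0 - q + theta * math.log(q) * (q * (x + 1) - x))

q, theta = 0.5, 0.5
probs = [trtg_pmf(x, q, theta) for x in range(200)]

# The probabilities should be positive and sum to one (up to truncation).
total = sum(probs)

# f(0) should match the expression q * (theta * log(q) - 1) + 1.
f0_expected = q * (theta * math.log(q) - 1.0) + 1.0

# The simplified pmf should agree with the cdf differences.
max_diff = max(
    abs(trtg_pmf(x, q, theta)
        - (trtg_cdf(x, q, theta) - trtg_cdf(x - 1, q, theta)))
    for x in range(50)
)
print(total, probs[0], f0_expected, max_diff)
```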

Moments and Quantile Function
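The moment generating function (mgf) of the TRTG distribution, $M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx} f(x; q, \theta)$, is available in closed form for $q\,e^{t} < 1$. Note that by using $M_X(t)$, we can obtain the probability generating function of the TRTG distribution as $G_X(s) = M_X(\log s)$. Hence, the first four moments of $X$ can be derived, and the variance, skewness, and kurtosis of $X$ follow from them. The quantile function (qf) of the TRTG distribution is derived as
$$Q(u) = \frac{1 + \theta\,W_{-1}\!\left(-\dfrac{(1-u)\,e^{-1/\theta}}{\theta}\right)}{\theta\log(q)} - 1, \quad u \in (0, 1), \qquad (9)$$
where $W_{-1}$ refers to the lower real branch of the Lambert function. From Equation (9), the $a$th quantile, $x_a$, of the TRTG distribution satisfies $F(x_a) = a$, where $F$ is the cdf (5) of the TRTG distribution. The median of the TRTG distribution follows by simply setting $a = 0.5$.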
An important measure of a discrete distribution is the dispersion index (DI), which is defined as DI = Var(X)/E(X). Values of DI greater than one indicate over-dispersion. Some statistical measures of the TRTG distribution are computed and reported in Table 1. To interpret the individual effects of the parameters θ and q, the results are calculated once for fixed θ = 0.5 with varying q and once for fixed q = 0.5 with varying θ. As seen from Table 1, the mean, variance, and DI are increasing functions of the parameter q for fixed θ = 0.5. In addition, the mean and variance are increasing functions of θ for fixed q = 0.5. The DI decreases when the parameter θ increases. Furthermore, the results show that the TRTG distribution is suitable for over-dispersed count data.
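The dispersion index can be explored numerically with a short sketch. It assumes the TRTG pmf f(x) = q^x{1 − q + θ log(q)[q(x+1) − x]} and computes the moments by truncated summation. The closed-form mean used for comparison, q/(1 − q) − θ q log(q)/(1 − q)², is obtained here by summing the survival function and is an assumption of this sketch rather than a formula quoted from the text. All function names are hypothetical.

```python
import math

def trtg_pmf(x, q, theta):
    """Assumed TRTG pmf for x = 0, 1, 2, ..."""
    return q ** x * (1.0 - q + theta * math.log(q) * (q * (x + 1) - x))

def trtg_summary(q, theta, n_terms=2000):
    """Mean, variance and dispersion index by truncated summation."""
    mean = sum(x * trtg_pmf(x, q, theta) for x in range(n_terms))
    second = sum(x * x * trtg_pmf(x, q, theta) for x in range(n_terms))
    var = second - mean ** 2
    return mean, var, var / mean

def trtg_mean_closed_form(q, theta):
    """Mean from summing the survival function (derived for this sketch)."""
    return q / (1.0 - q) - theta * q * math.log(q) / (1.0 - q) ** 2

theta = 0.5
results = {q: trtg_summary(q, theta) for q in (0.2, 0.5, 0.8)}
for q, (mean, var, di) in results.items():
    print(f"q={q}: mean={mean:.4f}, var={var:.4f}, DI={di:.4f}, "
          f"closed-form mean={trtg_mean_closed_form(q, theta):.4f}")
```

For these illustrative parameter values, the dispersion index exceeds one, which is consistent with the over-dispersion property discussed above.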

Stochastic Orders
Shaked and Shanthikumar [16] illustrated that several stochastic orders exist and have many applications. Stochastic orders are important measures to judge comparative behaviors of random variables.
The following theorem illustrates that the TRTG distribution is ordered according to the likelihood ratio (lr) order as the strongest stochastic order.
Proof. The density ratio of the TRTG distribution, say W(x), is obtained in two parts, W_1(x) and W_2(x). If the two ratios, W_1(x) and W_2(x), are decreasing functions of x, then the density ratio, W(x), is also a decreasing function of x. The result then follows by examining the first derivatives of W_1(x) and W_2(x) with respect to x.
Based on the chain of stochastic orders (see [16] and Definition 4 in [7]), we conclude that the TRTG distribution can also be ordered according to the hazard rate (hr), reversed hazard rate (rh), mean residual life (mrl), and usual stochastic (st) orders. That is, $X \le_{hr} Y$, $X \le_{rh} Y$, $X \le_{mrl} Y$, and $X \le_{st} Y$.

Mean Deviation and Mean Residual Life
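The mean deviation (MD) of the TRTG model about its mean $\mu = E(X)$ is given by $\mathrm{MD} = E|X - \mu| = \sum_{x=0}^{\infty} |x - \mu|\, f(x; q, \theta)$. The mean residual life (mrl) of the TRTG model is the expected remaining lifetime of X given survival up to a point x, and it follows from the survival function of the TRTG model.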

Actuarial Measures
In this section, we determined the value at risk (VaR) and tail value at risk (TVaR) measures of the TRTG distribution.

VaR Measure of the TRTG Distribution
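Let X denote a loss random variable. The $\mathrm{VaR}_\alpha$ of X at the 100α% level, denoted by $\pi_\alpha$, is the 100α percentile (or quantile) of the distribution of X. Hence, the $\mathrm{VaR}_\alpha$ of the TRTG model is defined by $F(\pi_\alpha) = \alpha$, where $\alpha \in (0, 1)$ and F is the cdf of the TRTG distribution given in (5). The $\mathrm{VaR}_\alpha$ of the TRTG distribution with qf (9) is derived as
$$\pi_\alpha = \frac{1 + \theta\,W_{-1}\!\left(-\dfrac{(1-\alpha)\,e^{-1/\theta}}{\theta}\right)}{\theta\log(q)} - 1,$$
where $W_{-1}$ is the lower real branch of the Lambert function.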
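Since X is integer-valued, a simple way to evaluate the VaR in practice is a direct search on the cdf. The sketch below assumes the TRTG cdf F(x) = 1 − q^(x+1)[1 − θ(x+1) log q] and adopts the convention that the VaR is the smallest integer x with F(x) ≥ α. This convention and the function names are assumptions of the sketch.

```python
import math

def trtg_cdf(x, q, theta):
    """Assumed TRTG cdf for x = 0, 1, 2, ...; zero for negative x."""
    if x < 0:
        return 0.0
    return 1.0 - q ** (x + 1) * (1.0 - theta * (x + 1) * math.log(q))

def trtg_var(alpha, q, theta):
    """Smallest integer x with F(x) >= alpha (assumed discrete convention)."""
    x = 0
    while trtg_cdf(x, q, theta) < alpha:
        x += 1
    return x

q, theta = 0.5, 0.5
levels = (0.50, 0.90, 0.95, 0.99)
var_table = {alpha: trtg_var(alpha, q, theta) for alpha in levels}
for alpha, v in var_table.items():
    print(f"alpha={alpha}: VaR={v}, F(VaR)={trtg_cdf(v, q, theta):.6f}")
```

The search terminates for any α in (0, 1) because the cdf increases to one.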

TVaR Measure of the TRTG Distribution
Let X denote a loss random variable. The TVaR of X at the 100α% security level, denoted by $\mathrm{TVaR}_\alpha$, is the expected loss given that the loss exceeds the 100α percentile of the distribution of X. For engineering or actuarial applications, it is more common to consider the distribution of losses. In this case, the right-tail TVaR is considered (typically for α = 95% or α = 99%). The $\mathrm{TVaR}_\alpha$ is defined by $\mathrm{TVaR}_\alpha = E(X \mid X > \pi_\alpha)$. Using the pmf and cdf of the TRTG model, the $\mathrm{TVaR}_\alpha$ can be written as
$$\mathrm{TVaR}_\alpha = \frac{1}{1 - F(\pi_\alpha)} \sum_{x > \pi_\alpha} x\, f(x; q, \theta).$$
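A numerical sketch of the TVaR is given below. It assumes the definition TVaR = E(X | X > VaR), the TRTG cdf and pmf implied by the TRT construction, and the discrete convention that the VaR is the smallest integer x with F(x) ≥ α. The tail expectation is approximated by a truncated sum. All names are hypothetical.

```python
import math

def trtg_sf(x, q, theta):
    """Assumed TRTG survival function P(X > x) for x = 0, 1, 2, ..."""
    return q ** (x + 1) * (1.0 - theta * (x + 1) * math.log(q))

def trtg_pmf(x, q, theta):
    """Assumed TRTG pmf for x = 0, 1, 2, ..."""
    return q ** x * (1.0 - q + theta * math.log(q) * (q * (x + 1) - x))

def trtg_var(alpha, q, theta):
    """Smallest integer x with F(x) >= alpha, i.e. P(X > x) <= 1 - alpha."""
    x = 0
    while 1.0 - trtg_sf(x, q, theta) < alpha:
        x += 1
    return x

def trtg_tvar(alpha, q, theta, n_terms=2000):
    """E(X | X > VaR), with the tail sum truncated after n_terms terms."""
    v = trtg_var(alpha, q, theta)
    tail_mean = sum(x * trtg_pmf(x, q, theta)
                    for x in range(v + 1, v + 1 + n_terms))
    return tail_mean / trtg_sf(v, q, theta)

q, theta = 0.5, 0.5
var95, tvar95 = trtg_var(0.95, q, theta), trtg_tvar(0.95, q, theta)
var99, tvar99 = trtg_var(0.99, q, theta), trtg_tvar(0.99, q, theta)
print(var95, tvar95, var99, tvar99)
```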

Estimation
In this section, the estimation of the TRTG parameters is examined using some classical and Bayesian methods.

Method of Maximum Likelihood
Let $x_1, \ldots, x_n$ be the observations of n independent and identically distributed random variables $X_1, \ldots, X_n$ from the TRTG distribution. Then, the corresponding log-likelihood function reduces to:
$$\ell_n(q, \theta) = \log(q) \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \log\{1 - q + \theta\log(q)\,[q\,(x_i + 1) - x_i]\}.$$
Then, the maximum likelihood (ML) estimators of q and θ, say $\hat{q}$ and $\hat{\theta}$, are the solution of the following nonlinear likelihood equations: $\partial \ell_n(q, \theta)/\partial q = 0$ and $\partial \ell_n(q, \theta)/\partial \theta = 0$. The ML estimators of q and θ cannot be obtained explicitly. Therefore, they are obtained by numerical methods. The fminsearch command in Matlab is used for this purpose.
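The numerical maximization can be illustrated with a deliberately simple sketch. Instead of fminsearch, it uses a plain grid search over (q, θ), which is a substitute technique chosen only to keep the example dependency-free. The data are simulated by inverse-transform sampling from the assumed TRTG cdf, and all function names, the seed, and the sample size are hypothetical.

```python
import math
import random
from collections import Counter

def trtg_cdf(x, q, theta):
    """Assumed TRTG cdf for x = 0, 1, 2, ..."""
    return 1.0 - q ** (x + 1) * (1.0 - theta * (x + 1) * math.log(q))

def trtg_logpmf(x, q, theta):
    """Log of the assumed TRTG pmf."""
    lq = math.log(q)
    return x * lq + math.log(1.0 - q + theta * lq * (q * (x + 1) - x))

def sample_trtg(n, q, theta, rng):
    """Inverse-transform sampling: smallest x with F(x) >= u."""
    out = []
    for _ in range(n):
        u = rng.random()
        x = 0
        while trtg_cdf(x, q, theta) < u:
            x += 1
        out.append(x)
    return out

def loglik(counts, q, theta):
    """Log-likelihood computed from a value -> frequency table."""
    return sum(c * trtg_logpmf(x, q, theta) for x, c in counts.items())

def mle_grid(counts, steps=100):
    """Grid search over (0, 1) x (0, 1); a crude stand-in for an optimizer."""
    grid = [i / steps for i in range(1, steps)]
    return max(((q, t) for q in grid for t in grid),
               key=lambda p: loglik(counts, p[0], p[1]))

rng = random.Random(1)                 # fixed seed for reproducibility
data = sample_trtg(500, 0.4, 0.5, rng)
counts = Counter(data)
q_hat, theta_hat = mle_grid(counts)
print("grid ML estimates:", q_hat, theta_hat)
```

In practice, the grid estimate would typically serve as a starting value for a derivative-free or quasi-Newton optimizer.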

Method of Moments
The moments (MM) estimators of the parameters q and θ were obtained by equating the first two population moments of the TRTG distribution to the corresponding sample moments and solving the resulting two equations, Equations (10) and (11), simultaneously. Equations (10) and (11) can be numerically solved using the Newton-Raphson method. The solutions of these two equations are the MM estimators of the parameters q and θ.

Method of Proportions
The method of proportions was proposed by [17] to estimate the parameters of the discrete Weibull distribution. We used this method to estimate the TRTG parameters. Let $X_1, X_2, \ldots, X_n$ be a random sample from the TRTG(q, θ) distribution. Consider the indicator function, say $\nu(\cdot)$, which is defined (for i = 1, 2, . . . , n) as $\nu(X_i) = 1$ if $X_i = 0$ and $\nu(X_i) = 0$ otherwise. Then $Y = \frac{1}{n}\sum_{i=1}^{n} \nu(X_i)$ denotes the proportion of zeros in the sample and estimates the probability $f(0) = q\,(\theta\log q - 1) + 1$. Similarly, Z denotes the proportion of ones in the sample and it estimates the probability $f(1) = q^{2}\,(\theta\log q^{2} - 1) - q\,(\theta\log(q) - 1)$. Hence, the proportions (MP) estimators of the parameters q and θ are obtained by solving the following two equations simultaneously:
$$q\,(\theta\log q - 1) + 1 = Y, \qquad (12)$$
and:
$$q^{2}\,(\theta\log q^{2} - 1) - q\,(\theta\log(q) - 1) = Z. \qquad (13)$$
Equations (12) and (13) can be numerically solved using the Newton-Raphson method. The solutions of the last two equations represent the MP estimators of the parameters q and θ.
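As a hedged illustration, the two proportion equations can also be solved by elementary algebra rather than by Newton-Raphson. Writing s0 = 1 − f(0) and s1 = 1 − f(0) − f(1), the two stated probabilities imply q² − 2 s0 q + s1 = 0, whose smaller root gives q, after which θ follows from the equation for f(0). This elimination is worked out for this sketch and is not taken from the text. The function name is hypothetical.

```python
import math

def mp_estimates(p0, p1):
    """Solve f(0) = p0 and f(1) = p1 for (q, theta).

    Returns None when the proportions admit no real solution with 0 < q < 1.
    """
    s0 = 1.0 - p0          # estimates P(X > 0)
    s1 = 1.0 - p0 - p1     # estimates P(X > 1)
    disc = s0 * s0 - s1
    if disc < 0.0 or s1 <= 0.0:
        return None
    q = s0 - math.sqrt(disc)   # smaller root of q**2 - 2*s0*q + s1 = 0
    if not 0.0 < q < 1.0:
        return None
    theta = (q - s0) / (q * math.log(q))
    return q, theta

# Round-trip check with exact probabilities for q = 0.4, theta = 0.5.
q_true, theta_true = 0.4, 0.5
lq = math.log(q_true)
p0 = q_true * (theta_true * lq - 1.0) + 1.0
p1 = q_true ** 2 * (theta_true * 2.0 * lq - 1.0) - q_true * (theta_true * lq - 1.0)
q_mp, theta_mp = mp_estimates(p0, p1)
print(q_mp, theta_mp)
```

With sample proportions in place of exact probabilities, the solution may fall outside (0, 1) for θ or fail to exist, in which case a constrained numerical solution would be needed.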

Bayesian Method
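To obtain the Bayes estimators of the parameters q and θ, we suppose that q has a beta distribution with parameters $\alpha_1$ and $\beta_1$, that θ has a beta distribution with parameters $\alpha_2$ and $\beta_2$, and that q and θ are independent. Then, the prior density functions of q and θ are given by
$$\pi_1(q) = \frac{q^{\alpha_1 - 1}(1-q)^{\beta_1 - 1}}{\beta(\alpha_1, \beta_1)}, \quad 0 < q < 1,$$
and:
$$\pi_2(\theta) = \frac{\theta^{\alpha_2 - 1}(1-\theta)^{\beta_2 - 1}}{\beta(\alpha_2, \beta_2)}, \quad 0 < \theta < 1,$$
respectively, where β(·, ·) is the beta function. Therefore, the joint prior of (q, θ) can be expressed as $\pi(q, \theta) = \pi_1(q)\,\pi_2(\theta)$. Under the squared error loss function, the Bayes estimators of q and θ are the posterior means
$$\hat{q}_B = \frac{\int_0^1\!\int_0^1 q\, L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta}{\int_0^1\!\int_0^1 L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta} \quad \text{and} \quad \hat{\theta}_B = \frac{\int_0^1\!\int_0^1 \theta\, L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta}{\int_0^1\!\int_0^1 L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta},$$
respectively, where $L(q, \theta \mid x)$ is the likelihood function. These estimators cannot be obtained explicitly, but they can be approximated by using Tierney and Kadane's (1986) method [18].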

Simulation Study
To obtain information about the performance of the previous estimators, we conducted a simulation study. In this simulation, we generated samples of size n = 100, 200, 300, 400, 500, 1000 from the TRTG(q, θ) distribution and then computed the ML, MM, MP, and Bayes estimates of q and θ. We calculated the average absolute biases (ABBs), mean square errors (MSEs), and mean relative errors (MREs) of the estimates for all methods. For a generic parameter $\vartheta \in \{q, \theta\}$ with estimate $\hat{\vartheta}_j$ in the jth of N trials, the ABBs, MSEs, and MREs are calculated by $\mathrm{ABB} = \frac{1}{N}\sum_{j=1}^{N} |\hat{\vartheta}_j - \vartheta|$, $\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N} (\hat{\vartheta}_j - \vartheta)^2$, and $\mathrm{MRE} = \frac{1}{N}\sum_{j=1}^{N} |\hat{\vartheta}_j - \vartheta|/\vartheta$. The optim-CG routine in the R program was adopted, with N = 5000 trials, to estimate these indices for the ML, MM, MP, and Bayes estimates. Different sample sizes and two parameter settings were considered, (θ = 0.5, q = 0.4) and (θ = 0.6, q = 0.5). The results are given in Tables 3 and 4. From Tables 3 and 4, it was concluded that the ABBs, MREs, and MSEs of all estimates decrease when n increases, as expected. Moreover, the Bayes, ML, and MM methods provide the best estimates in terms of the performance criteria. The Bayes, ML, and MM estimates are almost identical in terms of ABBs, MSEs, and MREs and they perform better than the MP estimates.
Table 4. Simulation results of the TRTG model for θ = 0.6 and q = 0.5.
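The three performance indices can be computed with a few lines of code. The sketch below assumes the usual definitions: the average absolute bias is the mean of |estimate − true value|, the mean square error is the mean of the squared deviations, and the mean relative error is the mean of |estimate − true value| divided by the true value. The function name and the toy estimates are hypothetical and serve only to show the arithmetic.

```python
def performance_indices(estimates, true_value):
    """Return (ABB, MSE, MRE) for a list of estimates of one parameter."""
    n_trials = len(estimates)
    abs_dev = [abs(est - true_value) for est in estimates]
    abb = sum(abs_dev) / n_trials
    mse = sum(dev ** 2 for dev in abs_dev) / n_trials
    mre = sum(dev / true_value for dev in abs_dev) / n_trials
    return abb, mse, mre

# Toy input (not simulation output): two estimates of a parameter equal to 0.5.
abb, mse, mre = performance_indices([0.4, 0.6], 0.5)
print(abb, mse, mre)
```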

Modeling Three Actuarial Data Sets
In this section, the TRTG distribution was fitted to three real actuarial data sets and compared with the transmuted geometric (TRAG), discrete Burr (DB), discrete Chen (DC), negative binomial (NB), geometric (G), and Poisson (P) distributions.
First data set: These data were reported in Klugman et al. [19] and represent the number of claims of automobile liability policies.
Second data set: The data were analyzed by Klugman et al. [19] and represent the number of hospitalizations per family member and year.
Third data set: These data were studied by Willmot [20] and refer to the number of automobile insurance claims per policy in two portfolios from Belgium during the period 1975-1976, respectively.
The TRTG, TRAG, DB, DC, NB, G, and P distributions were fitted to the three data sets. Their parameter estimates were obtained via the ML method. The chi-square procedure was adopted to test $H_0: X \sim \mathrm{TRTG}(q, \theta)$. To compute the $\chi^2$ statistic, the unknown parameters q and θ were estimated from the three data sets. Under the null hypothesis, the estimated probabilities can be calculated by $\hat{\alpha}_i = P(X = i; \hat{q}, \hat{\theta})$, i = 0, 1, 2, . . ., and the estimated expected frequencies are $\hat{e}_i = n\,\hat{\alpha}_i$, where $\hat{\alpha}_i$ is an ML estimate of $\alpha_i$. For the three data sets, the chi-square statistic, $\chi^2$, was computed for the TRTG distribution and the other competing distributions. The observed and expected frequencies, the $\chi^2$ statistic, and the negative log-likelihood $-\ell_n$ are listed in Tables 5-7 for the three data sets, respectively. The values in these tables reveal that the TRTG distribution has the lowest values of $\chi^2$ and $-\ell_n$ among all competing discrete models and that it provides a better fit for the given data sets than the TRAG, DB, DC, NB, G, and P distributions. Based on the results, we cannot reject $H_0$ at the α = 0.05 significance level.
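The computation of the expected frequencies and the chi-square statistic can be sketched as follows. The example assumes the TRTG pmf f(x) = q^x{1 − q + θ log(q)[q(x+1) − x]} and pools the upper tail into the last cell. The observed counts and parameter values are made-up toy numbers, not any of the data sets or estimates discussed above, and the function names are hypothetical.

```python
import math

def trtg_pmf(x, q, theta):
    """Assumed TRTG pmf for x = 0, 1, 2, ..."""
    return q ** x * (1.0 - q + theta * math.log(q) * (q * (x + 1) - x))

def chi_square(observed, q, theta):
    """Expected frequencies and chi-square statistic.

    observed[k] is the count of x == k, except the last cell,
    which pools all observations with x >= len(observed) - 1.
    """
    n = sum(observed)
    k = len(observed) - 1
    probs = [trtg_pmf(x, q, theta) for x in range(k)]
    probs.append(1.0 - sum(probs))      # pooled tail probability
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat

# Toy frequency table for the cells 0, 1, 2, 3 and 4+ (illustrative only).
observed = [70, 50, 40, 20, 20]
expected, stat = chi_square(observed, q=0.5, theta=0.5)
print([round(e, 2) for e in expected], round(stat, 4))
```

When the parameters are estimated from the data, the reference chi-square distribution has the number of cells minus one minus the number of estimated parameters as its degrees of freedom.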
Furthermore, for visual comparisons, the observed and fitted distributions are displayed in Figures 3-5 for the three data sets, respectively.

TRTG Count Regression Model
Let X be the response variable and y be its associated p × 1 vector of covariates. Assume that the response variable X follows the TRTG distribution with mean µ(y). Furthermore, the mean of the response variable is linked to the explanatory variables by the log-linear form, i.e., $\mu_i = \exp(\boldsymbol{\beta}\,\mathbf{y}_i^{T})$, where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)$ and $\mathbf{y}_i = (1, y_{1i}, y_{2i}, \ldots, y_{pi})$. By replacing θ with its expression in terms of $\mu_i$ and q, obtained by solving the mean equation of the TRTG distribution for θ, we obtain the re-parameterized pmf of Equation (6). The likelihood equations resulting from the corresponding log-likelihood, Equation (14), have no closed-form solution and cannot be solved explicitly, so numerical methods are used to obtain the estimates. We illustrate the application of the TRTG regression model by analyzing a real data set on the count of infected blood cells (per mm²) on microscope slides prepared from n = 511 randomly selected individuals [21]. The response variable ($x_i$: count of infected blood cells) was related to the following explanatory variables: the smoking status of the subject ($y_{i1}$: 0: yes; 1: no) and their sex ($y_{i2}$: 0: female; 1: male). Based on Table 8, we conclude that the proposed TRTG count regression model outperforms the Poisson, geometric, and PL regression models. The maximized log-likelihood $\hat{\ell}_{\max}$ and the Akaike information criterion (AIC) are also reported in Table 8.
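The re-parameterization step can be illustrated with a small sketch. It assumes that the TRTG mean equals q/(1 − q) − θ q log(q)/(1 − q)², an expression obtained here by summing the survival function q^(x+1)[1 − θ(x+1) log q] and therefore an assumption of this sketch rather than a formula quoted from the text. Solving this for θ gives θ as a function of the mean µ and q. All function names and the coefficient values are hypothetical.

```python
import math

def trtg_mean(q, theta):
    """Assumed TRTG mean (derived for this sketch from the survival function)."""
    return q / (1.0 - q) - theta * q * math.log(q) / (1.0 - q) ** 2

def theta_from_mean(mu, q):
    """Invert the assumed mean equation for theta, given mu and q."""
    return (1.0 - q) * (q - mu * (1.0 - q)) / (q * math.log(q))

def linear_mean(beta, y):
    """Log-linear link: mu = exp(beta . y), with y including the intercept term."""
    return math.exp(sum(b * v for b, v in zip(beta, y)))

# Round trip: theta -> mean -> theta.
q, theta = 0.5, 0.5
mu = trtg_mean(q, theta)
theta_back = theta_from_mean(mu, q)

# Hypothetical coefficients and covariate row (intercept, smoking, sex).
beta = [0.2, 0.1, -0.3]
y_row = [1.0, 1.0, 0.0]
mu_i = linear_mean(beta, y_row)
theta_i = theta_from_mean(mu_i, q)
valid = 0.0 < theta_i < 1.0   # the implied theta must stay inside (0, 1)
print(mu, theta_back, mu_i, theta_i, valid)
```

Under this assumed mean, not every pair (µ, q) yields a θ in (0, 1), so a fitting routine would need to restrict the parameter space accordingly.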

Conclusions
We derived and studied a new discrete distribution defined on N using the transmuted record type approach to extend the geometric distribution. The new model is called the transmuted record type geometric (TRTG) distribution and it is suitable for over-dispersed count data. We derived the distributional properties of the TRTG distribution along with two actuarial risk measures. The TRTG parameters were estimated by four different estimation methods. A Monte Carlo simulation study was conducted to investigate the efficiency of the different estimators. A new count regression model was proposed as an alternative to the Poisson, geometric, and Poisson-Lindley regression models. In summary, the TRTG model can be considered a good alternative to the negative binomial, geometric, and Poisson distributions. The TRTG distribution can be used to model insurance data, and the results show that it outperforms some important discrete models, namely the negative binomial, transmuted geometric, discrete Burr, discrete Chen, geometric, and Poisson distributions.