Abstract
In this paper, a new modified version of the geometric distribution is proposed. The newly introduced model is called the transmuted record type geometric (TRTG) distribution. The TRTG distribution is a good alternative to the negative binomial, Poisson and geometric distributions for modeling real data encountered in several applied fields. The main statistical properties of the new distribution are obtained. We also determine the value at risk and tail value at risk of the TRTG distribution; these measures are important quantities in actuarial science for portfolio optimization under uncertainty. The TRTG parameters are estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods, and a simulation study is conducted to explore their performance. Furthermore, a new count regression model based on the TRTG distribution is proposed. Four real data applications illustrate the applicability of the TRTG distribution and its count regression model. These applications show empirically that the TRTG distribution outperforms some important discrete models such as the negative binomial, transmuted geometric, discrete Burr, discrete Chen, geometric, and Poisson distributions.
1. Introduction
Discrete models are very important for handling count data encountered in several theoretical and applied sciences such as medicine, insurance, life testing, biology, and agriculture. Recently, statisticians have shown increasing interest in constructing new flexible discrete distributions. Chakraborty and Chakravarty [1] noted that almost all observed values are actually discrete, because they are measured to only a finite number of decimal places and cannot really constitute all points in a continuum.
On the other hand, in some life testing and survival analysis studies, lifetimes can be treated as discrete random variables, so the corresponding reliability function is defined on a discrete domain. For example, the reliability of a switching device is a function of the number of times the switch is operated, and the reliability of a computer is a function of the number of times the computer has broken down. Recently, many continuous lifetime distributions have been discretized for modeling discrete lifetime data, for example, the discrete Weibull distribution [2] and the discrete Burr and discrete Pareto distributions [3].
Furthermore, some discrete distributions have been proposed by compounding two discrete distributions, for example, the uniform-Poisson distribution [4], the uniform-geometric distribution [5], and the binomial-discrete Lindley distribution [6]. Recently, Al-Babtain et al. [7] proposed a natural discrete analog of the continuous Lindley distribution as a mixture of negative binomial and geometric distributions.
In recent decades, several works in the statistical literature have discretized continuous distributions. However, there is still a clear need for more flexible discrete lifetime distributions to model several types of count data in many applied areas, including insurance, social sciences, reliability studies, and economics. It is worth noting that the probability mass functions (pmfs) of most recently introduced discrete distributions are obtained by discretizing the survival functions of continuous distributions and have a quite complex structure in terms of parameter estimation, for example, the two discrete Lindley models in [8,9].
Apart from discretization techniques, we were motivated to propose a more flexible extension of the geometric distribution using the transmuted record type (TRT) method of [10]. The TRT approach can be summarized as follows. Let $X_1, X_2, \ldots$ be a random sample from a distribution having cdf $G(x)$. Let $R_1$ and $R_2$ be the first and second upper records, respectively, based on this sample. Consider a random variable X defined as follows:
$$X = \begin{cases} R_1, & \text{with probability } 1-p, \\ R_2, & \text{with probability } p. \end{cases}$$
Hence, the cdf of X is
$$F(x) = (1-p)\,F_{R_1}(x) + p\,F_{R_2}(x),$$
where $0 \le p \le 1$.
According to Equation (3.1) of [11], the cdf of the first upper record, $R_1$, reduces to
$$F_{R_1}(x) = G(x).$$
The cdf of the second upper record, $R_2$, takes the form
$$F_{R_2}(x) = G(x) + [1 - G(x)]\log[1 - G(x)].$$
After some algebra, the cdf of X reduces to
$$F(x) = G(x) + p\,[1 - G(x)]\log[1 - G(x)], \qquad 0 \le p \le 1, \tag{4}$$
where $G(x)$ is any baseline cdf; Equation (4) is referred to as the cdf of the TRT method. Tanış and Saraçoğlu [12] constructed the TRT-Weibull distribution using the TRT approach. Further information about the TRT approach can be found in [10].
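The construction in Equation (4) can be checked numerically. The sketch below is a minimal illustration, assuming NumPy; the function names `trt_cdf` and `record_mixture_cdf` are hypothetical helpers, not code from the authors. It evaluates the TRT cdf directly and as the record mixture $(1-p)F_{R_1} + pF_{R_2}$, using a geometric baseline $G(x) = 1 - q^{x+1}$ as the example.

```python
import numpy as np


def trt_cdf(G, p):
    """TRT cdf F = G + p (1 - G) log(1 - G) for baseline cdf values G and 0 <= p <= 1."""
    G = np.asarray(G, dtype=float)
    surv = 1.0 - G
    # (1 - G) log(1 - G) -> 0 as G -> 1; substitute log(1) = 0 where G == 1
    tail = surv * np.log(np.where(surv > 0, surv, 1.0))
    return G + p * tail


def record_mixture_cdf(G, p):
    """The same cdf assembled as (1 - p) F_R1 + p F_R2 from the two upper-record cdfs."""
    G = np.asarray(G, dtype=float)
    surv = 1.0 - G
    F_r1 = G                                                  # first upper record
    F_r2 = G + surv * np.log(np.where(surv > 0, surv, 1.0))   # second upper record
    return (1 - p) * F_r1 + p * F_r2


# Geometric baseline G(x) = 1 - q^(x+1), x = 0, 1, 2, ... (illustrative q and p)
q, p = 0.6, 0.4
x = np.arange(40)
G = 1 - q ** (x + 1)
F = trt_cdf(G, p)
# Closed form of the geometric special case, for comparison
F_closed = 1 - q ** (x + 1) * (1 - p * (x + 1) * np.log(q))
print(np.allclose(F, F_closed))
```

With $p = 0$ the function returns the baseline cdf unchanged, which is the sense in which the TRT family extends $G$.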
To the best of our knowledge, this is the first article that applies the TRT method to construct an extended form of the geometric distribution. The proposed discrete model, called the transmuted record type geometric (TRTG) distribution, is suitable for over-dispersed data. Hence, it can be applied in collective risk models and can be considered a competitive alternative to the negative binomial and Poisson distributions for fitting automobile claim frequency data.
Additionally, we derive explicit expressions for its basic distributional properties, including the moment generating and probability generating functions, mean, variance, skewness, kurtosis, quantile function, stochastic orders, mean deviation, and mean residual life. In addition, we derive two important risk measures, namely the value at risk and tail value at risk, for the TRTG model. The TRTG parameters are estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods, and a simulation study is conducted to explore the performance of these methods in estimating the TRTG parameters p and q. The applicability of the TRTG distribution is studied using three data sets from actuarial science, which show its superiority over competing models, namely the transmuted geometric [13], discrete Burr [3], discrete Chen [14], negative binomial, geometric, and Poisson distributions. We also propose a count regression model based on the TRTG distribution. The new TRTG regression model outperforms the Poisson, geometric, and Poisson–Lindley (PL) [15] regression models.
The rest of the paper is organized as follows. We define the TRTG distribution in Section 2. Some of its distributional properties are provided in Section 3. In Section 4, we derive two important risk measures of the TRTG model and provide some numerical computations for them. The TRTG parameters, p and q, are estimated via four estimation methods in Section 5. In Section 6, a Monte Carlo simulation study is conducted to investigate the efficiency of the different proposed estimators. In Section 7, we analyze three insurance data sets to illustrate the flexibility of the TRTG model. The TRTG count regression model is applied to real-life count data in Section 8. Finally, the paper is concluded in Section 9.
2. The TRTG Distribution
First, we apply the TRT methodology to propose the two-parameter TRTG distribution as an extension of the geometric distribution with pmf $g(x) = (1-q)\,q^{x}$ and cumulative distribution function (cdf) $G(x) = 1 - q^{x+1}$, for $x = 0, 1, 2, \ldots$ and $0 < q < 1$.
Inserting the geometric cdf $G(x)$ into Equation (4), we obtain the cdf of the TRTG distribution:
$$F(x) = 1 - q^{x+1}\left[1 - p\,(x+1)\log q\right], \qquad x = 0, 1, 2, \ldots \tag{5}$$
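The TRTG distribution is specified by the following pmf:
$$f(x) = P(X = x) = q^{x}\left[(1-q)(1 - p\,x\log q) + p\,q\log q\right], \qquad x = 0, 1, 2, \ldots, \tag{6}$$
where $0 < q < 1$ and $0 \le p \le 1$. If X has the pmf (6), we write $X \sim \mathrm{TRTG}(q, p)$.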
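The survival function of the TRTG model is
$$S(x) = P(X > x) = q^{x+1}\left[1 - p\,(x+1)\log q\right].$$
Clearly, $\lim_{p \to 0} F(x) = 1 - q^{x+1}$, which is the geometric cdf. In other words, the TRTG distribution behaves like a geometric model when p lies around zero.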
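Consequently, the hazard function (hf) of X, $h(x) = P(X = x)/P(X \ge x) = f(x)/S(x-1)$, reduces to
$$h(x) = \frac{(1-q)(1 - p\,x\log q) + p\,q\log q}{1 - p\,x\log q} = (1-q) + \frac{p\,q\log q}{1 - p\,x\log q}.$$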
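The following sketch, assuming NumPy (the helper names are hypothetical, chosen for illustration), codes the pmf $f(x) = q^x[(1-q)(1 - px\log q) + pq\log q]$, the cdf $F(x) = 1 - q^{x+1}[1 - p(x+1)\log q]$, and the simplified hazard $h(x) = (1-q) + pq\log q/(1 - px\log q)$. It checks numerically that the pmf sums to one, that its partial sums reproduce the cdf, and that $f(x)/P(X \ge x)$ matches the hazard formula.

```python
import numpy as np


def trtg_pmf(x, q, p):
    """pmf: q^x [(1 - q)(1 - p x log q) + p q log q], x = 0, 1, 2, ..."""
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return q ** x * ((1 - q) * (1 - p * x * lq) + p * q * lq)


def trtg_cdf(x, q, p):
    """cdf: 1 - q^(x+1) [1 - p (x + 1) log q]."""
    x = np.asarray(x, dtype=float)
    return 1 - q ** (x + 1) * (1 - p * (x + 1) * np.log(q))


def trtg_hazard(x, q, p):
    """Discrete hazard h(x) = P(X = x) / P(X >= x) in its simplified form."""
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return (1 - q) + p * q * lq / (1 - p * x * lq)


q, p = 0.5, 0.8                                    # illustrative values
x = np.arange(200)
pmf, cdf = trtg_pmf(x, q, p), trtg_cdf(x, q, p)
at_risk = np.concatenate(([1.0], 1 - cdf[:-1]))   # P(X >= x) = 1 - F(x - 1)
h = trtg_hazard(x, q, p)
print(pmf[:5].round(4), h[:5].round(4))
```

The comparison with `pmf / at_risk` is restricted to small x in the check, because $1 - F(x)$ loses all precision once $F(x)$ rounds to 1.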
Figure 1 presents plots of the pmf of the TRTG model for some choices of q and p. The pmf is either decreasing or increasing–decreasing (unimodal). Furthermore, as p increases, the mode moves to the right in most panels, which shows that the TRTG model is quite versatile and that even small values of p have a substantial effect on the distribution. Figure 2 displays the hf plots of the TRTG model for some choices of q and p. Because $pq\log q \le 0$, the hazard rate $h(x) = (1-q) + pq\log q/(1 - px\log q)$ is nondecreasing in x and approaches the constant geometric hazard $1-q$ as x grows; it is constant only when $p = 0$.
Figure 1.
Plots of the pmf of the TRTG distribution for some choices of q and p.
Figure 2.
Plots of the hf of the TRTG distribution for some choices of q and p.
3. Distributional Properties
3.1. Moments and Quantile Function
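The moment generating function of the TRTG distribution takes the form
$$M_X(t) = \frac{(1-q) + p\,q\log q}{1 - q e^{t}} - \frac{p\,q(1-q)\,e^{t}\log q}{(1 - q e^{t})^{2}}, \qquad t < -\log q.$$
Setting $z = e^{t}$, we obtain the probability generating function of the TRTG distribution:
$$G_X(z) = \frac{(1-q) + p\,q\log q}{1 - qz} - \frac{p\,q(1-q)\,z\log q}{(1 - qz)^{2}}, \qquad |z| < 1/q.$$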
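Hence, the first four raw moments of X are
$$\mu_1' = E(X) = \frac{q}{1-q} - \frac{p\,q\log q}{(1-q)^{2}}, \qquad \mu_2' = \frac{q(1+q)}{(1-q)^{2}} - \frac{p\,q(1+3q)\log q}{(1-q)^{3}},$$
and
$$\mu_3' = \frac{q(1+4q+q^{2})}{(1-q)^{3}} - \frac{p\,q(1+10q+7q^{2})\log q}{(1-q)^{4}}, \qquad \mu_4' = \frac{q(1+11q+11q^{2}+q^{3})}{(1-q)^{4}} - \frac{p\,q(1+25q+55q^{2}+15q^{3})\log q}{(1-q)^{5}}.$$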
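Then, the variance, skewness, and kurtosis of X are given by
$$\sigma^{2} = \mathrm{Var}(X) = \mu_2' - \mu_1'^{2}, \qquad \mathrm{Sk}(X) = \frac{\mu_3' - 3\mu_1'\mu_2' + 2\mu_1'^{3}}{\sigma^{3}},$$
and
$$\mathrm{Ku}(X) = \frac{\mu_4' - 4\mu_1'\mu_3' + 6\mu_1'^{2}\mu_2' - 3\mu_1'^{4}}{\sigma^{4}}.$$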
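As a sanity check on the moment expressions, the sketch below compares them with brute-force sums over a truncated support. The expressions follow from the pmf through the identity $\mu_r' = [(1-q) + pq\log q]\,\mathrm{Li}_{-r}(q) - p(1-q)\log q\,\mathrm{Li}_{-r-1}(q)$, where $\mathrm{Li}_{-s}(q) = \sum_{x\ge1} x^{s} q^{x}$. NumPy is assumed; the truncation point, parameter values, and function names are illustrative choices.

```python
import numpy as np


def trtg_pmf(x, q, p):
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return q ** x * ((1 - q) * (1 - p * x * lq) + p * q * lq)


def raw_moments(q, p):
    """Closed-form raw moments mu'_1, ..., mu'_4 of TRTG(q, p)."""
    lq, r = np.log(q), 1 - q
    m1 = q / r - p * q * lq / r**2
    m2 = q * (1 + q) / r**2 - p * q * (1 + 3 * q) * lq / r**3
    m3 = q * (1 + 4 * q + q**2) / r**3 - p * q * (1 + 10 * q + 7 * q**2) * lq / r**4
    m4 = (q * (1 + 11 * q + 11 * q**2 + q**3) / r**4
          - p * q * (1 + 25 * q + 55 * q**2 + 15 * q**3) * lq / r**5)
    return np.array([m1, m2, m3, m4])


def shape_measures(q, p):
    """Variance, skewness and kurtosis computed from the raw moments."""
    m1, m2, m3, m4 = raw_moments(q, p)
    var = m2 - m1**2
    skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / var**1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
    return var, skew, kurt


q, p = 0.4, 0.7
x = np.arange(400)                       # truncated support; the tail mass is negligible here
f = trtg_pmf(x, q, p)
brute = np.array([np.sum(x**k * f) for k in range(1, 5)])
print(raw_moments(q, p).round(6), brute.round(6))
print(shape_measures(q, p))
```

At $p = 0$ the variance reduces to the geometric value $q/(1-q)^2$.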
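The quantile function (qf) of the TRTG distribution, obtained by solving $F(x) = u$ for real x, is
$$Q(u) = -1 + \frac{1 + p\,W_{-1}\!\left(-\dfrac{1-u}{p}\,e^{-1/p}\right)}{p\log q}, \qquad 0 < u < 1,\ 0 < p \le 1, \tag{9}$$
where W refers to the Lambert W function and $W_{-1}$ is its lower real branch. From Equation (9), the quantile $x_a$ of the TRTG distribution is
$$x_a = \lceil Q(a) \rceil,$$
where $\lceil x \rceil$ denotes the smallest integer greater than or equal to x. That is, $x_a$ satisfies $F(x_a - 1) < a \le F(x_a)$, where F is the cdf (5) of the TRTG distribution. The median of the TRTG distribution follows by simply setting $a = 1/2$.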
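The Lambert W expression can be evaluated with SciPy's `scipy.special.lambertw`, which takes a branch index `k` and returns a complex value; the lower branch `k=-1` is the one that yields $x \ge -1$. The sketch below (function names hypothetical, parameter values illustrative) compares the resulting discrete quantile $\lceil Q(u) \rceil$ with a brute-force search of the cdf.

```python
import numpy as np
from scipy.special import lambertw


def trtg_cdf(x, q, p):
    x = np.asarray(x, dtype=float)
    return 1 - q ** (x + 1) * (1 - p * (x + 1) * np.log(q))


def trtg_qf(u, q, p):
    """Real-valued quantile Q(u) via the lower Lambert W branch (requires 0 < p <= 1)."""
    u = np.asarray(u, dtype=float)
    arg = -(1 - u) * np.exp(-1 / p) / p          # lies in [-1/e, 0) for 0 < p <= 1
    w = np.real(lambertw(arg, k=-1))
    return -1 + (1 + p * w) / (p * np.log(q))


def trtg_quantile(u, q, p):
    """Discrete quantile x_u = ceil(Q(u)): the smallest integer x with F(x) >= u."""
    return np.ceil(trtg_qf(u, q, p)).astype(int)


def quantile_by_search(u, q, p, xmax=10_000):
    """Brute-force check: first support point whose cdf reaches u."""
    cdf = trtg_cdf(np.arange(xmax), q, p)
    return np.searchsorted(cdf, u, side="left")


q, p = 0.7, 0.5
u = np.array([0.1, 0.25, 0.5, 0.75, 0.9, 0.99])
median = trtg_quantile(0.5, q, p)
print(trtg_quantile(u, q, p), quantile_by_search(u, q, p), median)
```

The closed form is restricted to $p > 0$; at $p = 0$ the geometric quantile $\lceil \log(1-u)/\log q - 1 \rceil$ applies instead.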
An important measure for count distributions is the dispersion index (DI), defined as $\mathrm{DI} = \mathrm{Var}(X)/E(X)$; values above 1 indicate over-dispersion. Some statistical measures of the TRTG distribution are computed and reported in Table 1. To interpret the individual effects of the parameters p and q, each measure is calculated by varying one parameter while the other is held fixed. As seen from Table 1, the mean, variance, and DI are increasing functions of q for fixed p. In addition, the mean and variance are increasing functions of p for fixed q, whereas the DI decreases as p increases. Furthermore, the results show that the TRTG distribution is suitable for over-dispersed count data.
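Table 1 is not reproduced here, but the closed-form mean and variance make the dispersion index easy to tabulate. The sketch below assumes NumPy, and its parameter grid is purely illustrative. In the spirit of Table 1, it varies one parameter while holding the other fixed.

```python
import numpy as np


def trtg_mean_var(q, p):
    """Closed-form mean and variance of TRTG(q, p)."""
    lq, r = np.log(q), 1 - q
    mean = q / r - p * q * lq / r**2
    m2 = q * (1 + q) / r**2 - p * q * (1 + 3 * q) * lq / r**3
    return mean, m2 - mean**2


def dispersion_index(q, p):
    """DI = Var(X) / E(X); DI > 1 indicates over-dispersion."""
    mean, var = trtg_mean_var(q, p)
    return var / mean


# Rows: fixed q; columns: p = 0, 0.5, 1 (illustrative grid only)
for q in (0.2, 0.5, 0.8):
    print(q, [round(float(dispersion_index(q, p)), 3) for p in (0.0, 0.5, 1.0)])
```

At $p = 0$ the index equals the geometric value $1/(1-q)$. Evaluating the same expressions at small q with p near 1 gives values slightly below 1 (about 0.99 at $q = 0.1$, $p = 1$). Over-dispersion is therefore typical for the TRTG model, but it does not hold for every parameter combination.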
Table 1.
Some numerical measures of the TRTG model for selected values of q and p.
3.2. Stochastic Orders
Shaked and Shanthikumar [16] discussed several stochastic orders and their many applications. Stochastic orders are important tools for comparing the behavior of random variables.
The following theorem shows that the TRTG distribution is ordered according to the likelihood ratio ($\le_{lr}$) order, the strongest of the stochastic orders considered here.
Theorem 1.
If X∼TRTG and TRTG. Then X for all .
Proof.
The pmf of X can be expressed as
The density ratio of TRTG distribution, say , is obtained in two parts, as and . If the two ratios, and , are decreasing functions in x, then the density ratio, , is also a decreasing function of x. Then, and can be expressed as
and:
Firstly, the first derivative of with respect to x is given by
where for . Hence, for .
Similarly, the first derivative of with respect to x has the form:
where for . Then, for . It is seen that both density ratios, and , are decreasing functions in x. Hence, is also a decreasing function in x. The proof is completed. □
Based on the chain of stochastic orders (see [16] and Definition 4 in [7]), we conclude that the TRTG distribution can also be ordered according to the hazard rate ($\le_{hr}$), reversed hazard rate ($\le_{rh}$), mean residual life ($\le_{mrl}$), and usual stochastic ($\le_{st}$) orders. That is, $X \le_{hr} Y$, $X \le_{rh} Y$, $X \le_{mrl} Y$, and $X \le_{st} Y$.
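A simple special case of the likelihood ratio ordering can be spot-checked numerically. For a common q, the ratio $f(x; q, p_1)/f(x; q, p_2)$ reduces to $(\alpha_1 + \beta_1 x)/(\alpha_2 + \beta_2 x)$, with $\alpha_i = 1 - q + p_i q\log q$ and $\beta_i = -p_i(1-q)\log q$. Its derivative in x has the sign of $p_1 - p_2$, so the ratio decreases whenever $p_1 < p_2$. The sketch below assumes NumPy and uses hypothetical helper names. It only checks this common-q case on a finite grid and does not substitute for the proof of Theorem 1, whose exact conditions are those stated there.

```python
import numpy as np


def trtg_pmf(x, q, p):
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return q ** x * ((1 - q) * (1 - p * x * lq) + p * q * lq)


def lr_ratio_is_decreasing(q, p1, p2, xmax=200):
    """True if f(x; q, p1) / f(x; q, p2) is non-increasing on x = 0, ..., xmax - 1."""
    x = np.arange(xmax)
    ratio = trtg_pmf(x, q, p1) / trtg_pmf(x, q, p2)
    return bool(np.all(np.diff(ratio) <= 1e-12))


print(lr_ratio_is_decreasing(0.5, 0.2, 0.9), lr_ratio_is_decreasing(0.5, 0.9, 0.2))
```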
3.3. Mean Deviation and Mean Residual Life
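The mean deviation about the mean ($\delta$) of the TRTG model is
$$\delta = E\,|X - \mu| = 2\sum_{x=0}^{\lfloor \mu \rfloor} (\mu - x)\, f(x),$$
where $\mu = \mu_1'$ and $f(x)$ is the pmf (6).
The mean residual life (MRL) of the TRTG model, defined by $m(x) = E(X - x \mid X > x)$, is
$$m(x) = \frac{1}{1-q} - \frac{p\,q\log q}{(1-q)^{2}\left[1 - p\,(x+1)\log q\right]}, \qquad x = 0, 1, 2, \ldots$$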
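Both quantities can be cross-checked against direct summation. The sketch below assumes NumPy and truncates the support at a large value (an illustrative choice). It computes $\delta = 2\sum_{x \le \lfloor\mu\rfloor}(\mu - x)f(x)$ and the closed form $m(x) = 1/(1-q) - pq\log q/\{(1-q)^2[1 - p(x+1)\log q]\}$, then compares each with a brute-force evaluation. The function names are hypothetical.

```python
import numpy as np


def trtg_pmf(x, q, p):
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return q ** x * ((1 - q) * (1 - p * x * lq) + p * q * lq)


def mean_deviation(q, p, xmax=1000):
    """delta = 2 * sum over x <= floor(mu) of (mu - x) f(x)."""
    x = np.arange(xmax)
    f = trtg_pmf(x, q, p)
    mu = np.sum(x * f)
    m = int(np.floor(mu))
    return 2 * np.sum((mu - x[: m + 1]) * f[: m + 1])


def mrl_closed_form(x, q, p):
    """m(x) = E(X - x | X > x) from the closed-form expression."""
    lq = np.log(q)
    return 1 / (1 - q) - p * q * lq / ((1 - q) ** 2 * (1 - p * (x + 1) * lq))


def mrl_direct(x0, q, p, xmax=1000):
    """The same quantity by direct summation over the support points above x0."""
    x = np.arange(x0 + 1, xmax)
    f = trtg_pmf(x, q, p)
    return np.sum((x - x0) * f) / np.sum(f)


q, p = 0.6, 0.7
print(mean_deviation(q, p), mrl_closed_form(3, q, p), mrl_direct(3, q, p))
```

Setting $x = -1$ in the closed form gives $\mu + 1$, and $p = 0$ gives the memoryless geometric value $1/(1-q)$. Both are quick consistency checks.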
4. Actuarial Measures
In this section, we derive the value at risk (VaR) and tail value at risk (TVaR) measures of the TRTG distribution.
4.1. VaR Measure of the TRTG Distribution
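Let X denote a loss random variable. The VaR of X at the $100\alpha\%$ level, denoted by $\mathrm{VaR}_\alpha$, is the $100\alpha\%$ percentile (or quantile) of the distribution of X. Hence, the VaR of the TRTG model is defined by
$$\mathrm{VaR}_\alpha = \min\{x : F(x) \ge \alpha\},$$
where $0 < \alpha < 1$ and F is the cdf of the TRTG distribution given in (5). Using the qf (9), the VaR of the TRTG distribution is
$$\mathrm{VaR}_\alpha = \left\lceil -1 + \frac{1 + p\,W_{-1}\!\left(-\dfrac{1-\alpha}{p}\,e^{-1/p}\right)}{p\log q} \right\rceil.$$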
4.2. TVaR Measure of the TRTG Distribution
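Let X denote a loss random variable. The TVaR of X at the $100\alpha\%$ security level, denoted by $\mathrm{TVaR}_\alpha$, is the expected loss given that the loss exceeds the $100\alpha\%$ percentile of the distribution of X. For engineering or actuarial applications, it is more common to consider the distribution of losses; in this case the right-tail TVaR is considered (typically for $\alpha$ close to 1). The TVaR is defined by
$$\mathrm{TVaR}_\alpha = E(X \mid X > \mathrm{VaR}_\alpha) = \frac{1}{1 - F(\mathrm{VaR}_\alpha)}\sum_{x > \mathrm{VaR}_\alpha} x\, f(x).$$
Using the pmf and cdf of the TRTG model, the TVaR is derived as
$$\mathrm{TVaR}_\alpha = \mathrm{VaR}_\alpha + \frac{1}{1-q} - \frac{p\,q\log q}{(1-q)^{2}\left[1 - p\,(\mathrm{VaR}_\alpha + 1)\log q\right]}.$$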
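A useful identity here is $\mathrm{TVaR}_\alpha = \mathrm{VaR}_\alpha + m(\mathrm{VaR}_\alpha)$, where $m(x) = E(X - x \mid X > x)$ is the mean residual life. It gives a closed form once VaR is known. The sketch below assumes NumPy and uses hypothetical helper names. It finds VaR by searching the cdf and checks the closed-form TVaR against direct summation. It is an illustration only and does not reproduce Table 2.

```python
import numpy as np


def trtg_pmf(x, q, p):
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return q ** x * ((1 - q) * (1 - p * x * lq) + p * q * lq)


def trtg_cdf(x, q, p):
    x = np.asarray(x, dtype=float)
    return 1 - q ** (x + 1) * (1 - p * (x + 1) * np.log(q))


def var_alpha(alpha, q, p, xmax=10_000):
    """VaR_alpha = smallest x with F(x) >= alpha."""
    cdf = trtg_cdf(np.arange(xmax), q, p)
    return int(np.searchsorted(cdf, alpha, side="left"))


def tvar_alpha(alpha, q, p):
    """Closed form TVaR_alpha = VaR_alpha + m(VaR_alpha), with m the mean residual life."""
    v = var_alpha(alpha, q, p)
    lq = np.log(q)
    return v + 1 / (1 - q) - p * q * lq / ((1 - q) ** 2 * (1 - p * (v + 1) * lq))


def tvar_direct(alpha, q, p, xmax=10_000):
    """E(X | X > VaR_alpha) by direct summation over the truncated support."""
    v = var_alpha(alpha, q, p)
    x = np.arange(v + 1, xmax)
    f = trtg_pmf(x, q, p)
    return np.sum(x * f) / np.sum(f)


q, p = 0.6, 0.7                      # illustrative parameter values
for alpha in (0.90, 0.95, 0.99):
    print(alpha, var_alpha(alpha, q, p), round(float(tvar_alpha(alpha, q, p)), 4))
```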
Table 2 provides some numerical computations for the VaR and TVaR measures of the TRTG distribution for different parameter values.
Table 2.
Numerical values of VaR and TVaR measures of the TRTG distribution.
5. Estimation
In this section, the estimation of the TRTG parameters is examined using some classical and Bayesian methods.
5.1. Method of Maximum Likelihood
Let $x_1, \ldots, x_n$ be the observations of n independent and identically distributed random variables from the TRTG distribution. Then, the corresponding log-likelihood function reduces to
$$\ell(q, p) = \log q\sum_{i=1}^{n} x_i + \sum_{i=1}^{n}\log\left[(1-q)(1 - p\,x_i\log q) + p\,q\log q\right].$$
The maximum likelihood (ML) estimators of q and p, say $\hat{q}$ and $\hat{p}$, are the solutions of the nonlinear equations $\partial\ell/\partial q = 0$ and $\partial\ell/\partial p = 0$. Because these estimators cannot be obtained explicitly, they are computed numerically; the fminsearch command in Matlab is used for this purpose.
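MATLAB's `fminsearch` implements the Nelder–Mead simplex method; a rough Python analogue, assuming SciPy, is `scipy.optimize.minimize(..., method="Nelder-Mead")`. The sketch below maximizes the log-likelihood on the logit scale so that both parameters stay in (0, 1). The logit transform, starting simplex, and helper names are our own assumptions, not the authors' settings, and the data are simulated rather than taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit


def trtg_loglik(q, p, x):
    """l(q, p) = log(q) * sum(x) + sum log[(1 - q)(1 - p x log q) + p q log q]."""
    lq = np.log(q)
    return np.sum(x) * lq + np.sum(np.log((1 - q) * (1 - p * x * lq) + p * q * lq))


def trtg_mle(x, start=(0.5, 0.5)):
    """Nelder-Mead on (logit q, logit p), so both parameters stay inside (0, 1)."""
    x = np.asarray(x, dtype=float)

    def negloglik(theta):
        q, p = expit(theta)
        return -trtg_loglik(q, p, x)

    theta0 = logit(np.asarray(start, dtype=float))
    simplex = np.vstack([theta0, theta0 + [0.5, 0.0], theta0 + [0.0, 0.5]])
    res = minimize(negloglik, theta0, method="Nelder-Mead",
                   options={"initial_simplex": simplex, "xatol": 1e-7,
                            "fatol": 1e-7, "maxiter": 4000})
    q_hat, p_hat = expit(res.x)
    return q_hat, p_hat, -res.fun


def trtg_sample(n, q, p, rng, xmax=10_000):
    """Inverse-cdf sampling on the truncated support 0, ..., xmax - 1."""
    support = np.arange(xmax)
    cdf = 1 - q ** (support + 1) * (1 - p * (support + 1) * np.log(q))
    return np.minimum(np.searchsorted(cdf, rng.random(n), side="left"), xmax - 1)


# Simulated data (not one of the paper's data sets)
rng = np.random.default_rng(2021)
data = trtg_sample(5000, 0.6, 0.7, rng)
q_hat, p_hat, loglik_max = trtg_mle(data)
print(round(q_hat, 4), round(p_hat, 4), round(loglik_max, 2))
```

The explicit initial simplex matters here. With a start of (0.5, 0.5), both logits are 0, and SciPy's default simplex would then be tiny.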
5.2. Method of Moments
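The moment (MM) estimators of the parameters q and p are obtained by simultaneously solving the following two equations:
$$\frac{q}{1-q} - \frac{p\,q\log q}{(1-q)^{2}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
and
$$\frac{q(1+q)}{(1-q)^{2}} - \frac{p\,q(1+3q)\log q}{(1-q)^{3}} = \frac{1}{n}\sum_{i=1}^{n} x_i^{2}.$$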
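The two moment equations can be solved numerically, for example as a bounded nonlinear least-squares problem. The sketch below assumes SciPy's `scipy.optimize.least_squares`. The starting point, bounds, and function names are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares


def trtg_m1_m2(q, p):
    """First two raw moments of TRTG(q, p)."""
    lq, r = np.log(q), 1 - q
    m1 = q / r - p * q * lq / r**2
    m2 = q * (1 + q) / r**2 - p * q * (1 + 3 * q) * lq / r**3
    return np.array([m1, m2])


def mm_from_moments(m1_hat, m2_hat, start=(0.5, 0.5)):
    """Solve mu'_1(q, p) = m1_hat and mu'_2(q, p) = m2_hat with (q, p) kept in the parameter space."""
    target = np.array([m1_hat, m2_hat])
    res = least_squares(lambda th: trtg_m1_m2(*th) - target, start,
                        bounds=([1e-6, 0.0], [1 - 1e-6, 1.0]),
                        xtol=1e-12, ftol=1e-12, gtol=1e-12)
    return tuple(res.x)


def trtg_moment_estimates(x):
    """MM estimates from a sample x of nonnegative integers."""
    x = np.asarray(x, dtype=float)
    return mm_from_moments(x.mean(), np.mean(x ** 2))


# Recover known parameters from their exact population moments
q0, p0 = 0.6, 0.4
print(mm_from_moments(*trtg_m1_m2(q0, p0)))
```

For a real sample, `trtg_moment_estimates(x)` plugs in the sample moments. With small samples, the solution can sit on a bound when the sample moments are not attainable by any TRTG parameter pair.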
5.3. Method of Proportions
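The method of proportions was proposed by Khan et al. [17] to estimate the parameters of the discrete Weibull distribution; here we use it to estimate the TRTG parameters. Let $X_1, \ldots, X_n$ be a random sample from the TRTG distribution. Consider the indicator function $I_j(\cdot)$, which is defined (for $i = 1, \ldots, n$ and $j = 0, 1$) as
$$I_j(X_i) = \begin{cases} 1, & X_i = j, \\ 0, & X_i \ne j. \end{cases}$$
Then, $Y = \frac{1}{n}\sum_{i=1}^{n} I_0(X_i)$ denotes the proportion of zeros in the sample and estimates the probability $P(X = 0) = 1 - q + p\,q\log q$. Similarly, $Z = \frac{1}{n}\sum_{i=1}^{n} I_1(X_i)$ denotes the proportion of ones in the sample and estimates the probability $P(X = 1) = q(1-q) - p\,q(1-2q)\log q$. Hence, the proportion (MP) estimators of q and p are obtained by solving the following two equations simultaneously:
$$1 - q + p\,q\log q = Y$$
and
$$q(1-q) - p\,q(1-2q)\log q = Z.$$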
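The two proportion equations admit a closed-form solution, which follows from a short elimination step (our own derivation, offered as a convenience). Solving the first equation for $p\log q$ and substituting into the second gives $q^2 - 2(1-Y)q + (1 - Y - Z) = 0$. Hence $\hat{q} = (1 - Y) - \sqrt{Y^2 - Y + Z}$ and $\hat{p} = (Y - 1 + \hat{q})/(\hat{q}\log\hat{q})$; the larger root is excluded because it would force $p < 0$. A minimal sketch, assuming NumPy (helper names hypothetical):

```python
import numpy as np


def mp_from_proportions(y, z):
    """Closed-form solution of P(X=0) = y and P(X=1) = z for (q, p).

    Eliminating p log q gives q^2 - 2(1 - y) q + (1 - y - z) = 0; the smaller
    root is the admissible one (the larger root would force p < 0).
    """
    q = (1 - y) - np.sqrt(max(y**2 - y + z, 0.0))
    q = min(max(q, 1e-9), 1 - 1e-9)               # keep log(q) finite in small samples
    p = (y - 1 + q) / (q * np.log(q))
    return q, min(max(p, 0.0), 1.0)                # project onto 0 <= p <= 1


def trtg_proportion_estimates(x):
    x = np.asarray(x)
    y = np.mean(x == 0)   # proportion of zeros, estimates P(X = 0)
    z = np.mean(x == 1)   # proportion of ones, estimates P(X = 1)
    return mp_from_proportions(y, z)


def exact_proportions(q, p):
    """Population values of P(X = 0) and P(X = 1)."""
    lq = np.log(q)
    return 1 - q + p * q * lq, q * (1 - q) - p * q * (1 - 2 * q) * lq


print(mp_from_proportions(*exact_proportions(0.3, 0.5)))
```

In small samples the discriminant can be negative, or the estimate can fall outside the parameter space. The sketch simply projects onto the admissible region, which is one of several possible conventions.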
5.4. Bayesian Method
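To obtain the Bayes estimators of the parameters q and p, we suppose that q has a beta distribution with parameters $a_1$ and $b_1$, p has a beta distribution with parameters $a_2$ and $b_2$, and q and p are independent. Then, the prior density functions of q and p are given by
$$\pi_1(q) = \frac{q^{a_1-1}(1-q)^{b_1-1}}{B(a_1, b_1)}, \qquad 0 < q < 1,$$
and
$$\pi_2(p) = \frac{p^{a_2-1}(1-p)^{b_2-1}}{B(a_2, b_2)}, \qquad 0 < p < 1,$$
respectively, where $B(\cdot,\cdot)$ is the beta function. Therefore, the joint prior of $(q, p)$ can be expressed as
$$\pi(q, p) = \pi_1(q)\,\pi_2(p) \propto q^{a_1-1}(1-q)^{b_1-1}\,p^{a_2-1}(1-p)^{b_2-1}.$$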
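Under the squared error loss function, the Bayes estimators of q and p are the posterior means
$$\tilde{q} = \frac{\int_0^1\!\int_0^1 q\,\pi(q,p)\,e^{\ell(q,p)}\,dq\,dp}{\int_0^1\!\int_0^1 \pi(q,p)\,e^{\ell(q,p)}\,dq\,dp}, \qquad \tilde{p} = \frac{\int_0^1\!\int_0^1 p\,\pi(q,p)\,e^{\ell(q,p)}\,dq\,dp}{\int_0^1\!\int_0^1 \pi(q,p)\,e^{\ell(q,p)}\,dq\,dp},$$
respectively, where $\ell(q, p)$ is the log-likelihood function. These estimators cannot be obtained explicitly, but they can be approximated by the method of Tierney and Kadane [18].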
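The sketch below approximates the two posterior means by brute-force midpoint integration on a grid over $(0,1)^2$. This is a swapped-in technique, not the Tierney–Kadane approximation used in the paper; in two dimensions, a fine grid is a simple way to obtain reference values. NumPy and SciPy (`scipy.stats.beta.logpdf`) are assumed. The hyperparameter defaults (uniform priors), grid size, and function names are illustrative, and the data are simulated.

```python
import numpy as np
from scipy.stats import beta


def trtg_loglik_grid(Q, P, x):
    """Log-likelihood evaluated on arrays Q, P of parameter values (x: integer data)."""
    lq = np.log(Q)
    ll = np.sum(x) * lq
    values, counts = np.unique(x, return_counts=True)
    for xv, c in zip(values, counts):
        ll = ll + c * np.log((1 - Q) * (1 - P * xv * lq) + P * Q * lq)
    return ll


def bayes_sel_grid(x, a1=1.0, b1=1.0, a2=1.0, b2=1.0, m=400):
    """Posterior means of q and p (squared error loss) by midpoint-grid integration."""
    x = np.asarray(x, dtype=int)
    grid = (np.arange(m) + 0.5) / m
    Q, P = np.meshgrid(grid, grid, indexing="ij")
    log_post = (trtg_loglik_grid(Q, P, x)
                + beta.logpdf(Q, a1, b1) + beta.logpdf(P, a2, b2))
    w = np.exp(log_post - log_post.max())   # rescaled before exponentiating, for stability
    w /= w.sum()
    return float(np.sum(w * Q)), float(np.sum(w * P))


def trtg_sample(n, q, p, rng, xmax=10_000):
    """Inverse-cdf sampling on the truncated support 0, ..., xmax - 1."""
    support = np.arange(xmax)
    cdf = 1 - q ** (support + 1) * (1 - p * (support + 1) * np.log(q))
    return np.minimum(np.searchsorted(cdf, rng.random(n), side="left"), xmax - 1)


rng = np.random.default_rng(11)
data = trtg_sample(2000, 0.6, 0.5, rng)     # simulated, not the paper's data
q_bayes, p_bayes = bayes_sel_grid(data, a1=2, b1=2, a2=2, b2=2)
print(round(q_bayes, 4), round(p_bayes, 4))
```

With an empty sample, the function returns the prior means, which makes a convenient correctness check of the quadrature.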
6. Simulation Study
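To assess the performance of the preceding estimators, we conducted a simulation study. In this simulation, we generated samples of size n from the TRTG distribution and then computed the ML, MM, MP, and Bayes estimates of q and p. For all methods, we calculated the average absolute biases (ABBs), mean square errors (MSEs), and mean relative errors (MREs) of the estimates, defined by
$$\mathrm{ABB}(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\left|\hat\theta_i - \theta\right|, \qquad \mathrm{MRE}(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\hat\theta_i - \theta\right|}{\theta},$$
and
$$\mathrm{MSE}(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\theta_i - \theta\right)^{2},$$
where $\theta \in \{q, p\}$, $\hat\theta_i$ is the estimate of $\theta$ in the ith replication, and N is the number of replications.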
The CG method of the optim routine in R was adopted, and 5000 replications were generated to estimate these indices for the ML, MM, MP, and Bayes estimates. Different sample sizes and two parameter settings were considered. The results are given in Table 3 and Table 4, from which it can be concluded that the ABBs, MREs, and MSEs of all estimates decrease as n increases, as expected. Moreover, the Bayes, ML, and MM methods provide the best estimates in terms of the performance criteria: these three methods are almost identical in terms of ABBs, MSEs, and MREs, and they perform better than the MP method.
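A small-scale version of this Monte Carlo design can be sketched as follows. To keep it fast and self-contained, it uses only the closed-form proportion (MP) estimator, 300 replications, and illustrative parameter values. It does not reproduce Tables 3 and 4, and the helper names are hypothetical. ABB, MSE, and MRE are computed as the averages of $|\hat\theta - \theta|$, $(\hat\theta - \theta)^2$, and $|\hat\theta - \theta|/\theta$ over replications. NumPy is assumed.

```python
import numpy as np


def trtg_sample(n, q, p, rng, xmax=5_000):
    """Inverse-cdf sampling on the truncated support 0, ..., xmax - 1."""
    support = np.arange(xmax)
    cdf = 1 - q ** (support + 1) * (1 - p * (support + 1) * np.log(q))
    return np.minimum(np.searchsorted(cdf, rng.random(n), side="left"), xmax - 1)


def mp_estimate(x):
    """Closed-form method-of-proportions estimate, projected onto (0, 1) x [0, 1]."""
    y, z = np.mean(x == 0), np.mean(x == 1)
    q = (1 - y) - np.sqrt(max(y**2 - y + z, 0.0))
    q = min(max(q, 1e-6), 1 - 1e-6)
    p = (y - 1 + q) / (q * np.log(q))
    return q, min(max(p, 0.0), 1.0)


def simulate(n, q, p, reps=1000, seed=1):
    """Monte Carlo ABB, MSE and MRE of the MP estimates of (q, p); requires p > 0 for MRE."""
    rng = np.random.default_rng(seed)
    est = np.array([mp_estimate(trtg_sample(n, q, p, rng)) for _ in range(reps)])
    truth = np.array([q, p])
    err = est - truth
    abb = np.mean(np.abs(err), axis=0)
    mse = np.mean(err ** 2, axis=0)
    mre = np.mean(np.abs(err) / truth, axis=0)
    return abb, mse, mre


for n in (50, 200, 500):
    abb, mse, mre = simulate(n, 0.5, 0.5, reps=300)
    print(n, abb.round(4), mse.round(4), mre.round(4))
```

Swapping in the ML, MM, or Bayes estimators only requires replacing `mp_estimate`, at a higher computational cost per replication.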
Table 3.
Simulation results of the TRTG model for the first parameter setting of (q, p).
Table 4.
Simulation results of the TRTG model for the second parameter setting of (q, p).
7. Modeling Three Actuarial Data Sets
In this section, the TRTG distribution is fitted to three real actuarial data sets and compared with the transmuted geometric (TRAG), discrete Burr (DB), discrete Chen (DC), negative binomial (NB), geometric (G), and Poisson (P) distributions.
First data set: These data were reported in Klugman et al. [19] and represent the number of claims of automobile liability policies.
Second data set: The data were analyzed by Klugman et al. [19] and represent the number of hospitalizations per family member and year.
Third data set: These data were studied by Willmot [20] and refer to the number of automobile insurance claims per policy in two portfolios from Belgium during the period 1975–1976.
The TRTG, TRAG, DB, DC, NB, G, and P distributions are fitted to each of the three data sets, and their parameters are estimated by the ML method. The chi-square goodness-of-fit test is used to test the null hypothesis $H_0$: the data follow the TRTG distribution. To compute the $\chi^2$ statistic, the unknown parameters q and p are estimated from each data set. Under $H_0$, the estimated probabilities are
$$\hat{\pi}_x = \hat{q}^{x}\left[(1-\hat{q})(1 - \hat{p}\,x\log\hat{q}) + \hat{p}\,\hat{q}\log\hat{q}\right], \qquad x = 0, 1, 2, \ldots,$$
and the estimated expected frequencies are $E_x = n\hat{\pi}_x$, where $(\hat{q}, \hat{p})$ is the ML estimate of $(q, p)$ and n is the sample size. For the three data sets, the chi-square statistic $\chi^2 = \sum_x (O_x - E_x)^2/E_x$, where $O_x$ are the observed frequencies, is computed for the TRTG distribution and the competing distributions. The observed and expected frequencies, together with the goodness-of-fit statistics, are listed in Table 5, Table 6 and Table 7 for the three data sets, respectively. These tables reveal that the TRTG distribution has the lowest values of the reported goodness-of-fit statistics among all competing discrete models, so it fits the given data sets better than the TRAG, DB, DC, NB, G, and P distributions. Based on these results, $H_0$ cannot be rejected for the TRTG distribution.
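The goodness-of-fit computation can be sketched as below, assuming NumPy and SciPy (`scipy.stats.chi2.sf` for the p-value). How sparse cells are grouped is not stated here, so two choices are assumptions: pooling the last cell into a tail $\{X \ge K-1\}$, and using $K - 1 - 2$ degrees of freedom. The counts in the check are synthetic, not the data of Tables 5–7, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def trtg_pmf(x, q, p):
    x = np.asarray(x, dtype=float)
    lq = np.log(q)
    return q ** x * ((1 - q) * (1 - p * x * lq) + p * q * lq)


def chi_square_gof(counts, q_hat, p_hat, n_params=2):
    """Pearson chi-square for a fitted TRTG model.

    counts[k] is the observed frequency of the value k for k < K - 1, and the
    last cell holds the pooled tail {X >= K - 1}.
    """
    counts = np.asarray(counts, dtype=float)
    n, K = counts.sum(), len(counts)
    probs = trtg_pmf(np.arange(K - 1), q_hat, p_hat)
    probs = np.append(probs, 1 - probs.sum())           # tail cell P(X >= K - 1)
    expected = n * probs
    stat = float(np.sum((counts - expected) ** 2 / expected))
    dof = K - 1 - n_params
    return expected, stat, dof, float(chi2.sf(stat, dof))


# Synthetic check: counts equal to the expected frequencies give a zero statistic
q0, p0, K = 0.5, 0.5, 8
cell_probs = trtg_pmf(np.arange(K - 1), q0, p0)
exact = 1000 * np.append(cell_probs, 1 - cell_probs.sum())
print(chi_square_gof(exact, q0, p0)[1:])
```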
Table 5.
Observed and expected frequencies, with goodness-of-fit statistics, for the first data set.
Table 6.
Observed and expected frequencies, with goodness-of-fit statistics, for the second data set.
Table 7.
Observed and expected frequencies, with goodness-of-fit statistics, for the third data set.
Furthermore, for visual comparisons, the observed and fitted distributions are displayed in Figure 3, Figure 4 and Figure 5 for the three data sets, respectively.
Figure 3.
Observed and fitted distributions for the first data set.
Figure 4.
Observed and fitted distributions for the second data set.
Figure 5.
Observed and fitted distributions for the third data set.
8. TRTG Count Regression Model
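Let X be the response variable and $\mathbf{z} = (1, z_1, \ldots, z_k)^\top$ its associated vector of covariates. Assume that X follows the TRTG distribution with mean $\mu$. Furthermore, the mean of the response variable is linked to the explanatory variables through the log-linear form $\log(\mu) = \mathbf{z}^\top\boldsymbol{\beta}$, where $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)^\top$ is the vector of regression coefficients. Re-parameterizing the pmf (6) in terms of $\mu$, we obtain the pmf of the TRTG regression model as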
The corresponding log-likelihood function takes the form
$$\ell = \sum_{i=1}^{n}\log f(x_i; \mu_i), \tag{14}$$
where $\mu_i = \exp(\mathbf{z}_i^\top\boldsymbol{\beta})$ and $f(\cdot\,; \mu_i)$ denotes the re-parameterized pmf for the ith observation (the remaining TRTG parameter is suppressed from the notation). The likelihood equations based on (14) have no closed-form solution, so numerical methods are used to obtain the estimates. We illustrate the TRTG regression model by analyzing real data on the count of infected blood cells (per mm²) on microscope slides prepared from randomly selected individuals [21]. The response variable (X: count of infected blood cells) was related to the following explanatory variables: the smoking status of the subject ($z_1$ = 0: yes; 1: no) and their sex ($z_2$ = 0: female; 1: male). The estimated parameters, log-likelihood, and Akaike information criterion (AIC) of each model are reported in Table 8, which indicates that the proposed TRTG count regression model outperforms the Poisson, geometric, and PL regression models.
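The paper's exact re-parameterization is not reproduced here. As one possible sketch, assume a common p across observations and recover each $q_i$ from $\mu_i = \exp(\mathbf{z}_i^\top\boldsymbol\beta)$ numerically. For fixed p, the TRTG mean $q/(1-q) - pq\log q/(1-q)^2$ is strictly increasing in q, so a vectorized bisection finds $q_i$. This parameterization, the BFGS optimizer, and all function names are our own assumptions. The check uses simulated covariates rather than the blood-cell data of [21]. NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def trtg_mean(q, p):
    """E(X) = q/(1-q) - p q log q / (1-q)^2 for TRTG(q, p)."""
    return q / (1 - q) - p * q * np.log(q) / (1 - q) ** 2


def q_from_mean(mu, p, iters=60):
    """Vectorised bisection for q in (0, 1); E(X) is strictly increasing in q for fixed p."""
    lo = np.full(np.shape(mu), 1e-12)
    hi = np.full(np.shape(mu), 1 - 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = trtg_mean(mid, p) < mu
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def trtg_logpmf(x, q, p):
    lq = np.log(q)
    return x * lq + np.log((1 - q) * (1 - p * x * lq) + p * q * lq)


def fit_trtg_regression(x, Z):
    """Hypothetical TRTG regression: log(mu_i) = Z_i @ beta, common p, q_i solved from mu_i."""
    x = np.asarray(x, dtype=float)
    k = Z.shape[1]

    def avg_negloglik(theta):
        b = theta[:k]
        p = min(expit(theta[k]), 1 - 1e-9)               # keep p strictly below 1
        mu = np.exp(np.clip(Z @ b, -30, 30))              # guard against overflow
        q = q_from_mean(mu, p)
        return -np.mean(trtg_logpmf(x, q, p))             # averaged so gradients stay O(1)

    start = np.zeros(k + 1)
    start[0] = np.log(max(x.mean(), 1e-3))               # intercept-only start, p = 0.5
    res = minimize(avg_negloglik, start, method="BFGS")
    return res.x[:k], float(expit(res.x[k])), -res.fun * len(x)


# Simulated example with two binary covariates (not the data of [21])
rng = np.random.default_rng(7)
n = 2000
Z = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.integers(0, 2, n)])
beta_true, p_true = np.array([0.5, -0.4, 0.3]), 0.6
q_i = q_from_mean(np.exp(Z @ beta_true), p_true)
support = np.arange(400)
cdf = 1 - q_i[:, None] ** (support + 1) * (1 - p_true * (support + 1) * np.log(q_i)[:, None])
x = (cdf < rng.random(n)[:, None]).sum(axis=1)           # inverse-cdf draw per observation
beta_hat, p_hat, loglik = fit_trtg_regression(x, Z)
print(beta_hat.round(3), round(p_hat, 3), round(loglik, 2))
```

The AIC for such a fit would be $2(k+1) - 2\ell$, with $k + 1$ counting the regression coefficients plus p.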
Table 8.
Estimated parameters for the TRTG, Poisson, geometric, and PL regression models, along with the log-likelihood and AIC values.
9. Conclusions
We derived and studied a new discrete distribution defined on the nonnegative integers $\{0, 1, 2, \ldots\}$, using the transmuted record type approach to extend the geometric distribution. The new model, called the transmuted record type geometric (TRTG) distribution, is suitable for over-dispersed count data. We derived the distributional properties of the TRTG distribution along with two actuarial (risk) measures. The TRTG parameters were estimated by four different estimation methods, and a Monte Carlo simulation study was conducted to investigate the efficiency of the resulting estimators. A new count regression model was proposed as an alternative to the Poisson, geometric, and Poisson–Lindley regression models. In summary, the TRTG model can be considered a good alternative to the negative binomial, geometric, and Poisson distributions, and the applications show that it models insurance data better than the negative binomial, transmuted geometric, discrete Burr, discrete Chen, geometric, and Poisson distributions.
Author Contributions
Conceptualization, T.E., Y.A. and M.M.A.A.; methodology, A.Z.A.; software, Y.A.; validation, T.E., Y.A., M.M.A.S. and A.Z.A.; formal analysis, Y.A. and M.M.A.S.; writing—original draft preparation, Y.A., M.M.A.S. and A.Z.A.; writing—review and editing, T.E., Y.A. and A.Z.A.; visualization, M.M.A.A.; supervision, A.Z.A.; funding acquisition, M.M.A.A. All authors have read and agreed to the published version of the manuscript.
Funding
The first author extends their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (GRP/106/42), Received by Mohammed M. Almazah. www.kku.edu.sa.
Acknowledgments
The authors would like to thank the Editorial Board and the two anonymous reviewers for their constructive comments that greatly improved the final version of the paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Chakraborty, S.; Chakravarty, D. Discrete gamma distribution: Properties and parameter estimation. Commun. Stat. Theory Methods 2012, 41, 3301–3324.
- Nakagawa, T.; Osaki, S. Discrete Weibull distribution. IEEE Trans. Reliab. 1975, 24, 300–301.
- Krishna, H.; Pundir, P.S. Discrete Burr and discrete Pareto distributions. Stat. Methodol. 2009, 6, 177–188.
- Gómez-Déniz, E. A new discrete distribution: Properties and applications in medical care. J. Appl. Stat. 2013, 40, 2760–2770.
- Akdoğan, Y.; Kuş, C.; Asgharzadeh, A.; Kınacı, I.; Sharafi, F. Uniform-geometric distribution. J. Stat. Comput. Simul. 2016, 86, 1754–1770.
- Kuş, C.; Akdoğan, Y.; Asgharzadeh, A.; Kınacı, I.; Karakaya, K. Binomial-discrete Lindley distribution. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. 2018, 68, 401–411.
- Al-Babtain, A.A.; Ahmed, A.H.N.; Afify, A.Z. A new discrete analog of the continuous Lindley distribution, with reliability applications. Entropy 2020, 22, 603.
- Gómez-Déniz, E.; Calderín-Ojeda, E. The discrete Lindley distribution: Properties and applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416.
- Bakouch, H.S.; Jazi, M.A.; Nadarajah, S. A new discrete distribution. Statistics 2014, 48, 200–240.
- Balakrishnan, N.; He, M. A Record-Based Transmuted Family of Distributions. In Advances in Statistics-Theory and Applications. Emerging Topics in Statistics and Biostatistics; Ghosh, I., Balakrishnan, N., Ng, H.K.T., Eds.; Springer: Cham, Switzerland, 2021.
- Shakil, M.; Ahsanullah, M. Record values of the ratio of Rayleigh random variables. Pak. J. Stat. 2011, 27, 307–325.
- Tanış, C.; Saraçoğlu, B. On the record-based transmuted model of Balakrishnan and He based on Weibull distribution. Commun. Stat.-Simul. Comput. 2020.
- Chakraborty, S.; Bhati, D. Transmuted geometric distribution with applications in modeling and regression analysis of count data. SORT 2016, 40, 153–176.
- Noughabi, M.S.; Rezaei Roknabadi, A.H.; Mohtashami Borzadaran, G.R. Some discrete lifetime distributions with bathtub-shaped hazard rate functions. Qual. Eng. 2013, 25, 225–236.
- Sankaran, M. The discrete Poisson-Lindley distribution. Biometrics 1970, 26, 145–149.
- Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Springer: New York, NY, USA, 2007.
- Khan, M.S.A.; Khalique, A.; Abouammoh, A.M. On estimating parameters in a discrete Weibull distribution. IEEE Trans. Reliab. 1989, 38, 348–350.
- Tierney, L.; Kadane, J.B. Accurate approximations for posterior moments and marginal densities. J. Am. Stat. Assoc. 1986, 81, 82–86.
- Klugman, S.; Panjer, H.; Willmot, G. Loss Models: From Data to Decisions; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 715.
- Willmot, G.E. The Poisson-inverse Gaussian distribution as an alternative to the negative binomial. Scand. Actuar. J. 1987, 1987, 113–127.
- Crawley, M.J. The R Book, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2012.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).