An Over and Underdispersed Biparametric Extension of the Waring Distribution

A new discrete distribution for count data called extended biparametric Waring (EBW) distribution is developed. Its name is related to the fact that, in a specific configuration of its parameters, it can be seen as a biparametric version of the univariate generalized Waring (UGW) distribution, a well-known model for the variance decomposition into three components: randomness, liability and proneness. Unlike the UGW distribution, the EBW can model both overdispersed and underdispersed data sets. In fact, the EBW distribution is a particular case of a UWG distribution when its first parameter is positive; otherwise, it is a particular case of a Complex Triparametric Pearson (CTP) distribution. Hence, this new model inherits most of their properties and, moreover, it helps to solve the identification problem in the variance components of the UGW model. We compare the EBW with the UGW by a simulation study, but also with other over and underdispersed distributions through the Kullback-Leibler divergence. Additionally, we have carried out a simulation study in order to analyse the properties of the maximum likelihood parameter estimates. Finally, some application examples are included which show that the proposed model provides similar or even better results than other models, but with fewer parameters.


Introduction
The univariate generalized Waring (UGW) is a triparametric distribution for overdispersed count data that has been studied by [1][2][3][4], among others. The interest of the UGW distribution lies in the decomposition of its variance into three components, randomness, liability and proneness, which allows us to get a deeper knowledge of the nature of data variability, that is, how and why data vary. Whereas the Poisson distribution provides the simplest answer to this issue (pure chance), any one-step Poisson mixture distributions assume that there are only two sources of variability (for example, the negative binomial or NB distribution which is a Poisson-Gamma mixture).
For this reason, the UGW distribution and the related regression model [5][6][7] have been widely applied for modelling overdispersed count data sets in different fields, such as lexicology [8], the number of authors in scientific articles [9], the evolution of the number of links in the World Wide Web [10], accident theory [11], clustered data [12], sources of variance in motor vehicle crash analysis [13], completeness errors in geographic data sets [14] or agriculture [15].
However, the UGW distribution has a serious drawback related to the variance decomposition. Since its first two parameters are interchangeable in the expression of the probability mass function (pmf), it is difficult to determine which component refers to liability or proneness. There are in the literature some suggestions available to avoid this problem. Ref. [1] recommends choosing the values of liability and proneness according to the researcher experience; Ref. [11] proposes the calculus of a bivariate version of the Waring distribution and [5] solves the problem using additional information provided by covariates through a regression model. In all cases the solution of the indetermination needs external information that is not always available.
Several extensions have been developed, such as the extended Waring distribution or GHDI [4], the Stuttering generalized Waring distribution [16] and the bivariate generalized Waring distribution [11], but they do not manage to solve the identification problem aforementioned.
In this paper, we study a specific biparametric distribution within the Gaussian hypergeometric distributions (GHD) family [17] and we propose it as an extension of the UGW distribution but with only two parameters. The proposed model does not only perform similar to the UGW distribution for overdispersed data sets but also solves the identification problem of the variance components. Moreover, the way in which the extension is carried out also allows for modelling underdispersed count datasets, since it can be seen as a particular case of a complex triparametric Pearson (CTP) distribution [18,19] although, in this case, the result of decomposition of the variance is not verified because the model cannot be expressed as a Poisson mixture. Thus, this extension-that we will call extended biparametric Waring (henceforward EBW) distribution-inherits the good properties of the UGW and CTP distributions.
The rest of the paper is laid out as follows. Section 2 is devoted to defining the EBW distribution and to exploring its main probabilistic properties. In Section 3 some estimation methods are described and the properties of the maximum likelihood estimators are analised by a simulation study. In Section 4 we compare the EBW distribution with some other biparametric over-and underdispersed distributions. Some examples of application to real over-and underdispersed data that illustrate the versatility of the proposed model are included in Section 5. Finally, in Section 6, some conclusions of the current research are presented.

Definition
The GHD family, generated by the 2 F 1 (α, β; γ, λ) function , α, β ∈ C and γ, λ ∈ R, arises as a solution of the difference equation where G and L are quadratic polynomials with real coefficients, G(x) = (γ + x)(x + 1) and [20]. When convergence, positivity and normalization conditions are verified, the solution f (x) is the pmf of as a discrete distribution, that is and the probability generating function (pgf) It is important to point out that the first three parameters of the GHD are the roots of the polynomials L(x) and G(x) (except the sign of γ).
A thorough classification of the GHD family in terms of the parameters can be seen in [4]. In the aforementioned paper, a detailed study of the GHD when α, β and γ are positive real numbers and 0 < λ ≤ 1 (denoted by type I) is made. The case when α and β are conjugate complex numbers, γ > 0 and 0 < λ ≤ 1 (denoted by type II distributions) has been studied in [18,19,21]. Type VII distributions, a finite case which may be seen as a generalization of the beta-binomial model, have been addressed by [22]. Likewise, the case when λ = 1 has been analysed by [23,24], among others.
In this paper, we focus on the case in which both GHD of type I and type II with λ = 1 converge. Thus, L(x) in Equation (1) has a real double root, that is, α = β. Then, the solution of Equation (1) is given in terms of a 2 F 1 (α, α; γ; 1) function leading to a highly versatile biparametric discrete distribution with infinite range which is formalized in the following definition. From now on we will call it EBW, the acronym of Extended Bivariate Waring, distribution. Later on we will explain the nomenclature chosen for this new distribution.
The mean, µ, and variance, σ 2 , of X are so it is necessary that γ > 2α + 1 and γ > 2α + 2 to guarantee the existence of µ and σ 2 , respectively. In general, it can be proved that γ > 2α + m to guarantee the existence of the m-th raw moment.

Properties
To study the properties of the EBW distribution we will distinguish among α > 0 and It is necessary that γ > 2α, so we consider another parametrization of the distribution in terms of α and ρ = γ − 2α > 0. Then, the expression of the pmf given in Equation (3) is now and the expressions in Equation (4) reduce to To guarantee the existence of the m-th raw moment it is necessary that ρ > m.
Proof. Considering α = β > 0 and λ = 1 in Equation (2) and applying that it is easy to see that the pmf given in Equation (5) coincides with that of a UGW(α, α, ρ) distribution.
Hence, our model may be seen as a biparametric case of a UGW distribution when α > 0. As a consequence, it inherits the properties of the UGW distribution which are listed below: 1. It can be obtained from a two-step Poisson mixture: Since the EBW distribution with α > 0 is a Poisson mixture, it is always overdispersed. 3. It converges to P (µ) when ρ and α 2 → ∞ with the same order of convergence. 4. As a consequence of the mixture, the variance of X can be split into three components known as randomness, liability and proneness, respectively: Since we have got rid of one of the first two parameters of the UGW distribution, the indetermination problem with regard to the components of the variance [4] disappears in the biparametric model and, therefore, it is not necessary to provide additional information when determining the partition of the variance.
In order to know the effect of each parameter on the variance components of the EBW model we consider the proportion of variance explained by each one, that is: Figure 1 shows the evolution of the variance partition percentages for each component considering α fixed and ρ variable (low and high values) and then, ρ fixed and α variable (low and high values). In the first column (α fixed), we can observe that the greater ρ is, the more important is the proneness. In the second column (ρ fixed), the greater α is, the more important is the randomness. Otherwise, if α and ρ increase with the same convergence order, the proneness has a lower limit in 50% of the variance, whereas the other two parts tend to 25% each one.  Due to the structure of the UGW distribution in which the first two parameters are interchangeable and appear in a multiplicative form in the pmf, moments and decomposition of the variance, the maximum likelihood estimates of its first two parameters are usually almost equal. In fact, given a UGW(a, k, ρ) distribution, there exists an EBW(α, ρ) distribution with α = √ ak and the same parameter ρ, that is very close to the former. This can be seen in Figure 2 where the maximum Kullback-Leibler (KL) divergence for more details see [25] between the two models has been calculated for several values of a, k and ρ. We have considered the same scale in all the graphs in order to compare them. We observe that: (1) the divergence increases as k is separated from a; (2) the difference between a and k is less relevant as ρ increases; (3) in any case, the divergence is very small. Hence, the EBW type I distribution has the property of providing in many cases a similar fit but with one more degree of freedom. In general, the EBW distribution is able to provide acceptable fits for data simulated from a UGW distribution. This implies that, in most cases, there exists a EBW model reasonably similar to the UGW, but with the advantage of having fewer parameters. To show this fact we have simulated M = 1000 samples of size N = 100, 300 and 500 from a UGW distribution with several values of its parameters and, for each sample, we have obtained the corresponding EBW and UGW fits. All the estimates have been computed by the maximum likelihood method. We have implemented our own functions in R [26]: the pmf of the EBW and UGW distributions and the fitting function for both models. To do the latter, we have used the optim function of the stats package with the L-BFGS-B method [27], since it allows box constraints, and considering as initial values the estimates provided by the method of moments (see Section 3 for more details). For each group of 1000 samples we have computed the percentage of EBW fits achieved as well as the percentage of these fits which are better than the corresponding UGW fit in two senses: the AIC value and the χ 2 -goodness of fit test. Specifically, for the EBW fits achieved we have computed the percentage of them whose AIC value is less than that of the UGW fit and the percentage of p-values in the χ 2 -goodness of fit test greater than 0.01, 0.05 and 0.1 (that is, the null hypothesis data comes from a EBW model cannot be rejected). These results appear in Tables 1 and 2. We have also carried out the Kolmogorov-Smirnov test for discontinuous distributions [28] using the ks.test function of the dgof package in R, but in all the cases the p-values are greater than 0.1.  It can be seen as a particular case of a CTP distribution [18,21], which arises when the polynomial L(x) in Equation (1) has conjugate complex roots α = a + ib and β = a − ib; specifically, we have the following result.
This result is also true when α > 0. So, a UGW(α, α, ρ) ≡ CTP(α, 0, 2α + ρ). At this point we can justify the name chosen for the model proposed. On the one hand, when α > 0 the model may be seen as a biparametric case of the UGW distribution, which is always overdispersed and that may replace it with fewer parameters; on the other hand, when α < 0 the model can be underdispersed, so it may be considered as an underdispersed extension of a biparametric UGW distribution.

It converges to the:
• P (µ) when γ and α 2 → ∞ with the same order of convergence. • Normal distribution, N (µ, σ), when γ and α have the same order of convergence. The CTP distribution cannot be expressed as a mixture, so in the EBW with α < 0 there is no a result of variance decomposition.

Methods for Obtaining Estimators
We can estimate the two parameters of the EBW distribution using the method of moments and the maximum likelihood estimation method.
To apply the method of moments, we first solve the equations given in Equation (4). To this end we substitute γ − 2α − 1 = α 2 /µ in the equation of σ 2 . Then, which is equivalent to α 2 (σ 2 − µ) − 2µ 2 α − µ(µ 2 + σ 2 ) = 0. Then, replacing µ and σ 2 by their sample counterparts, x and s 2 , and solving the equation there are two possible estimates for α by the method of moments: It is clear that if data exhibit overdispersion, then α 1 > 0 and α 2 < 0. On the other hand, if data are underdispersed both α 1 and α 2 are negative. Estimated α, the estimate of γ is calculated as γ = α 2 /x + 2 α + 1. Hence, there are also two possible estimates for γ with the only restriction of being positive, which it is true when: Using the MLE method we have to maximize the log-likelihood function. Thus, if x = (x 1 , . . . , x n ) is a sample of size n, the expression of the log-likelihood function is: when α > 0, using the parametrization given in Section 2.2.1, or in another case. Both expressions can be maximized using numerical methods. In particular, we have used the L − BFGS − B method implemented in the optim function of the MASS package in R. This method allows box constraints on the parametric space, so we can impose ρ > 0 or γ > 0 in Equations (8) and (9), respectively. We consider the estimates obtained by the method of moments as initial values, in such a way that we maximize Equation (8) if α > 0 or Equation (9) in another case.

Properties of the Estimators
We have carried out a simulation study in order to analyse the performance of the estimates of the model parameters. Specifically, we have simulated M = 1000 samples of size N = 500 of the EBW distribution and we have fitted the EBW model for each sample using the MLE method described in the previous section.
We have considered two scenarios: α > 0, in which case the EBW distribution is always overdispersed, and α < 0, in which case the EBW distribution can be under-and overdispersed. In all cases the values of the parameters satisfy the conditions for the existence of µ and σ 2 .
Results of the simulation procedure are shown in Table 3. Thus, Column 1 contains the mean bias and the s.d., in brackets (* indicates a significant bias at 5% level based on a normal 95% confidence interval, given that there are 1000 observations). Column 2 shows the average of the mean square error (MSE) of the parameter estimates and Column 3 the percentage of simulations in which the parameter estimate does not differ significantly at 5% from the true value, known as coverage.
We have only included low values of ρ and γ because the higher these values are compared with α, the lower the mean and the variance are. In fact, if these parameters tend to infinity, holding α fixed, the EBW distribution degenerates into 0. In addition, if both α and ρ (or γ) are high, the EBW is similar to the Poisson or the Normal distribution.
In general, we can deduce that: • If α > 0 the estimates are biased to the right, but the bias decreases as α increases, holding ρ fixed. The opposite happens with the bias of ρ, which increases as ρ increases. • If α < 0 the estimates are also biased, those for α to the left and for γ to the right, but the bias disappears as α decreases (α < −1). Holding γ fixed, the bias decreases as α decreases and the same happens for γ.

•
The average MSE is low for both parameter estimates, although this measure increases as ρ (or γ) increases since the estimates accuracy and precision decrease. • Regarding the coverage, it approaches 95%, the confidence level considered, so it shows the validity of the inference made.

Comparison with Other Count Data Distributions
Next we study the differences and similarities between the EBW and other wellknown biparametric discrete distributions for count data using again the KL divergence. Specifically, we consider the distributions NB, Complex Biparametric Pearson or CBP [19], which is a particular case of the CTP distribution, Generalized Poisson or GP [29,30], COM-Poisson or CMP [31,32] and Hyper-Poisson or HP [33]. The first two are suitable only for overdispersed data, whereas the other three can cope with both underdispersed and overdispersed data, although the GP has finite range in the underdispersed case.
We focus on the overdispersed scenario since the underdispersed one, for being the EBW distribution a particular case of the CTP distribution, was already carried out by [21].
To compute the KL divergence between the EBW distribution and the above-mentioned distributions (and vice versa), we have considered several values of µ and σ 2 , with σ 2 > µ, and then we have obtained the corresponding values of the parameters of each distribution (see Appendix A). For the CMP and HP distributions it should be taken into account that not all the combinations of µ and σ 2 are possible; empirically there seems to be an upper limit for σ 2 in µ(µ + 1). Thus, the values of the KL divergence are shown in Figure 3.
In general, we can observe that in an overdispersed scenario the most distant models from the EBW distribution are the CBP and HP distributions and the closest ones to the EBW distribution are the GP and NB distributions. On the other hand, in an underdispersed scenario the HP distribution, which is very similar to the CMP distribution, is the closest one [21]. Nevertheless, these distances in relation to the EBW distribution are really small, which implies that the performance of these distributions is very similar.

Examples
In this section we use the EBW distribution to fit both over-and underdispersed real data and we compare this fit with those obtained from other discrete distributions.

Overdispersed Data: Number of Some Sports Facilities by Municipality in Andalusia, 2015
We consider the variable X: number of some sports facilities by municipality in Andalusia, in 2015. Data have been directly obtained from the System of Multiterritorial Information of Andalusia (SIMA) of the Junta de Andalucía [34]. This category includes all sports facilities in the municipality except sports complexes, sports courts, pelota courts (frontons) and pools. A description of these data appears in Table 4, which contains the mean, variance, Aggregation Index (AI), quartiles and maximum. We will model these data by the following distributions: EBW, NB, GP, CBP, UGW and CMP. AIC values, statistics and p-values corresponding to the χ 2 -goodness of fit test are shown in Table 5. We can see that the best fit is that provided by the EBW distribution. The Wald test supports this statement since the null hypothesis a = k (the first two parameters of the UGW distribution are equal) cannot be rejected: the statistic value is −1.31 × 10 −6 and the corresponding p-value is 1. With the likelihood ratio test (LRT) we come to the same conclusion (LRT = 2.36 × 10 −10 and p-value 1).  Table 6 shows the observed and expected frequencies for the EBW, NB, GP and CBP fits. Figure 4 shows graphically the frequencies for the values between 0 and 10 sport facilities. We can see that, in general, the EBW distribution fit is really accurate (the greater Pearson residual is 1.55 for the interval [18][19][20][21]. In the other side, the remaining distributions considered provide worse fits, in special in reference to the lowest values of the variable (high Pearson residuals in 0, 1 and 2).
Additionally, we can calculate the three components of the variance. The percentage of data variability due to randomness, liability and proneness is 14.25%, 32.83% and 52.92%, respectively. We can observe that randomness does not play a very important role with respect to the total variability of data and that the most important component is proneness, which refers to specific and internal conditions instead of general conditions of the municipality (external), although liability is also remarkable. The idiosyncrasy of a municipality explains more than 50% of the variability in the number of some sports facilities, whereas shared conditions have less influence, but also noteworthy, on this variability.

Underdispersed Data: Turkish Poem
We consider data about the word length (in terms of number of syllables) in the turkish poem Gidisat by Ercüment Behzat Lâv available in [35]. Following these authors, the count for 1 is treated as a count for 0, and in general the count for the response variable X is treated as X − 1, as though the data are generated by adding 1 to the distribution. These data exhibit underdispersion with a variance-mean ratio of 0.74 (see Table 4). Table 7 contains the parameter estimates, their standard errors (in parenthesis), the AIC, the observed and expected frequencies and the corresponding Pearson χ 2 test for each one of the models that copes with underdispersion, that is, EBW, CTP, CMP and HP (the GP distribution has been excluded because it is of finite range).
CTP and EBW fits provide practically the same results. In fact, b = 0 using the Wald test (z exp = 2.3 × 10 −5 and p-value ≈ 1) and the LRT (χ 2 exp = 0 and p-value = 1). Observed and expected frequencies for each fit are represented in Figure 5 (the CTP distribution has been suppressed). Although the three fits are very similar and really good, the EBW distribution fit is the most accurate taking into account the expected frequencies.

Conclusions
The EBW distribution is a very flexible biparametric discrete distribution that allows for modelling a wide variety of over and underdispersed count datasets. There are other biparametric distributions that can also cope with over and underdispersion such as the GP, CMP or HP distributions, but the EBW distribution is more general because its pmf and moments can be explicitly obtained in terms of the parameters. In this paper we have proposed this new model to fits data from different fields of knowledge, that shows the versatility of this model in respect with its possible application. In addition, when the first parameter of the EBW distribution is positive, it allows to split the variance into three uniquely determined components. This property avoids the problem of indeterminacy present in the UGW distribution. In consequence, and taking into account this property, the EBW distribution is more adequate than other biparametric discrete distributions for modelling overdispersed data in which the non-random part of the variance has two components, none of them negligible.
Furthermore, when the first parameter is a negative integer the EBW distribution has finite range and it is underdispersed. Something similar happens with the GP distribution that also has finite range but only in the underdispersed case, whereas the EBW distribution can also be underdispersed with infinite range.  Data Availability Statement: Data of example 1 is available in http://www.juntadeandalucia.es/ institutodeestadisticaycartografia/sima/index2.htm and the example 2 is included in the article.

Conflicts of Interest:
The authors declare no conflict of interest.