1.  An abstract of this thesis appeared in Katz (1946).
2.  This family of distributions, and extensions to it, has proved important in the actuarial modelling of claims; see, for example, Hess et al. (2002), Panjer (1981), Sundt and Jewell (1981), Willmot (1988), and Pestana and Velosa (2004). Johnson et al. (1993, chp. 2) provide an extensive discussion of both the Katz family and various other, often related, families of discrete distributions. For the Katz family alone, however, the treatment in Johnson and Kotz (1969, chp. 2.4) is more complete; see also Gurland (2006) for a more recent treatment.
3.  The one caveat to this observation is that higher-order moments may provide some power against models that share the same low-order moments, thereby creating a class of implicit null hypotheses (Davidson and MacKinnon 1987).
4.  Numerous extensions soon followed; see, for example, Bardwell and Crow (1964), Crow and Bardwell (1965), Ord (1967a, 1967b), Staff (1964, 1967), and Kemp (1968). Here we only briefly sketch some key ideas. For a more complete treatment of such families of distributions see, for example, any of Johnson et al. (1993, chp. 2.3), Ord (1972, chp. 5), or Dacey (1972).
5.  Observe that the Pochhammer symbol (rising factorial) is $(r)_y = \Gamma(y+r)/\Gamma(r) = r(r+1)\cdots(r+y-1)$, with $(r)_0 = 1$, where y is a nonnegative integer. Note that r can be negative; when r is a nonpositive integer, $\Gamma(r)$ is undefined and the product form should be used. If r is a negative integer then $(r)_y = 0$ for all $y > -r$. If r is a positive integer then $(r)_y = (y+r-1)!/(r-1)!$.
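The two special cases are easy to check numerically. The following minimal Python sketch (the function name `pochhammer` is ours, for illustration only) evaluates the rising factorial from its product form rather than the gamma-function ratio, which sidesteps the poles of $\Gamma$ at nonpositive integers.

```python
from math import factorial, prod

def pochhammer(r: float, y: int) -> float:
    """Rising factorial (r)_y = r (r+1) ... (r+y-1), with (r)_0 = 1.

    Uses the product form, so negative integer r (where Gamma(r) is
    undefined) is handled without special-casing.
    """
    if y < 0:
        raise ValueError("y must be a nonnegative integer")
    return prod(r + j for j in range(y))  # empty product gives (r)_0 = 1

# Negative integer r = -3: the product hits the factor 0 once y > -r = 3.
print([pochhammer(-3, y) for y in range(7)])
# Positive integer r = 4: agrees with (y + r - 1)! / (r - 1)!.
print([pochhammer(4, y) == factorial(y + 3) // factorial(3) for y in range(7)])
```

A noninteger negative r, such as $r = -2.5$, never produces a zero factor, so $(r)_y$ is nonzero for every y in that case.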
6.  When $\lambda/\gamma$ is an integer, the resulting pmfs are sometimes referred to as those of Pascal distributions, with the term negative binomial reserved for the more general case in which $\lambda/\gamma$ need not be an integer.
7.  Similarly, the Poisson approximation to the binomial reduces to letting $\gamma \to 0^{-}$ for fixed $\lambda$, which is also a more intuitive statement of how the parameters must evolve for the approximation to work than the one typically encountered.
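A minimal numerical sketch of this limit, assuming the Katz recursion takes the ratio form $p_{y+1}/p_y = (\lambda + \gamma y)/(1+y)$. For $\gamma < 0$ this is Binomial$(n, p)$ with $n = -\lambda/\gamma$ and $p = -\gamma/(1-\gamma)$, so that $np/(1-p) = \lambda$. Taking $\gamma = -\lambda/n$ for increasing integer n drives $\gamma \to 0^{-}$ with $\lambda$ held fixed. The helper names are hypothetical.

```python
from math import exp, lgamma, log, log1p

def poisson_pmf(lam, y):
    return exp(y * log(lam) - lam - lgamma(y + 1))

def katz_binomial_pmf(lam, gam, y):
    """Binomial member of the Katz family (gam < 0, -lam/gam a positive integer).

    Assumes the ratio p_{y+1}/p_y = (lam + gam*y)/(1 + y); for gam < 0 this is
    Binomial(n, p) with n = -lam/gam and p = -gam/(1 - gam), so n*p/(1 - p) = lam."""
    n = round(-lam / gam)
    p = -gam / (1.0 - gam)
    if y > n:
        return 0.0
    return exp(lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
               + y * log(p) + (n - y) * log1p(-p))

def max_abs_gap(lam, n, ymax=60):
    """Largest pointwise gap between the Katz binomial with gam = -lam/n and Poisson(lam)."""
    gam = -lam / n  # gam -> 0 from below as n grows, with lam held fixed
    return max(abs(katz_binomial_pmf(lam, gam, y) - poisson_pmf(lam, y))
               for y in range(ymax + 1))

for n in (10, 100, 1000, 10000):
    print(n, max_abs_gap(2.0, n))
```

The gap shrinks steadily as $\gamma$ approaches zero from below, with no separate condition on n or p needed beyond fixing $\lambda$.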
8.  We shall persist with the abuse of notation inherent in expressions like $\mathrm{E}\left[{Y}_{i}\mid {\lambda}_{i}\right]$ rather than, say, a more complete notation along the lines of $\mathrm{E}\left[{Y}_{i}\mid \beta ;{x}_{i}\right]$, for the sake of the notational economy it affords. 
9.  
10.  Common variants of this argument include: (i) Lee (1986), who specifies the gamma distribution in terms of the shape and scale (or inverse rate, $\xi = 1/\eta$) parameters, that is, $\theta \sim \mathcal{G}(1/\xi, \tau)$, and (ii) Cameron and Trivedi (1986), who use the so-called index form of the gamma distribution, which is specified in terms of the shape and mean ($\varphi = \tau/\eta$) parameters, that is, $\theta \sim \mathcal{G}(\tau/\varphi, \tau)$. Cameron and Trivedi (1986) call the shape parameter ($\tau$) the index or precision parameter.
11.  Moments for the gamma distribution specifications given in Footnote 10 follow immediately on making the appropriate substitution for $\eta $. 
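For concreteness, assume (as Footnote 10 implies) that $\mathcal{G}(\eta, \tau)$ denotes a gamma distribution with rate $\eta$ and shape $\tau$, so that $\mathrm{E}[\theta] = \tau/\eta$ and $\mathrm{Var}[\theta] = \tau/\eta^{2}$. The substitutions then give:

```latex
\begin{aligned}
\text{rate--shape, } \mathcal{G}(\eta,\tau):&\quad \mathrm{E}[\theta]=\tau/\eta, & \mathrm{Var}[\theta]&=\tau/\eta^{2},\\
\text{Lee, } \eta = 1/\xi:&\quad \mathrm{E}[\theta]=\tau\xi, & \mathrm{Var}[\theta]&=\tau\xi^{2},\\
\text{Cameron--Trivedi, } \eta=\tau/\varphi:&\quad \mathrm{E}[\theta]=\varphi, & \mathrm{Var}[\theta]&=\varphi^{2}/\tau.
\end{aligned}
```

The last line shows why $\tau$ is read as a precision parameter: for fixed mean $\varphi$, the variance falls as $\tau$ grows.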
12.  Other values of k yield the Negbin P, or NBP, model (Greene 2008).
13.  Strictly, it is not a generalized linear model as it stands, but conditioning on one of the parameters allows it to be treated as one. This parameter can then be estimated conditional on the remaining parameters, which yields a two-step iterative estimation procedure. See, for example, either Hilbe (2011) or Hilbe (2014) for a discussion of the steps involved.
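The two-step idea can be sketched as follows, under assumptions of our own: an NB2-type model with $\mu_i = \exp\{\beta_0 + \beta_1 x_i\}$ and variance $\mu_i + \alpha\mu_i^{2}$. With $\alpha$ fixed this is a GLM, fitted here by iteratively reweighted least squares (IRLS); with $\beta$ fixed, $\alpha$ is updated by a simple moment estimator. The moment update is a swapped-in simplification (maximum likelihood for $\alpha$ is the more usual choice), and the function name and toy data are hypothetical.

```python
import math

def fit_nb2_two_step(x, y, n_outer=100, n_irls=50, tol=1e-10):
    """Alternate (1) IRLS for beta with alpha fixed and (2) a moment update of
    alpha with beta fixed. Assumed (illustrative) model:
    E[y_i] = mu_i = exp(b0 + b1*x_i), Var[y_i] = mu_i + alpha*mu_i**2."""
    n = len(y)
    b0, b1, alpha = math.log(sum(y) / n), 0.0, 0.0
    for _ in range(n_outer):
        # Step 1: with alpha fixed the model is a GLM; fit beta by IRLS.
        for _ in range(n_irls):
            eta = [b0 + b1 * xi for xi in x]
            mu = [math.exp(e) for e in eta]
            w = [m / (1.0 + alpha * m) for m in mu]                 # working weights
            z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
            # Weighted least squares of z on (1, x) via the 2x2 normal equations.
            sw = sum(w)
            swx = sum(wi * xi for wi, xi in zip(w, x))
            swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
            swz = sum(wi * zi for wi, zi in zip(w, z))
            swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
            det = sw * swxx - swx * swx
            nb0 = (swxx * swz - swx * swxz) / det
            nb1 = (sw * swxz - swx * swz) / det
            step = abs(nb0 - b0) + abs(nb1 - b1)
            b0, b1 = nb0, nb1
            if step < tol:
                break
        # Step 2: with beta fixed, update alpha (moment estimator, floored at 0).
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        new_alpha = max(0.0, sum(((yi - m) ** 2 - m) / m ** 2
                                 for yi, m in zip(y, mu)) / (n - 2))
        converged = abs(new_alpha - alpha) < tol
        alpha = new_alpha
        if converged:
            break
    return b0, b1, alpha

# Hypothetical toy data, for illustration only.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [0, 3, 0, 5, 0, 4, 1, 11, 2, 20, 3, 9]
b0, b1, alpha = fit_nb2_two_step(x, y)
print(f"b0={b0:.4f}  b1={b1:.4f}  alpha={alpha:.4f}")
```

At convergence the fitted $\beta$ solves the NB2 score equations $\sum_i x_i (y_i - \mu_i)/(1 + \alpha\mu_i) = 0$ for the final $\alpha$.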
14.  This latter model, of course, corresponds to the Negbin II model of Cameron and Trivedi (1986), and so provides a somewhat stronger theoretical basis for that model, which may explain some of its popularity in the literature.
15.  Specifically, Greene (2008) discusses the broader class of models obtained when k is allowed to take values other than 0 or 1 in (14). He dubs this broader model the NBP model, seemingly because his notation uses p rather than the k used by Cameron and Trivedi (1986) (and here).
16.  Alternatively, using averaging arguments similar to those seen previously for the NBRM, if we average $\mathcal{P}(\theta)$ with respect to $\mathcal{G}\left(\theta; \frac{\pi}{1-\pi}, n\right)$, where $\pi = 1-\gamma$ and $n = \lambda/\gamma$, then we obtain a more common form of the negative binomial pmf, $p_y = \frac{(n)_y}{y!}\,\pi^{n}(1-\pi)^{y}$, $y = 0, 1, 2, \dots$
Note that the mean and variance of this distribution are given by $\mathrm{E}[Y] = n(1-\pi)/\pi = \lambda/(1-\gamma)$ and $\mathrm{Var}[Y] = n(1-\pi)/\pi^{2} = \lambda/(1-\gamma)^{2}$ (Appendix A.2.2). In contrast with the developments of (11), nothing in this model requires both parameters of the mixing gamma distribution to vary with the index i, nor need they be linked in any restrictive way. Specifically, if we were to follow the developments of Greene (2008), who equates the parameters of the mixing distribution, we would find that $\frac{\pi}{1-\pi} = n$, that is, $(1-\gamma)/\gamma = \lambda/\gamma$, so that $\lambda = 1-\gamma$,
which, since $0<\gamma<1$, constrains $0<\lambda<1$ and, as $\lambda = \exp\{x_i^{\top}\beta\}$, implies that $x_i^{\top}\beta < 0$. As a general statement, this would appear to be a very odd restriction to want to impose.
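The averaging argument can be checked numerically. The sketch below assumes that $\mathcal{G}(\theta; \eta, \tau)$ has rate $\eta$ and shape $\tau$, and that the Katz negative binomial satisfies $p_{y+1} = p_y(\lambda + \gamma y)/(y+1)$ with $p_0 = (1-\gamma)^{\lambda/\gamma}$. It integrates the Poisson pmf against the gamma density by Simpson's rule and compares the result with the Katz recursion. The helper names and the values $\lambda = 2$, $\gamma = 0.4$ are illustrative choices only.

```python
import math

def katz_pmf(lam, gam, ymax):
    """p_0, ..., p_ymax for a Katz distribution with 0 <= gam < 1.

    Assumes p_{y+1} = p_y * (lam + gam*y) / (y + 1) with
    p_0 = (1 - gam)**(lam/gam), or exp(-lam) in the Poisson case gam = 0."""
    p = [math.exp(-lam) if gam == 0 else (1.0 - gam) ** (lam / gam)]
    for y in range(ymax):
        p.append(p[-1] * (lam + gam * y) / (y + 1))
    return p

def poisson_gamma_mixture(y, shape, rate, upper=60.0, steps=6000):
    """P(Y = y) for Y | theta ~ Poisson(theta), theta ~ Gamma(shape, rate).

    Composite Simpson's rule on [0, upper] (steps must be even). Assumes
    shape > 1, so the integrand vanishes at 0, and a negligible tail beyond upper."""
    def integrand(theta):
        if theta == 0.0:
            return 0.0
        log_poisson = y * math.log(theta) - theta - math.lgamma(y + 1)
        log_gamma_pdf = (shape * math.log(rate) + (shape - 1) * math.log(theta)
                         - rate * theta - math.lgamma(shape))
        return math.exp(log_poisson + log_gamma_pdf)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    total += 4 * sum(integrand((2 * k - 1) * h) for k in range(1, steps // 2 + 1))
    total += 2 * sum(integrand(2 * k * h) for k in range(1, steps // 2))
    return total * h / 3

lam, gam = 2.0, 0.4
pi, n = 1 - gam, lam / gam   # pi = 1 - gamma, n = lambda / gamma
katz = katz_pmf(lam, gam, 10)
mixed = [poisson_gamma_mixture(y, shape=n, rate=pi / (1 - pi)) for y in range(11)]
print(max(abs(a - b) for a, b in zip(katz, mixed)))
```

Summing $y\,p_y$ and $y^{2}p_y$ over a long run of `katz_pmf` likewise recovers the mean $\lambda/(1-\gamma)$ and variance $\lambda/(1-\gamma)^{2}$.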
17.  
18.  Strictly, Katz (1965) adopted an approach more in keeping with a method of moments test. Specifically, he looked at the difference between estimators of the mean and variance, which should be equal under the null, and then scaled this difference appropriately to obtain a statistic with a known distribution under the null. In any event, the statistic so obtained is the same as the one proposed by Lee (1986) that we consider here.
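The mean-variance comparison idea can be illustrated, in the simplest iid setting without covariates, by the classical index-of-dispersion statistic. This is a swapped-in illustration of the general approach and is not necessarily identical to (17), which is defined for the regression setting; the function name is hypothetical.

```python
import math
import statistics

def dispersion_z(y):
    """Standardized mean-variance comparison for iid counts.

    Under a Poisson null the mean and variance coincide, and the index of
    dispersion D = (n - 1) * s^2 / ybar is approximately chi-square with n - 1
    degrees of freedom, so (D - (n - 1)) / sqrt(2 (n - 1)) is approximately N(0, 1).
    Equivalently, the statistic is sqrt((n - 1)/2) * (s^2 - ybar) / ybar."""
    n = len(y)
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)  # sample variance with divisor n - 1
    d = (n - 1) * s2 / ybar
    return (d - (n - 1)) / math.sqrt(2 * (n - 1))

print(dispersion_z([1, 2, 3]))     # toy sample with variance below the mean
print(dispersion_z([0, 0, 4, 4]))  # toy sample with variance above the mean
```

Large positive values point towards overdispersion, and large negative values towards underdispersion.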
19.  Lee (1986) proposed tests other than the one considered here, although he did not compare them numerically. The results recorded in Miller (1998) suggest that those involving third-order moments may have better power properties. For now we are primarily concerned with proof of concept and, given the simplicity of (17), do not explore these other tests.
20.  We also considered an alternative test based on the t-statistic in the regression of $z_i$ on a constant, but there was little difference in performance. These tests correspond to the Negbin I and II cases above.
21.  
22.  In their extensions to this class of distributions, Panjer (1981), Sundt and Jewell (1981), and Willmot (1988) adopt a slightly different parameterization, specifically $p_y - (a + b/y)\,p_{y-1} = 0$, $y \in \{2, 3, 4, \dots\}$. Equivalence with (A1) is seemingly established on setting $a = \gamma$ and $b = \lambda - \gamma$, although there are differences in the support of the resulting variables. In particular, $Y = 0$ is specifically excluded from this definition, and hence many of the probability distributions claimed to satisfy the recursion in this form are not completely defined by it.
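Both points can be seen in a short sketch, assuming (A1) is the ratio form $p_{y+1}/p_y = (\lambda + \gamma y)/(1+y)$, so that $p_y/p_{y-1} = \gamma + (\lambda - \gamma)/y$. A Katz negative binomial pmf satisfies the $(a, b)$ recursion from $y = 1$, while a zero-modified variant (with a hypothetical mass $q_0 = 0.5$ at zero) satisfies it only from $y = 2$. Both are valid pmfs obeying the recursion over $\{2, 3, \dots\}$, so that recursion alone does not pin down the distribution.

```python
def katz_pmf(lam, gam, ymax):
    # Assumes 0 < gam < 1 (negative binomial member), p_0 = (1 - gam)**(lam/gam).
    p = [(1.0 - gam) ** (lam / gam)]
    for y in range(ymax):
        p.append(p[-1] * (lam + gam * y) / (y + 1))
    return p

def satisfies_ab_recursion(p, a, b, start=2, tol=1e-12):
    """Check p_y - (a + b/y) p_{y-1} = 0 for y = start, ..., len(p) - 1."""
    return all(abs(p[y] - (a + b / y) * p[y - 1]) <= tol for y in range(start, len(p)))

lam, gam = 2.0, 0.4
a, b = gam, lam - gam   # a = gamma, b = lambda - gamma
p = katz_pmf(lam, gam, 60)

# Zero-modified variant: put mass q0 at zero and rescale the mass on y >= 1.
q0 = 0.5                 # hypothetical value, for illustration only
q = [q0] + [(1 - q0) / (1 - p[0]) * py for py in p[1:]]

print(satisfies_ab_recursion(p, a, b, start=1))  # Katz pmf: holds from y = 1
print(satisfies_ab_recursion(q, a, b, start=2))  # modified pmf: holds from y = 2
print(satisfies_ab_recursion(q, a, b, start=1))  # but fails at y = 1
```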
23.  
24.  In essence, this is the same as adopting the convention that any negative probabilities are set to zero. It might be argued that this is at odds with Katz’s original assumptions and that such distributions should therefore be excluded. Our justification for including in our analysis those distributions for which $\lambda/\gamma$ is noninteger is that Katz himself included them:
The class of distributions so defined includes the Poisson distributions, the two-parameter binomial (Bernoulli) distributions, and the two-parameter negative binomial (Pascal) distributions. Aside from these, the class contains only the mild generalizations obtained for the latter two of these types by permitting the parameter n (number of “trials” in direct sampling) and the parameter r (number of failures in inverse sampling) to take any positive real values. (Katz 1965, p. 175).

25.  A useful collection of results on Pochhammer symbols can be found in Slater (1966, Appendix I).
26.  We note that Katz (1965) was perfectly well aware of the possibility of noninteger values of $\lambda/\gamma$; see the quote in Footnote 24.
27.  The condition that $\lambda/\gamma$ be an integer obviously requires $\gamma \ne 0$.
28.  