1. Introduction
The Luria-Delbrück fluctuation assay is widely used to estimate mutation rates of microorganisms such as bacteria. In very broad outline, several test tubes containing a liquid nutrient medium are seeded with the same number of normal-type cells. These cells multiply by binary fission attaining the number by time t. At this time the contents of each test tube are ‘plated’ onto a solid substrate which is (almost) immediately lethal to the normal cells. Some cells may have mutated during the growing phase into a resistant type. Under ideal conditions they will form visible colonies on the lethal substrate. Counting these colonies provides data which are used to determine the mutation rate intrinsic to the organism of interest.
Exactly how the data are used for this determination depends on the mathematical model chosen to describe the dynamics of the situation. Various choices are available, and we refer to [1, 2] for reviews and references. The Lea and Coulson [3] model and its subsequent refinements are the most widely used of those available. In its simplest form it assumes that the following occurs within each test tube.
- (i)
Normal cell numbers increase exponentially fast: .
- (ii)
Mutation occurs randomly at a rate proportional to . Specifically, there is a mutation rate r (per unit time per bacterial cell) such that a mutation event occurs in the interval with probability , i.e., a normal cell converts to a resistant type. There is no mutation with probability .
- (iii)
Mutation events create mutant clones which grow independently of each other according to a linear birth process with split rate , i.e., a binary splitting or Yule process. The ratio of the growth rates of normal to mutant cells is denoted by .
These assumptions give rise to probability distributions for the total number of mutants at time t. The model implies that these distributions are infinitely divisible (abbreviated infdiv), i.e., compound Poisson distributions [4]. Our aim in this paper is to investigate whether mutant number distributions have deeper infdiv properties. More specifically, are they generalised negative-binomial convolutions (GNBC’s)? The answer is of interest in its own right, and a positive answer also gives structural insight: such a distribution is unimodal, and there are criteria which determine whether its probability mass function is non-increasing or has a positive mode.
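Assumptions (i)–(iii) can be simulated directly. The sketch below is a minimal Monte Carlo version of one test tube under stated assumptions: normal cells grow as $N(s) = n_0 e^{\beta s}$, mutations arrive at rate `nu` per normal cell per unit time, and each mutant clone is a Yule process with split rate `beta_mut`. All parameter names are hypothetical. It relies on two standard facts: the mutation epochs form an inhomogeneous Poisson process with intensity proportional to $N(s)$, and a Yule clone of age $\tau$ started from one cell is geometric on $\{1,2,\dots\}$ with success probability $e^{-\beta_{\mathrm{mut}}\tau}$.

```python
import math
import random

def poisson_sample(mean, rng):
    """Poisson(mean) variate: count unit-rate exponential arrivals in [0, mean]."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival <= mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count

def simulate_tube(n0, nu, beta, beta_mut, t, rng):
    """Mutant count at time t in one tube under assumptions (i)-(iii) (illustrative sketch).

    n0, beta: hypothetical initial size and growth rate, N(s) = n0 * exp(beta * s).
    nu: hypothetical mutation rate per normal cell per unit time.
    beta_mut: hypothetical split rate of the Yule process followed by each mutant clone.
    """
    growth = math.expm1(beta * t)  # exp(beta * t) - 1
    # (ii) mutations form a Poisson process with intensity nu * N(s) on [0, t]
    n_mutations = poisson_sample(nu * n0 * growth / beta, rng)
    total = 0
    for _ in range(n_mutations):
        # mutation time has density proportional to exp(beta * s) on [0, t] (inverse CDF)
        s = math.log1p(rng.random() * growth) / beta
        # (iii) a Yule clone of age t - s is geometric on {1, 2, ...} with success prob p
        p = math.exp(-beta_mut * (t - s))
        if p >= 1.0:  # age rounds to zero: clone is a single cell (and log1p(-1) would fail)
            total += 1
            continue
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        total += 1 + int(math.log(u) / math.log1p(-p))
    return total

rng = random.Random(2024)
counts = [simulate_tube(100, 1e-3, 1.0, 1.0, 3.0, rng) for _ in range(20_000)]
print(sum(counts) / len(counts), 100 * 1e-3 * 3.0 * math.exp(3.0))  # Monte Carlo vs exact mean
```

When `beta_mut == beta`, the mean mutant count is `nu * n0 * t * exp(beta * t)`, which gives a quick sanity check on the simulation. The sampled counts are extremely right-skewed, so many replicates are needed before the sample mean settles.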
Definitions and basic properties of positive infdiv distributions are reviewed in Section 2. Useful subclasses of infdiv distributions are characterised by analytical properties of the density (defined for x > 0) of the Lévy measure (cf. (1)). The subclass of self-decomposable (SD) distributions is defined by requiring that k(x), the product of x and this density, is non-increasing. This class is significant because its members have a unimodal density function. A corresponding discrete version is defined, and its members are unimodal too. In addition, precise criteria exist which distinguish the case of a non-increasing mass function (i.e., a mode at the origin) from the case where the smallest mode is positive. These notions are applied to models where mutant clones grow as deterministic integer-valued functions.
The Lea-Coulson model described above can be generalised to allow normal cell growth to be an arbitrary positive-valued function of time. The resulting mutant number distribution is a mixture of Poisson distributions where the mixing distribution is a continuous infdiv distribution with the special property that its Lévy density is completely monotone. Distributions having this property comprise the so-called Bondesson (BO) class. This notion is introduced in
Section 3, along with its discrete version. Details are provided for the balanced (
) generalised Lea-Coulson model in which normal cell lines grow according to the logistic (Pearl-Reed) population model.
The generalised gamma convolution (GGC) class of infdiv distributions comprises the subset of BO distributions for which the product k(x) of x and the Lévy density is completely monotone. This implies the inclusion GGC⊂SD, and hence GGC’s are unimodal. Poisson mixtures in which the mixing distribution is a GGC comprise the class of generalised negative-binomial convolutions (GNBC’s), and they too are unimodal. Relevant definitions and properties are introduced in
Section 4 where it is shown that a mutant number distribution arising from a generalised Lea-Coulson model in which normal cell growth is non-decreasing is a GNBC. This of course applies to the standard Lea-Coulson model as described above, and details are presented in
Section 4 together with precise criteria concerning the modal behaviour of the mutant number distribution; see Theorem 5(a). The section ends with a discussion of the shapes of mutant number distributions selected by different estimation methodologies applied to experimental data, and of the preservation of the GNBC property when plating efficiency is an issue.
It is often observed that mutations occur during the time of division of a normal cell. This contingency is addressed by branching process descriptions of the Luria-Delbrück set-up. Some details are provided in Section 5 for the two most common models, those due to Haldane and Bartlett. Mutant number distributions for the Haldane model are not infdiv, whereas they are infdiv for the Bartlett model. However, rather less can be ascertained about fine infdiv properties for this model.
Finally, in Section 6 we determine infdiv and modal properties of mutant number distributions arising from alternative models discussed by Kepler and Oprea [5], Angerer [6] and Stewart et al. [7].
Some notation may have different definitions in different sections, but no confusion should arise.
2. Infdiv Distributions and Deterministic Mutant Growth
In this section, we review the basic ideas of infinite divisibility and self-decomposability needed later, and explore their (limited) applicability to the Lea-Coulson and Armitage models in which mutant numbers are assumed to increase deterministically.
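Let X be a non-negative random variable with distribution function (DF) $F(x) = P(X \le x)$ and Laplace-Stieltjes transform (LST)
$$\varphi(s) = \mathbb{E}\,e^{-sX} = \int_{[0,\infty)} e^{-sx}\,dF(x), \qquad s \ge 0.$$
If F has a probability density function (pdf) f, then $\varphi(s) = \int_0^\infty e^{-sx} f(x)\,dx$. Denote the left-extremity of F by $\ell_F = \inf\{x : F(x) > 0\}$ and observe that $\ell_F \ge 0$. Thus, $F(x) = 0$ if $x < \ell_F$ and $F(x) > 0$ if $x > \ell_F$. This quantity can be computed from the LST according to $\ell_F = -\lim_{s\to\infty} s^{-1}\log\varphi(s)$.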
Each of the quantities X, F and φ is called infdiv if, for each $\tau > 0$, the function $\varphi^{\tau}$ is the LST of a probability distribution. This implies that, for any positive integer n, X can be expressed as a sum of random variables, $X \stackrel{d}{=} X_{n,1} + \cdots + X_{n,n}$, where the summands are independent and have the distribution determined by $\varphi^{1/n}$. This encapsulates the idea of infinite divisibility. The sum of independent infdiv random variables is itself infdiv.
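An infdiv LST has a special canonical form
$$\varphi(s) = e^{-\psi(s)}, \qquad \psi(s) = as + \int_{(0,\infty)} \left(1 - e^{-sx}\right)\lambda(dx), \qquad (1)$$
where $a \ge 0$ is a shift constant, ψ is called the Laplace exponent and λ is a measure, called the Lévy measure, which satisfies the conditions
$$\lambda(\{0\}) = 0, \qquad \int_{(0,\epsilon]} x\,\lambda(dx) < \infty, \qquad \lambda((\epsilon,\infty)) < \infty.$$
This means that λ assigns zero mass to the origin; it may assign infinite mass to any small interval $(0,\epsilon]$, but it integrates x at the origin; and it assigns finite mass to infinite intervals $(\epsilon,\infty)$. Here ε is an arbitrary positive number. Functions having the form (1) are called Bernstein functions; see [8], the standard reference. Differentiation of ψ shows that
$$\psi'(s) = a + \int_{(0,\infty)} x\,e^{-sx}\,\lambda(dx).$$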
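Many common distributions are infdiv: gamma, Pareto and log-normal, to mention a few. For us the most important is the gamma family. We say that the random variable $\Gamma_\alpha$ has the standard gamma distribution with shape parameter $\alpha > 0$ if its pdf is $f_\alpha(x) = x^{\alpha-1}e^{-x}/\Gamma(\alpha)$ if $x > 0$ and $f_\alpha(x) = 0$ if $x \le 0$. Here $\Gamma(\cdot)$ denotes the gamma function (due to Euler); see [9]. The gamma pdf is decreasing in $x > 0$ if $\alpha \le 1$, and it has a single positive mode at $x = \alpha - 1$ if $\alpha > 1$. The corresponding LST is $(1+s)^{-\alpha}$; equivalently, the Laplace exponent is $\alpha\log(1+s)$. We stress that infdiv laws can be multimodal.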
Remark 1. In many instances but we will need the additional generality for subsequent key definitions.
Suppose that $a = 0$ and $\|\lambda\| := \lambda((0,\infty)) < \infty$. Then $G(x) = \lambda((0,x])/\|\lambda\|$ is a distribution function and the Laplace exponent (1) can be written as
$$\psi(s) = \|\lambda\| \int_{(0,\infty)} \left(1 - e^{-sx}\right) G(dx), \qquad (2)$$
with the interpretation that $X = \sum_{i=1}^{N} Y_i$, where the $Y_i$ are independent with DF G, and N is independent of the summands and has the Poisson distribution with (rate) parameter $\|\lambda\|$ (denoted by Poisson($\|\lambda\|$)). Thus, X is represented as a (Poisson) random sum of independent jumps $Y_i$, and it is said to have a compound Poisson distribution. Conversely, any positive infdiv distribution can be realised as the limit of a sequence of compound Poisson distributions.
An important subclass of infdiv distributions is the class of self-decomposable (SD) distributions. This notion can be given three equivalent definitions, but we concern ourselves with the two which fit our theme. The definition which explains the terminology is that X has a SD distribution if it has the autoregressive representation that, for any constant $c \in (0,1)$, there is a random variable $X_c$ independent of X such that
$$X \stackrel{d}{=} cX + X_c. \qquad (4)$$
This says that if X is scaled down to cX, then the distribution of X can be recovered by adding an independent ‘error’ $X_c$. Thus, the right-hand side represents the ‘self-decomposition’ of X. This definition can be expressed in terms of the LST of X as the assertion that X has a SD distribution if, for each $c \in (0,1)$, the quotient $\varphi(s)/\varphi(cs)$ is completely monotone, and hence is the LST of a random variable, $X_c$ say.
It can be proved that a non-degenerate SD distribution is absolutely continuous and infdiv. (In addition, the ‘error’ term $X_c$ is infdiv.) The Lévy measure takes a special form which characterises SD distributions and which is sometimes adopted as the definition of this concept. We shall do likewise with the following formal definition refining (1).
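Definition 1. An infdiv distribution is SD if its Lévy measure λ has a density,
$$\lambda(dx) = x^{-1}k(x)\,dx, \qquad x > 0,$$
where $k(x)$ is non-increasing in $x > 0$. The regularity properties of λ then require that
$$\int_0^1 k(x)\,dx < \infty, \qquad \int_1^\infty x^{-1}k(x)\,dx < \infty.$$
It follows that $k(x) \to 0$ as $x \to \infty$.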
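Example 1. The integral representation
$$\log(1+s) = \int_0^\infty \left(1 - e^{-sx}\right) x^{-1}e^{-x}\,dx \qquad (5)$$
(just differentiate each side) implies that the gamma distribution is SD with $k(x) = \alpha e^{-x}$. The following fact is important.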
Fact 1. (a) Sums of independent SD random variables are SD.
(b) If F is the DF of a SD distribution, then it has a pdf f which solves the integral equation This pdf is unimodal, and if , then it is non-increasing with a mode at zero. If , then f is bounded. In addition, with , there is a mode in the interval .
See [10] (pp. 408, 409) for the modality assertions, and more.
Remark 2. The integral Equation (6) has a wider applicability than is indicated by Fact 1. Specifically, if f is a pdf for which there exists a function k such that (6) holds, then f is infdiv iff $k \ge 0$. See [11] (p. 95) for an even more general account.

Since members of the class of SD distributions have an absolutely continuous DF, we may wonder about discrete analogues of this concept. Suppose that
X is infdiv and it can take only non-negative integer values, i.e., it is discrete infdiv. Then it necessarily has a compound Poisson distribution with positive integer jumps. Denoting the PGF of the jump distribution by H, the general form (2) becomes
$$M(z) = \exp\{-\Lambda\,(1 - H(z))\}, \qquad (7)$$
where the notation on the left-hand side anticipates the application of these concepts to mutant number distributions. Here we understand that $H(0) = 0$, i.e., there are no zero-sized jumps.

Writing the jump PGF as $H(z) = \sum_{j\ge1} h_j z^j$, then setting $z = e^{-s}$ in (7) and comparing the result with (1) (with $a = 0$) makes it clear that the Lévy measure inherent in (7) assigns mass $\Lambda h_j$ to integers $j \ge 1$. Hence the total mass of the Lévy measure is Λ.

It is often convenient to express the PGF M in the form
$$M(z) = \exp\Big\{-\int_z^1 R(v)\,dv\Big\}, \qquad (8)$$
noting then that logarithmic differentiation of (7)/(8) yields
$$R(z) = \frac{M'(z)}{M(z)} = \sum_{j\ge0} r_j z^j, \qquad (9)$$
with $r_j = (j+1)\Lambda h_{j+1}$, $j \ge 0$. Writing $p_n = P(X = n)$, we thus obtain the discrete analogue of (6),
$$(n+1)\,p_{n+1} = \sum_{j=0}^{n} r_j\,p_{n-j}, \qquad n \ge 0. \qquad (10)$$
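Remark 3. The sequence $(r_j)$ is called the canonical sequence, or r-sequence, of the infdiv distribution $(p_n)$. In fact, for any discrete distribution with $p_0 > 0$ there is a sequence $(r_j)$ such that (10) holds. An essential fact here is a theorem of Katti [12] asserting that $(p_n)$ is infdiv iff its r-sequence is non-negative. See [11] (p. 36). This result has subsequently been ‘rediscovered’, e.g., [13] and [14] (p. 174).

Many specific discrete distributions discussed in this paper arise as Poisson mixtures where the mixing distribution is infdiv, i.e.,
$$M(z) = \mathbb{E}\,e^{-(1-z)X} = \varphi(1-z),$$
where X is infdiv with LST φ and Laplace exponent (1), $\psi(s) = as + \int_{(0,\infty)}(1 - e^{-sx})\,\lambda(dx)$. Hence
$$M(z) = \exp\Big\{-a(1-z) - \int_{(0,\infty)} \big(1 - e^{-(1-z)x}\big)\,\lambda(dx)\Big\}.$$
Thus the shift term a in (1) induces a Poisson(a) component in the discrete mixture. Expanding $1 - e^{-(1-z)x} = e^{-x}\sum_{j\ge1}(1 - z^j)\,x^j/j!$ inside the integral shows that M has the compound Poisson form (7), with Lévy mass
$$a\,\mathbb{1}\{j = 1\} + \int_{(0,\infty)} \frac{x^j}{j!}\,e^{-x}\,\lambda(dx)$$
at each integer $j \ge 1$.

A result of Holgate [15] asserts that if the mixing distribution is unimodal (infdiv or not), then the Poisson mixture is unimodal.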
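The recursion (10) also gives a practical way to move between a mass function and its r-sequence. The sketch below assumes (10) reads $(n+1)p_{n+1} = \sum_{j=0}^{n} r_j p_{n-j}$ with $p_0 > 0$; the function names are hypothetical. `r_from_pmf` inverts the recursion, so Katti's criterion (infdiv iff every $r_j \ge 0$) can be checked on a truncated mass function, while `pmf_from_r` rebuilds the masses from $p_0$ and the r-sequence.

```python
import math

def pmf_from_r(p0, r, n_max):
    """Masses p_0..p_{n_max} from p_0 and an r-sequence via (n+1) p_{n+1} = sum_j r_j p_{n-j}."""
    p = [p0]
    for n in range(n_max):
        p.append(sum(r[j] * p[n - j] for j in range(n + 1)) / (n + 1))
    return p

def r_from_pmf(p):
    """Invert the same recursion: r_n = ((n+1) p_{n+1} - sum_{j<n} r_j p_{n-j}) / p_0."""
    r = []
    for n in range(len(p) - 1):
        acc = (n + 1) * p[n + 1] - sum(r[j] * p[n - j] for j in range(n))
        r.append(acc / p[0])
    return r

def looks_infdiv(p, tol=1e-12):
    """Katti's criterion on a truncated mass function: every computed r_j >= 0 (up to rounding)."""
    return all(rj >= -tol for rj in r_from_pmf(p))

# Poisson(1.5): the r-sequence is (1.5, 0, 0, ...), so the criterion is met.
mu = 1.5
poisson = [math.exp(-mu) * mu**n / math.factorial(n) for n in range(20)]
print(looks_infdiv(poisson))                        # True
# Binomial(2, 1/2): r_1 = -2 < 0, so it is not infdiv.
print(looks_infdiv([0.25, 0.5, 0.25, 0.0, 0.0]))    # False
```

Only finitely many $r_j$ can be checked this way, so a negative value proves that a distribution is not infdiv, whereas non-negative values on a truncation are merely supporting evidence.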
The next definition is suggested by Definition 1.
Definition 2. The discrete compound distribution is called discrete self-decomposable (DSD) if its r-sequence is non-increasing.
Thus, the Poisson(μ) distribution is DSD because $r_0 = \mu$ and $r_j = 0$ if $j \ge 1$, and the general mixture (12) is DSD if X is SD.
The autoregressive characterisation (4) of (continuous) SD distributions has a discrete analogue. The characterisation (4) involves multiplying a random variable by the constant c to give a product smaller than X. If X is discrete, then this cannot be done in a way which gives an integer-valued product. Binomial thinning is an analogue which addresses this issue. Define a ‘discrete product’ as follows. Let $c \in (0,1)$ and
$$c \odot X = \sum_{i=1}^{X} B_i,$$
where the summands are independent with the Bernoulli(c) distribution and they are independent of X. Thus, $c \odot X \le X$, and the PGF of the product is
$$\mathbb{E}\,z^{c \odot X} = M(1 - c + cz).$$
This product concept is due to the authors of [11]; see p. 495 for the original reference.
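Binomial thinning is straightforward to compute on a truncated mass function. In the sketch below (function names hypothetical), `thin_pmf` forms the mass function of the thinned product by conditioning on $X = n$ and keeping a Binomial(n, c) number of units; this is the mass-function counterpart of evaluating the PGF at $1 - c + cz$. Thinning a Poisson(μ) law gives Poisson(cμ), which serves as a check.

```python
from math import comb, exp, factorial

def thin_pmf(p, c):
    """Mass function of the binomially thinned product, given masses p[0..N] of X."""
    q = [0.0] * len(p)
    for n, pn in enumerate(p):
        for k in range(n + 1):
            # each of the n units survives independently with probability c
            q[k] += pn * comb(n, k) * c**k * (1 - c) ** (n - k)
    return q

def poisson_pmf(mu, n_max):
    return [exp(-mu) * mu**n / factorial(n) for n in range(n_max + 1)]

thinned = thin_pmf(poisson_pmf(4.0, 60), 0.25)
target = poisson_pmf(1.0, 60)
# largest discrepancy over the first 15 masses: only truncation and rounding error remain
print(max(abs(a - b) for a, b in zip(thinned[:15], target[:15])))
```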
Definition 3. The discrete random variable X has a DSD distribution if, for each $c \in (0,1)$, there is a discrete random variable $X_c$ such that
$$X \stackrel{d}{=} c \odot X + X_c,$$
where the summands on the right-hand side are independent. Equivalently, the quantity $M(z)/M(1 - c + cz)$ is a PGF.

Fact 2. A DSD distribution is unimodal. Its mass function is non-increasing iff $r_0 = p_1/p_0 \le 1$.
Remark 4. Fact 2 imparts useful qualitative information about the general shape of the mass function of a DSD distribution. If $r_0 \le 1$, then $p_{n+1} \le p_n$ for all $n \ge 0$; the mass function is non-increasing. If $r_0 > 1$, then the modal value is positive and it may not be unique. See Discussion 1.
We now consider two models in which normal cells and mutation occur as in §1 and in which mutant clones grow deterministically with sizes having integer values. The first such model was introduced by Lea and Coulson [
3] who derived some approximate results for it. Armitage [
16] gave it a more careful consideration. More detail is provided by Crump and Hoel [
17], who identify it as their
model. The survey [
1] names it the discretised Luria-Delbrück formulation and the treatment there probably is the most detailed.
Zheng’s term captures the central conception that at time
t after its formation, the size of a mutant clone is
where
denotes the ‘integer part of’. He shows that the PGF of
is given by
where
Theorem 1. The mutant number distribution is DSD, hence unimodal, if
- (a)
(equivalently, ), in which case its mass function is non-increasing iff ; i.e.,or if - (b)
and , in which case its mass function is non-increasing iff , i.e.,
Proof. It follows from the definition of K that , where is the fractional part of . Hence .
Substituting into (
13) and with reference to (
8), a differentiation yields the evaluations
If
, then the sum term in (
13) vanishes and
has a Poisson distribution with parameter
m and Assertion (a) is known.
Suppose that
. The general form of the
r-sequence is
, where
which clearly is decreasing in
if
.
If
, then this representation of
is not informative because now the first factor is increasing. Instead, computation of
and letting
will show that the sign of
coincides with that of
Clearly and . Hence in a small interval . Both of are concave-increasing and hence they can achieve equality in for at most one value of u.
Numerical calculation shows that
if
specified in the assertion, and that
if
. It follows that
is decreasing in
iff
. Consequently,
if
. In addition
Hence the r-sequence is non-increasing if , and Assertion (b) follows from Fact 2. □
The case covers the biologically more likely situation in which mutant clones grow no more quickly than normal clones. Theorem 1 fails if γ is sufficiently close to zero. Numerical calculation shows that there is a critical value such that (resp. >) if (resp. >). In other words, the modal value of the r-sequence jumps from zero to unity at . There is a similar jump from 1 to 2 at a critical value . These outcomes suggest the existence of a sequence of critical values as at which the modal value of the r-sequence jumps from i to . In addition, it suggests that Assertion (b) is valid if .
The second model we consider derives its deterministic growth character from assuming that mutant cells have a fixed lifetime of duration L at the end of which they divide. Thus, a clone has size during the interval since its inception. In order that mutant clones achieve splitting rate , we choose L such that , i.e., .
This model with
was introduced in [
17] where it is designated as the
model. The expression (11) in this reference for the mutant number PGF is valid for
and, with our notation, it is
where
m and
are the above time-dependent parameters and now
. We have the following result.
Theorem 2. The mutant number distribution specified by (15) is DSD, and hence unimodal if - (a)
, in which case the mass function is non-increasing iff , i.e.,or if - (b)
and , in which case its mass function is non-increasing iff , i.e., - (c)
The mutant number distribution is not DSD otherwise.
Proof. If , then and no mutant has reproduced. Thus, equals the number of mutations during and hence it has a Poisson distribution. Assertion (a) follows.
If
, then
and it follows from (
15) that
i.e.,
,
, and
if
.
Hence , and Assertion (b) follows.
If , then , and hence is not DSD. □
3. Bondesson Classes and the Generalised Lea-Coulson Model
In this section, we introduce the first of two special classes of infdiv distributions. Historically, the Swedish actuary and mathematician Olof Thorin introduced in 1977/78 the distributions now called generalised gamma convolutions (GGC’s), with the specific purpose of proving that Pareto and lognormal distributions are infdiv. Subsequently, many other distributions conjectured to be infdiv have been proved to be so by showing that they are GGC’s. A net benefit of this is that GGC’s are SD and hence unimodal. It then follows from Holgate’s theorem that Poisson mixtures of GGC’s are unimodal too. Lennart Bondesson introduced in 1981 the larger class of infdiv distributions which we review in this section. Detailed accounts of these topics are [18], [11] (Chapter VI) and [8] (Chapters 6–9).
We begin as follows. Let
G be a DF on
and define a mixture of exponential distributions by
Clearly
f is a pdf and the corresponding LST is
Definition 4. A function F is the DF of a mixture of exponential distributions (written ) ifwhere and G is a DF on . Fact 3. - (a)
If X has the DF , then , where has an exponential distribution and is independent of .
- (b)
If , then it is infdiv.
- (c)
The DF iffwhere is a (measurable) function on satisfying
It follows from Example 1 that the Lévy density of the gamma distribution is $\alpha x^{-1}e^{-x}$ and, in particular, that it is completely monotone. This motivates the following definition of the class BO of distributions named after Lennart Bondesson.
Definition 5. An infdiv DF F belongs to the Bondesson class (written ) if its Lévy measure has a completely monotone density,where B is a measure (the Bondesson measure) satisfying. Fact 4. - (a)
If , then its Laplace exponent has the formwhere B is a Bondesson measure. - (b)
The class is the smallest set of distributions containing and which is closed under convolution and weak limits.
There is a clear similarity of the cumulant functions (
16) and (
19) with
. This is not mere coincidence. If
, then
iff
and
, where
b satisfies (17).
Definition 6. The discrete random variable X has a geometric mixture distribution if its PGF has the formwhere Π is a random variable satisfying . If
is independent of the random variable
which has a unit exponential distribution, then it follows from the mixture representation of the geometric distribution that
The product is infdiv, hence any geometric mixture is compound-Poisson.
We now introduce a discrete version of BO: the class BOP of Poisson mixtures with mixing distribution in BO. We will see that mutant number distributions arising from a generalisation of the Lea-Coulson model (below) and from the Bartlett model (Section 5) belong to BOP.
Definition 7. The discrete distribution belongs to BOP if its PGF , where is the Laplace exponent of .
The following fact arises fairly readily from (
19) and Definition 7.
Fact 6. - (a)
The discrete infdiv distribution iff is a Hausdorff moment sequence; specifically, - (b)
A distribution in is a mixture of geometric distributions if and .
Remark 5. The substitution will make clear that really is a Hausdorff moment sequence. For example, if B has a density , thenwhere, in general, denotes the measure which assigns unit mass to the real number ρ and zero mass to any interval not containing ρ. The representation asserted in Fact 6 often is more convenient for our purposes. Remark 6. In the most general situation, the fact that jump probabilities of a compound Poisson distribution comprise a non-increasing sequence implies little about the modal properties of . For example, if X has the Poisson and the Poisson distributions, respectively, and X and Y are independent, then has at least two modes, one at and the other at , if and . For example, if and .
By definition a generalised Lea-Coulson model admits any (measurable) deterministic growth function
of normal type cells. Replacing the exponential form
with
in the specification of §1 yields a compound Poisson distribution for mutant numbers
whose Lévy masses are
and
These outcomes are well-known and they follow from the order statistics property of Poisson processes. See [
19] for what seems the earliest and most general formulation. A later independent account specifically for the Luria-Delbrück context is in [
17], and the model is reviewed in [
1]. This generalised Lea-Coulson model can also be regarded as a branching process with inhomogeneous immigration. The branching component comprises the independently growing mutant clone birth processes and immigrants comprise the inhomogeneous Poisson process of mutations. See [
20] for a review of this topic.
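The order statistics argument gives the Lévy mass at each integer j as an integral over the mutation epoch $s \in [0,t]$ of the mutation intensity times the probability that a Yule clone of age $t - s$ has exactly j cells. The sketch below evaluates these masses by Simpson's rule for an arbitrary growth function; `growth`, `nu` and `beta_mut` are hypothetical stand-ins for the normal-cell growth function, the mutation rate and the mutant split rate. For exponential growth with equal rates the masses reduce to $m\,(q^j/j - q^{j+1}/(j+1))$ with $m = \nu n_0 e^{\beta t}/\beta$ and $q = 1 - e^{-\beta t}$, which serves as a check. Since each mass has the form $\int u^{j-1}\mu(du)$ with $u = 1 - e^{-\beta_{\mathrm{mut}}(t-s)} \in [0,1)$, the sequence is a Hausdorff moment sequence, hence non-increasing and log-convex, in line with the BOP property established next.

```python
import math

def levy_masses(growth, nu, beta_mut, t, j_max, n=2000):
    """Lévy masses lambda_1..lambda_{j_max} of the mutant count at time t (sketch).

    lambda_j = nu * int_0^t growth(s) * p(s) * (1 - p(s))**(j - 1) ds, p(s) = exp(-beta_mut*(t-s)):
    a Yule clone of age t - s has exactly j cells with probability p (1 - p)^(j - 1).
    """
    n += n % 2  # Simpson's rule needs an even number of panels
    h = t / n
    nodes = []
    for i in range(n + 1):
        s = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        nodes.append((weight, growth(s), math.exp(-beta_mut * (t - s))))
    return [
        nu * h / 3.0 * sum(w * g * p * (1.0 - p) ** (j - 1) for w, g, p in nodes)
        for j in range(1, j_max + 1)
    ]

n0, nu, beta, t = 100.0, 1e-3, 1.0, 3.0  # illustrative values
expo = levy_masses(lambda s: n0 * math.exp(beta * s), nu, beta, t, 10)
m, q = nu * n0 * math.exp(beta * t) / beta, 1.0 - math.exp(-beta * t)
exact = [m * (q**j / j - q**(j + 1) / (j + 1)) for j in range(1, 11)]

K = 1000.0  # hypothetical carrying capacity for logistic growth
logistic = lambda s: K * n0 * math.exp(beta * s) / (K + n0 * (math.exp(beta * s) - 1.0))
lam = levy_masses(logistic, nu, beta, t, 10)
print(lam[:3])
```

Because $p_1/p_0 = r_0 = \lambda_1$ for any such compound Poisson law, the first computed mass also indicates whether the mutant number mass function can have its mode at zero.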
We have the following general result.
Theorem 3. Let . The mutant number distribution of the generalised Lea-Coulson model is a distribution whose Bondesson measure has the densitywhere and is the Heaviside unit step function. Proof. Just make the substitution
in (
21) and refer to Fact 6 to obtain the desired moment representation,
. The resulting infinite integral does converge because it equals the integral (
21). Alternatively, observe that
and
, implying that the regularity conditions in Definition 5 always are satisfied. □
For computational purposes it is more convenient to shift the integration variable in Fact 6 to obtain
and the corresponding Lévy density
where
Remark 7. Substituting, again, in (22) and (24) gives the ‘explicit’ moment representation Hence the representing measure for any mutation number distribution derived from a generalised Lea-Coulson model has the time-dependent support .
This moment relation yields the fundamental relations
and hence
and
Example 2. Suppose that normal cells increase in number as a logistic growth model with carrying capacity . Thus, whose well known solution is Hence, for the balanced case, , some manipulation yieldswhere We assume that , implying that , and .
Substitution into (27) leads to the explicit formwhere is the exponential integral; see [9] (# 6.2.1). Define . The integrand of (26) resolves into partial fractions: We obtain expressions for the Poisson rates as follows.
Writingleads towhere, as usual, . It follows that The power series expansion of the logarithm term yields the formwhere Letting , recalling that and noting that recovers the balanced Lea-Coulson model which we will consider in more detail in the next section.
4. Thorin Classes and the Lea-Coulson Model
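We now introduce the above-mentioned GGC class of infdiv distributions, which is pertinent to a significant subclass of generalised Lea-Coulson models. We motivate the general definition by observing that, given independent gamma random variables $\Gamma_{\alpha_i}$ ($i = 1, \ldots, n$) and constants $c_i > 0$, it follows from (5) that the Laplace exponent of the sum $X = \sum_{i=1}^{n} c_i\Gamma_{\alpha_i}$ can be expressed as
$$\psi(s) = \sum_{i=1}^{n} \alpha_i \log(1 + c_i s) = \int_{(0,\infty)} \log\Big(1 + \frac{s}{t}\Big)\,U_n(dt), \qquad (32)$$
where $U_n$ is a measure which assigns mass $\alpha_i$ to the point $1/c_i$. It follows from Fact 1(a) that X is SD.

The SD class is closed under limits in distribution, so taking the informal limit $n \to \infty$ in (32) yields a putative limiting Laplace exponent
$$\psi(s) = as + \int_{(0,\infty)} \log\Big(1 + \frac{s}{t}\Big)\,U(dt). \qquad (33)$$
This does specify a SD distribution for any $a \ge 0$ and any measure U on $(0,\infty)$ satisfying
$$\int_{(0,1]} |\log t|\,U(dt) < \infty, \qquad \int_{(1,\infty)} t^{-1}\,U(dt) < \infty. \qquad (34)$$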
Definition 8. A distribution whose Laplace exponent has the form (33) where and U is a measure on subject to (34) is called a generalised gamma convolution (GGC). A function of the form (33) is called a Thorin Bernstein function. An equivalent specification is that the class of GGC’s is the smallest which contains scaled gamma distributions and is closed under convolution and weak limits. The representing measure
U in (
33) is often called the Thorin measure and we define the Thorin distribution function
.
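For a Thorin measure with finitely many atoms, the two descriptions of a GGC can be compared numerically. Assuming (33) has the standard form $\psi(s) = as + \int_{(0,\infty)}\log(1 + s/t)\,U(dt)$ and taking $a = 0$, a Thorin measure with mass $\alpha_i$ at $t_i$ gives $\psi(s) = \sum_i \alpha_i\log(1 + s/t_i)$, while the Lévy measure has density $x^{-1}k(x)$ with $k(x) = \sum_i \alpha_i e^{-t_i x}$, a completely monotone function in line with Fact 7(a). The sketch checks that both routes give the same Laplace exponent; the atoms are hypothetical, and a single atom at $t = 1$ recovers the gamma case of Example 1.

```python
import math

def thorin_exponent(s, atoms):
    """psi(s) = sum_i alpha_i * log(1 + s / t_i) for a Thorin measure with atoms (t_i, alpha_i)."""
    return sum(alpha * math.log1p(s / t) for t, alpha in atoms)

def levy_route_exponent(s, atoms, n=100_000):
    """psi(s) = int_0^inf (1 - e^{-sx}) k(x) / x dx with k(x) = sum_i alpha_i e^{-t_i x}."""
    x_max = 60.0 / min(t for t, _ in atoms)  # k(x) is negligible beyond this point
    h = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoint rule: never evaluates the removable singularity at x = 0
        k = sum(alpha * math.exp(-t * x) for t, alpha in atoms)
        total += -math.expm1(-s * x) * k / x
    return total * h

atoms = [(1.0, 2.0), (0.5, 0.7), (3.0, 1.5)]  # hypothetical (t_i, alpha_i) pairs
for s in (0.3, 1.0, 5.0):
    print(s, thorin_exponent(s, atoms), levy_route_exponent(s, atoms))
```

The agreement rests on Frullani's integral, $\int_0^\infty (1 - e^{-sx})e^{-tx}x^{-1}\,dx = \log(1 + s/t)$, applied atom by atom.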
Fact 7. - (a)
A GGC is a SD distribution for which the function k is completely monotone, .
- (b)
Any GGC has a unimodal pdf f.
- (c)
A GGC belongs to BO and its Bondesson measure is absolutely continuous with density .
We motivate a discrete version of GGC by observing that the best-known case of a Poisson mixture (12) is where $X = c\,\Gamma_\alpha$, c a positive scaling constant, giving
$$M(z) = \big(1 + c(1 - z)\big)^{-\alpha} = \Big(\frac{1 - p}{1 - pz}\Big)^{\alpha},$$
where $p = c/(1+c)$. Hence this gamma-mixed Poisson distribution is the negative binomial distribution with parameters p and α, denoted NB(p, α). The case $\alpha = 1$ of course is a geometric distribution whose mixing distribution is an exponential one. The following definition extends this idea.
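The gamma-mixed Poisson identity can be checked by integrating the Poisson mass function against the gamma density. The sketch assumes the mixing variable is c times a standard gamma variable with shape α, and compares the result with the negative binomial masses $\frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\,k!}(1-q)^{\alpha}q^{k}$ with $q = c/(1+c)$; the function names and parameter values are illustrative.

```python
import math

def gamma_mixed_poisson(k, alpha, c, x_max=100.0, n=40_000):
    """P(N = k) for N | X ~ Poisson(X), X = c * Gamma(alpha), via the midpoint rule."""
    log_const = -math.lgamma(k + 1) - math.lgamma(alpha) - alpha * math.log(c)
    rate = 1.0 + 1.0 / c  # e^{-x} from the Poisson mass times e^{-x/c} from the gamma density
    h = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.exp((k + alpha - 1) * math.log(x) - rate * x + log_const)
    return total * h

def negative_binomial(k, alpha, q):
    """Gamma(alpha + k) / (Gamma(alpha) k!) * (1 - q)^alpha * q^k."""
    return math.exp(math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
                    + alpha * math.log1p(-q) + k * math.log(q))

alpha, c = 2.5, 1.5  # illustrative shape and scale
q = c / (1 + c)
for k in range(6):
    print(k, round(gamma_mixed_poisson(k, alpha, c), 8), round(negative_binomial(k, alpha, q), 8))
```

Setting `alpha = 1.0` gives the geometric case, whose mixing distribution is exponential.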
Definition 9. A Poisson mixture distribution is a generalised negative-binomial convolution (GNBC) if the distribution of the mixing random variable X is a GGC as defined above.
A calculation using Fact 7 gives
Fact 8. - (a)
The PGF of a GNBC has the canonical form (35), where V is a right-continuous function on such that , - (b)
The r-sequence (cf. (9)) is a Hausdorff moment sequence. Conversely, if the r-sequence of a discrete infdiv distribution has this moment representation, then it is a GNBC.
- (c)
The GNBC class is the smallest class of discrete distributions which contains negative-binomial distributions and is closed under convolution and weak limits.
- (d)
A GNBC is discrete unimodal and its mass function is non-increasing iff .
Remark 8. Since the shift constant a in (33) induces a Poisson component in (35), the left-extremity of a GNBC is always zero. Assertion (d) follows from Fact 7(b) and Holgate’s theorem [15], and then Fact 2 observing that .

The following fact gives a canonical representation for a mixture of geometric distributions and a condition that it be a GNBC; see [11] (pp. 381, 390).
Fact 9. - (a)
A function M defined on is the PGF of a geometric mixture distribution iff it has the formwhere w is a (measurable) function on such that - (b)
A GNBC PGF (35) is the PGF of a geometric-mixture distribution iff and its representing function V satisfies , in which case .
Referring to (
35), we will later need a general relation between the function
V and the Thorin measure
U of the mixing GGC distribution. The following result achieves this in terms of the Thorin distribution function
.
Theorem 4. The function is the right-continuous version of .
Proof. The integral in (
33) can be written as the Stieltjes integral
It follows from the first member of (
34) that for any
we can choose
such that
Hence .
Next, it follows from the second member of (
34) that there exists
such that if
, then
implying that
.
Observing that the integrand in (
37) is asymptotically proportional to
as
, and to
as
, it follows from an integration by parts that
In a similar manner, it follows from (
35) with
that the PGF of the corresponding GNBC is
The left-hand side equals and a computation shows that reduces to a Stieltjes integral as above with T as asserted. □
Recall the expression (
22) for the Lévy masses
pertaining to the generalised Lea-Coulson model. A very natural condition on the growth function
of normal cells implies that mutant number distributions are GNBC’s.
Theorem 5. Assume that the normal cell growth function is non-decreasing. Fix . Then:
- (a)
The distribution of is a GNBC and hence unimodal. Its mass function is non-increasing iff - (b)
The Lévy density of the mixing GGC is given bywhere the Thorin distribution function is - (c)
The canonical form of the PGF of iswhere - (d)
The mutant number distribution is a geometric mixture iff .
Proof. - (a)
Observe that
is non-decreasing in
y and that, since
, it follows from (
22) that
a Hausdorff moment. The GNBC assertion follows from Fact 8(b). The unimodality assertions follow from Fact 8(d).
- (b)
With
defined as above, observe that the representation (
23) yields
whence (
38).
- (c)
Observe that
in (
35) and the form of
follows from Theorem 4 expressed as
and (
39). Assertion (d) follows from Fact 9(b) and noting that
.
□
Remark 9. It follows from (20) and the hypothesis of Theorem 5 that is an increasing function of t. Clearly if t is sufficiently small in which case the mutant number mass function will be non-increasing. It attains a positive maximum value if eventually exceeds unity. The logistic differential Equation (
28) implies that if
, then its solution is strictly increasing. It follows that the corresponding mutant number distribution is a GNBC. However, except for the balanced case
it does not seem that the integrals (
26) and (
27) can be evaluated in any insightful way. In the balanced case we now know that the Lévy density (
30) is such that
is completely monotone. The following direct demonstration of this fact yields its Thorin function
.
Integration by parts shows that
, where
is completely monotone. Substitution into (
30) leads to
The substitution
exhibits
as the sum of two completely monotone functions:
Comparing this with (
38) we see that
Thus the Thorin measure has a discrete component, a point mass at , and its support is independent of K.
In the remainder of this section we restrict consideration to the Lea-Coulson [
3] model described in §1 and give a self-contained treatment starting from (
26). Taking
we thus obtain
where
In the sequel we usually suppress the time dependence, thus regarding the distributions determined by (
40) as a parametric family determined by
where
and
.
Expressions equivalent to (
40) appear first in [
21]. Sometimes [
22] is coupled with this reference because, independently, a system of differential equations for the mass function of
is derived, generalising the system in [
3] for the case
, and deducing a numerical solution scheme. The integral in (
40) has no simple evaluation except perhaps for
.
In fact, if
, then evaluation gives the familiar outcome
This PGF appears for the first time in [
16] (p. 10) as a result of solving the linear first-order partial differential equation derived in [
3]. Zheng [
1] denotes the corresponding distribution by
where the
letter designation is chosen to honour the pioneering contribution of Salvador Luria and Max Delbrück.
Frequently in laboratory situations the product
is so large that
and it is argued that the form (
42) is approximated by
This
is a PGF as can be deduced from the explicit time-dependent
distributions by allowing
(implying
) and
such that
; a kind of Poisson approximation. Zheng [
1] (and others before him) name the distribution corresponding to (
43) after Lea and Coulson because they derive (
43) by using a clever manipulation to solve their partial differential equation. It is denoted by
and thus coincides with
. The solution (
42) satisfies
, reflecting the assumption (and laboratory situation) that
. The LC solution does not satisfy this initial condition, but it has an interesting form-invariant character which bears the interpretation that mutant numbers evolve as a non-homogeneous Poisson process.
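For numerical work the LC distribution is usually computed by a recursion rather than from its PGF. Assuming (43) is the classical Lea-Coulson form $(1-z)^{m(1/z - 1)}$, expanding the logarithm shows that it is compound Poisson with Lévy masses $m/(j(j+1))$, $j \ge 1$, and total mass m, so its r-sequence is $r_j = m/(j+2)$. The recursion (10) then becomes the well-known algorithm of Ma, Sandri and Sarkar, sketched below (function name hypothetical). Because $r_j$ is non-increasing, the distribution is DSD, and since $r_0 = m/2$, Fact 2 places the mode at zero exactly when $m \le 2$.

```python
import math

def lc_pmf(m, n_max):
    """Masses p_0..p_{n_max} of LC(m) via (n+1) p_{n+1} = sum_j r_j p_{n-j}, r_j = m/(j+2)."""
    p = [math.exp(-m)]  # total Lévy mass is m
    for n in range(1, n_max + 1):
        # equivalent Ma-Sandri-Sarkar form: p_n = (m/n) * sum_{i<n} p_i / (n - i + 1)
        p.append(m / n * sum(p[i] / (n - i + 1) for i in range(n)))
    return p

for m in (1.5, 4.0):
    p = lc_pmf(m, 60)
    mode = max(range(len(p)), key=p.__getitem__)
    print(m, mode, round(sum(p), 4))  # mass captured by the first 61 terms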
In view of this historical progression, we will designate the full family of distributions corresponding to (40) by .
It is well known that the distribution is qualitatively very different from the distributions when . The moments of the former are infinite, reflecting the very slow decay of its right-hand tail. If , then all moments are finite and the right-hand tail decays exponentially fast [4]. The following result shows that each distribution is a GNBC and that the differences just mentioned are reflected in the representing measures of the mixing GGC. Here, and below, recall that denotes the Heaviside unit-step function, i.e., the DF of the degenerate distribution allocating unit mass to the origin. Just below, and later, we will encounter the confluent hypergeometric function of the second kind,
where and b is real. Observe that this function is completely monotone [9] (Chapter 13).
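One way to see the complete monotonicity is through an integral representation. The sketch assumes that the function displayed above is Tricomi's function $U(a,b,x)$ with $a>0$; the display is not reproduced here. Under that assumption, the standard integral representation (e.g., DLMF §13.4) makes the property visible:

```latex
U(a,b,x) = \frac{1}{\Gamma(a)}\int_0^\infty e^{-xt}\,t^{a-1}(1+t)^{b-a-1}\,dt,
\qquad a>0,\ x>0,
\\[4pt]
(-1)^n\,\frac{\partial^n}{\partial x^n}\,U(a,b,x)
 = \frac{1}{\Gamma(a)}\int_0^\infty e^{-xt}\,t^{n+a-1}(1+t)^{b-a-1}\,dt \;\ge\; 0,
\qquad n = 0,1,2,\dots
```

Thus $U(a,b,\cdot)$ is the Laplace transform of a non-negative function. By Bernstein's theorem, this is equivalent to complete monotonicity.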
Theorem 6. If and , then the distribution has the following properties.
- (a)
It is a GNBC, hence unimodal. Its mass function is non-increasing iff
- (b)
The function V in the representation (35) is
In particular, the distribution is a geometric mixture iff .
- (c)
The GGC mixing distribution has the Thorin distribution function
- (d)
The Lévy measure of the GGC mixing distribution has a density which has the following explicit forms:
Remark 10. The above-mentioned difference between the cases and is manifested in the fact that the representing functions V and T are continuous with supports coinciding with their domains iff . Indeed, if , then decreases from θ at to at and jumps to zero at . Note that (48) results from letting in (30).
Proof. (a) Comparing (6) with (40), a differentiation gives
This exhibits the desired Hausdorff moment form with the measure
This implies the first assertion, and the second follows by evaluating and appealing to Fact 2. Observe that the measure V has a discrete component which vanishes when .
(b) Integrating (49) and simplifying the result leads to
where C is the constant of integration. The condition implies that , whence (45).
(c) The evaluation (46) comes directly from Theorem 1 and (45).
(d) Recall that the Lévy density exists and, with no parameter restriction,
The right-hand side integral is an ‘incomplete’ confluent hypergeometric type of integral. If , then the first term vanishes and (47) follows.
If , but , then the substitution produces the evaluation
and (48) follows after integrating by parts. □
Remark 11. Reverting to the time-dependent form of parameters, it follows from Theorem 4 that being a geometric mixture and the nature of modality are time-dependent properties, whereas, e.g., the SD property of the distributions is a time-independent property. See [10] for this dichotomy.
Discussion 1. The family of distributions is the one most commonly used to fit empirical mutant number distributions. It follows from the criterion (44) that as increases from small to large values, the mutant number mass function changes from non-increasing to having a positive mode. If equality holds in (44), then zero and unity are both modes. The estimate of is usually so close to unity that it is set equal to unity, in which case the criterion (44) simplifies to
In the case of equal fitness of normal and mutant cells (, the model), the transition from a zero to a positive mode can be seen in the first three columns of Table 2 in [3], where, if (denoted by m in this reference), then . The model is fitted to three sets of laboratory data in [22], where is estimated as 0.3783, 3.84 and 3.03, respectively. Figures 3–5 in [22] graph the mass functions corresponding to these values. Cases of differential fitness are illustrated in [1] (where is denoted by ). Figure 1 therein shows the mass function of the distribution with a modal value of roughly 40. These numerical values are computed from those in the caption of Figure 1: and . In addition, the parameter values yield , justifying the choice . Figure 2 in [1] illustrates what can occur if is held constant and varies. This figure shows two graphs, the upper one for and the lower one for . Comparing these with Figure 1 in [1] suggests that increasing above unity yields more sharply peaked mass functions. The distribution has a finite mean iff , and a finite variance iff . Hence these example distributions have a finite mean and infinite variance.
Finally, to see that mutant number distributions estimated from real data can exhibit a zero or a positive mode, we recall estimates determined in [23] from several experimental data sets for the distribution. A main objective in [23] is to introduce parameter estimation based on the empirical PGF and to compare its performance with maximum likelihood estimation (MLE). Table 1 in [23] presents 95% confidence intervals for (denoted there by ) and (denoted there by ). Taking the point estimates to be the mid-points of the confidence intervals, Table 1 here exhibits these estimates and indicates the shapes of the estimated mass functions. There are now several methods of estimating mutation model parameters. A question of interest is whether, when several methods are applied to a given set of data, they agree on the shape of the mutant number distribution they select. Published studies indicate that different methods can give quite different estimates, but that these are usually, though not always, consistent with regard to the selected distribution shape. We mention two comparative studies for the model.
Five estimation methods are compared using four data sets in [28] (where m is used for ). Table 2 in [28] shows broad consistency in shape selection for Experiments A-C, with the first two indicating a zero mode and the third a positive mode. The -method was not applied (or was not applicable) to the Experiment D data; of the four remaining estimated values, two resulted in a mode at zero and the other two in a positive mode. Table 5 in [26] compares seven estimation methods using seven sets of experimental data. Estimates of (m in [26]) are quite variable across estimation methods, but the selected shapes are broadly consistent. In fact, after the Luria-Delbrück mean method is adjusted by eliminating large jackpots, all methods were consistent in five of the seven data sets. In the two other cases all methods except the Drake median method gave estimated , and the Drake estimate was a little over two: 2.07 for Experiment 2 and 2.08 for Experiment 6. In these cases the modal value is unity; , and if , and , and if . Finally, a zero mode was found for five of the seven data sets. These investigations provide confidence that, although different methodologies can give rather different parameter estimates, they are broadly consistent with respect to shape selection.
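The zero-versus-positive mode question for the equal-fitness case can be checked numerically. The sketch below assumes the standard Lea-Coulson PGF $(1-s)^{m(1-s)/s}$, a compound Poisson law with jump rates $m/(k(k+1))$. For that law, the Ma-Sandri-Sarkar recursion gives $p_0=e^{-m}$ and $p_1 = m p_0/2$. Given unimodality, the mass function is therefore non-increasing exactly when $m\le 2$, with modes at both 0 and 1 when $m=2$. The helper names `lc_pmf` and `modes` are illustrative only.

```python
import math


def lc_pmf(m: float, n_max: int) -> list[float]:
    """Mass function p_0, ..., p_{n_max} of the Lea-Coulson law with PGF
    (1 - s)**(m * (1 - s) / s), computed with the Ma-Sandri-Sarkar recursion
        p_0 = exp(-m),   p_n = (m / n) * sum_{i=0}^{n-1} p_i / (n - i + 1).
    """
    p = [math.exp(-m)]
    for n in range(1, n_max + 1):
        acc = sum(p[i] / (n - i + 1) for i in range(n))
        p.append(m / n * acc)
    return p


def modes(p: list[float], rel_tol: float = 1e-12) -> list[int]:
    """Indices where the (truncated) mass function attains its maximum."""
    top = max(p)
    return [n for n, pn in enumerate(p) if math.isclose(pn, top, rel_tol=rel_tol)]


if __name__ == "__main__":
    for m in (1.5, 2.0, 2.07, 3.0):
        p = lc_pmf(m, 50)
        print(f"m = {m}: p0 = {p[0]:.4f}, p1 = {p[1]:.4f}, modes = {modes(p)}")
```

For m = 2.07 the sketch places the mode at unity, which is in line with the Drake estimates discussed above. Because the tail decays only like m/n^2, a truncated mass function sums to noticeably less than one.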
We end this section with some remarks about plating efficiency. This term refers to the possibility that, upon plating, a mutant cell fails to establish a colony. This aspect is frequently modelled by assuming that plated mutants independently establish colonies, each with probability . In other words, successful establishment is modelled by binomial thinning: if is the PGF of the number of plated mutants, then the PGF of the number of established colonies is . A very convenient result asserts that binomial thinning preserves the GNBC property. Specifically, if , then we obtain from (35) that
where
In particular, if these measures have densities and , respectively, then
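Binomial thinning can also be carried out directly on a mass function. The sketch writes `eps` for the establishment probability; this is our label, since the symbol is not reproduced above. The operation is equivalent to replacing the PGF $G(s)$ by $G(1-\varepsilon+\varepsilon s)$. It illustrates only the thinning operation itself, not the GNBC-preservation result. The helper names are hypothetical.

```python
import math


def binomial_thin(p: list[float], eps: float) -> list[float]:
    """Thin a mass function p on {0, ..., len(p) - 1}: each of n plated mutants
    independently founds a colony with probability eps. Equivalent to replacing
    the PGF G(s) by G(1 - eps + eps * s)."""
    q = [0.0] * len(p)
    for n, pn in enumerate(p):
        for k in range(n + 1):
            q[k] += pn * math.comb(n, k) * eps**k * (1.0 - eps) ** (n - k)
    return q


def poisson_pmf(lam: float, n_max: int) -> list[float]:
    return [math.exp(-lam) * lam**n / math.factorial(n) for n in range(n_max + 1)]


if __name__ == "__main__":
    # Sanity check: thinning Poisson(4) with eps = 0.25 should give Poisson(1).
    thinned = binomial_thin(poisson_pmf(4.0, 60), 0.25)
    target = poisson_pmf(1.0, 60)
    print(max(abs(a - b) for a, b in zip(thinned, target)))
```

Two classical special cases make useful checks. Poisson($\lambda$) thins to Poisson($\varepsilon\lambda$). A geometric law with PGF $(1-q)/(1-qs)$ thins to another geometric law, with $q' = q\varepsilon/(1-q+q\varepsilon)$.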
5. Branching Process Models
The normal population is depleted by one cell each time a mutation occurs. The Lea-Coulson model does not directly account for this. One argument is that in real situations , so this contingency can be neglected. Another response is to replace the parameter with , thus adjusting for a diminished average normal population growth rate.
Branching process models do take direct account of the diminution of the normal population due to mutation. A discrete-time model was propounded (no later than 1946) by J.B.S. Haldane. See Zheng [29] for an account and references. Haldane’s model counts population sizes generation by generation. Cell numbers increase by binary division, and hence the total size (normals plus mutants) of the nth generation cannot exceed . Consequently the distribution of , the number of mutants in the nth generation, has bounded support and, being non-degenerate, cannot be infdiv. There is a Poisson type of limit theorem [30] resulting in a limiting compound Poisson distribution (and hence infdiv) whose jump distribution has the PGF , a gap series, and hence this limiting distribution is not DSD.
Instead, we shall consider the continuous-time version of Haldane’s model. This model is a two-type linear birth process apparently formulated by M.S. Bartlett around 1951/2. It is mentioned for the first time in [16] (p. 37), with details appearing in the first edition of [31] (p. 132), published in 1955. See [32] for a detailed account and earlier references.
The balanced version of the model assumes that normal and mutant types divide into two cells during the interval with probability , independently of previous history. Mutants breed true, but a dividing normal cell has probability p of producing one mutant and one normal cell and probability of producing two normal cells.
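The model's mechanics can be seen directly in a simulation. The sketch below is a standard event-driven (Gillespie-style) simulation of the balanced two-type process, written as our own illustration rather than taken from [16], [31] or [32]. The names `n0`, `beta`, `p` and `t_end` are hypothetical labels for the initial number of normals, the division rate, the mutation probability and the plating time. The text above does not display the rate symbol.

Every division adds exactly one cell, so the mean total size is $n_0e^{\beta t}$. Normals never decrease and grow at rate $\beta(1-p)$ per normal cell, so the mean number of mutants is $n_0(e^{\beta t}-e^{\beta(1-p)t})$. The simulated averages can be compared with these values.

```python
import math
import random


def simulate_bartlett(n0: int, beta: float, p: float, t_end: float,
                      rng: random.Random) -> tuple[int, int]:
    """Return (normals, mutants) at time t_end for one realisation of the
    balanced two-type linear birth process started from n0 normal cells.

    Every cell divides at rate beta. A dividing normal cell gives one normal
    and one mutant with probability p, otherwise two normals; mutants breed true.
    """
    normals, mutants, t = n0, 0, 0.0
    while True:
        t += rng.expovariate(beta * (normals + mutants))  # time to next division
        if t > t_end:
            return normals, mutants
        if rng.random() * (normals + mutants) < normals:  # a normal cell divides
            if rng.random() < p:
                mutants += 1        # normal -> normal + mutant
            else:
                normals += 1        # normal -> normal + normal
        else:
            mutants += 1            # mutant -> mutant + mutant


if __name__ == "__main__":
    rng = random.Random(12345)
    n0, beta, p, t_end, reps = 10, 1.0, 0.1, 2.0, 2000
    runs = [simulate_bartlett(n0, beta, p, t_end, rng) for _ in range(reps)]
    mean_total = sum(a + b for a, b in runs) / reps
    mean_mut = sum(b for _, b in runs) / reps
    print(mean_total, n0 * math.exp(beta * t_end))
    print(mean_mut, n0 * (math.exp(beta * t_end) - math.exp(beta * (1 - p) * t_end)))
```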
The PGF of is
where (as above), but again we suppress the dependence on time t in our notation.
Zheng [32] (with more detail in [33]) notes a Poisson type of limit in which (i.e., ) and such that , resulting in the limiting PGF
The following result gathers infdiv properties of the Bartlett distributions; however, it stops short of showing that they are GNBC’s. Referring to (50), the term in square brackets can be written as , where and , say. We show below that is a PGF.
It follows that in (50) the integer can be replaced with a positive-valued parameter, , say. Thus
i.e., is the PGF of a gamma mixture of discrete infdiv distributions. We shall denote members of the resulting Bartlett family of distributions by .
The next result shows that a Bartlett distribution is a gamma mixture of GNBC’s.
Theorem 7. Let and be as defined in (53). The distribution whose PGF is is a GNBC whose mixing GGC has the Lévy density
The corresponding Thorin distribution function is
Proof. We begin by showing that is a PGF. Writing and referring to (53), we see that
and
Hence and if .
Next, observe that
where is the beta function. We thus have the explicit representation
where , by virtue of a reflection formula for gamma functions.
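For reference, here is the reflection formula in its standard form, together with its consequence for the beta function. It is stated generically, because the arguments that appear in the missing display are not recoverable here:

```latex
\Gamma(x)\,\Gamma(1-x) = \frac{\pi}{\sin(\pi x)}, \quad x \notin \mathbb{Z},
\qquad\text{hence}\qquad
B(x,1-x) = \frac{\Gamma(x)\,\Gamma(1-x)}{\Gamma(1)} = \frac{\pi}{\sin(\pi x)}, \quad 0<x<1 .
```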
It thus follows from the usual integral representation for beta functions that
Hence h is a PGF, as asserted above.
Making the substitution and comparing the outcome with (20), we see that the Poisson intensity sequence is a Hausdorff moment sequence and that the Bondesson measure has support and density
where
This and Fact 7(c) imply (55).
Referring to (18), we have
where
The second equality above follows from the substitution , and the final form follows from evaluating the subtracted integral term in the penultimate line using the substitution to obtain
We thus obtain the final outcome
and it follows from its construction that ℓ is completely monotone. Hence is the PGF of a BOP distribution. However, this exhibits as the difference of completely monotone functions, so a different representation is needed to conclude that is completely monotone.
The Kummer transformation
implies the identity . Integration by parts of the right-hand side integral leads to
Substitution into (56) yields (54), as asserted. It is now clear that is completely monotone, and hence that is the PGF of a GNBC. □
Remark 12. An alternative, but not shorter, proof leading directly to (54) involves constructing the Bernstein representation of using the identity listed as Entry 2 in [8] (p. 304). Recalling (50) with the integer replaced by and the definition , and then choosing , yields the following representation of the Bartlett PGF:
It follows from Theorem 7 that this involves the composition of two Thorin Bernstein functions. However, the class of such functions is not closed under composition, and hence we cannot conclude that a Bartlett distribution is a GNBC. On the other hand, the components of this composition are complete Bernstein functions, and this class is closed under composition. See [8] (pp. 112 and 94, respectively). Hence we can conclude that Bartlett distributions of mutant numbers belong to BOP.
Similarly, the Zheng PGF (51) is that of a gamma mixture of Lea-Coulson distributions. Hence the corresponding analogue of Theorem 7 is, in essence, Theorem 6(d).
6. Some Other Mutant Number Distributions
The total population size for the above (balanced) Bartlett model is a linear birth process with splitting rate . Thus, the embedded jump chain is the deterministic process which jumps by unity at each cell division. Angerer [6] and Kepler and Oprea [5] independently and almost simultaneously proposed a discrete-time model for the mutant numbers immediately following successive divisions, at which takes values . Thus, if , and clearly . Their precise specifications differ in some details but, as in Section 5, a dividing normal cell produces one normal and one mutant with probability p. Angerer mentions back mutation but does not pursue that issue; instead, he allows mutation rates to depend on n, and he provides a very careful and exact treatment of the models. Kepler and Oprea include the possibility of back mutation. Allowing for these differences, their fundamental difference equations relating the distributions of and (Equation (1) in both references) are the same.
In more detail, Kepler and Oprea [5] assume that a dividing mutant produces two mutants with probability and one cell of each type with probability . Without giving details, after they ‘pass to a continuum representation form’, they assert that the PGF of is given by
Let , although the biological context implies that . Note that taking yields the Poisson distribution with parameter .
So, assuming that , the substitution followed by a comparison of the outcome with (40) shows that has the distribution with , and hence Theorem 6 above applies.
Angerer [6] proves several limit theorems for as and other constraints hold. For example, the limiting PGF displayed as (29) in [6] shows that the limiting distribution is that of a sum , where has a negative binomial distribution, M has an LD distribution, and the two are independent. Hence the sum is a GNBC, since both summands are GNBC’s and the GNBC class is closed under convolution. Similarly, the limit (32) in [6] is the PGF of a similar sum with replaced by a Poisson distributed random variable, again a GNBC.
More interesting is the PGF
displayed near the end of the proof of Theorem 5.2 in [6]. The relation to the explicit form there is that and , where , and are certain constants specified in [6].
Theorem 8. (a) If and , then (58) specifies a distribution which belongs to , but is not a GNBC. (b) This distribution is DSD iff . In this case the mass function is non-increasing iff .
Proof. (a) We have
Expanding the logarithm term and collecting coefficients of , we find that
and
Thus the sequence is a Hausdorff moment sequence, implying membership of BOP.
Recalling that , we have
Hence we obtain a moment representation , where
This function increases in but has a negative jump at . Hence it is not monotone, implying the second assertion of (a).
(b) The second equality of (59) can be expanded as
Hence iff , a necessary condition for the SD property. In addition, iff
The left-hand side of (60) is bounded below by
Hence (60) certainly holds if . The right-hand side is increasing in j, and the case requires that . So, this condition is sufficient for the SD property. □
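Several arguments in this section rest on recognising Hausdorff moment sequences, i.e., sequences $\mu_n=\int_0^1 x^n\,\nu(dx)$ for a finite measure $\nu$ on $[0,1]$. The proofs of Theorems 7 and 8 above and Lemma 1 below all use them. By Hausdorff's theorem these are exactly the completely monotone sequences, those with $(-\Delta)^k\mu_n \ge 0$ for all $k,n$. The hypothetical helper below checks these inequalities on a finite prefix using exact rational arithmetic. Passing is only a necessary condition, because finitely many differences cannot certify the infinite family of inequalities.

```python
from fractions import Fraction
from typing import Sequence


def passes_hausdorff_prefix(mu: Sequence[Fraction]) -> bool:
    """Check (-Delta)^k mu_n >= 0 for every k, n reachable from the finite
    prefix mu_0, ..., mu_{N-1}. Necessary, but not sufficient, for mu to be
    a Hausdorff moment sequence."""
    row = list(mu)  # row k holds (-Delta)^k mu_n, n = 0, ..., N-1-k
    while row:
        if any(x < 0 for x in row):
            return False
        # (-Delta)^{k+1} mu_n = (-Delta)^k mu_n - (-Delta)^k mu_{n+1}
        row = [a - b for a, b in zip(row, row[1:])]
    return True


if __name__ == "__main__":
    uniform = [Fraction(1, n + 1) for n in range(25)]            # moments of dx on [0, 1]
    lc_rates = [Fraction(1, k * (k + 1)) for k in range(1, 26)]  # moments of (1 - x) dx
    print(passes_hausdorff_prefix(uniform), passes_hausdorff_prefix(lc_rates))
    print(passes_hausdorff_prefix([Fraction(1), Fraction(1), Fraction(0), Fraction(1)]))
```

The second example uses the identity $1/(k(k+1)) = \int_0^1 x^{k-1}(1-x)\,dx$. Assuming the standard Lea-Coulson PGF $(1-s)^{m(1-s)/s}$, with Poisson jump rates $m/(k(k+1))$, this identity shows that those rates form a Hausdorff moment sequence. A sequence with gaps, such as $1,1,0,1$, fails at the first difference.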
The final model we shall examine is based on the discretised Luria-Delbrück model as reformulated in [7]. There are three model assumptions:
- (i)
The probability of a mutation during is , where , but is otherwise arbitrary;
- (ii)
A mutation occurring at time t induces a growing clone of size at the time of plating/observation. Define ; and
- (iii)
Mutations are classified as type j if . The number of type j mutations in a single culture is denoted by , a random variable having a Poisson distribution, where and “T is the time after which no observable mutations will occur”. Presumably, this could be the time of plating.
In relation to the second assumption, there is an enigmatic assertion that “depends on when the mutation occurs”. However, according to their direct specification, t is the absolute time. So perhaps what is meant is that t here denotes the current lifetime (i.e., the age) of the clone. We shall adopt this interpretation because it seems best aligned with the third assumption. Thus, is the number of type j mutations existing at time T.
Consequently, the number of mutants at time T is
and, assuming that the are independent (which is unstated but implicit in [7]), the PGF of the mutant number distribution is
where, as above, and . Thus, the computation of reduces to a determination of and .
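Once the Poisson rates are known, the mass function follows from a standard recursion. The sketch assumes, as the classification into types suggests but the missing displays do not confirm, that the mutant count is $\sum_j jY_j$ with independent $Y_j\sim$ Poisson$(\lambda_j)$. Its PGF is then $\exp\{\sum_j\lambda_j(s^j-1)\}$. Differentiating this PGF gives the usual recursion $p_0=e^{-\sum_j\lambda_j}$, $np_n=\sum_{j=1}^{n}j\lambda_jp_{n-j}$. In the sketch, `rates` is a hypothetical list holding $\lambda_1,\lambda_2,\dots$. The total rate may be passed separately when the list truncates an infinite sequence.

```python
import math
from typing import Optional, Sequence


def compound_poisson_pmf(rates: Sequence[float], n_max: int,
                         total_rate: Optional[float] = None) -> list[float]:
    """Mass function p_0..p_{n_max} of sum_j j * Y_j, with Y_j ~ Poisson(rates[j-1])
    independent, i.e. PGF exp(sum_j rates[j-1] * (s**j - 1)).

    Entries beyond len(rates) are treated as zero in the recursion, while
    total_rate (default: sum(rates)) fixes p_0, so a truncated list can stand
    in for an infinite sequence as long as n_max <= len(rates).
    """
    lam_total = sum(rates) if total_rate is None else total_rate
    p = [math.exp(-lam_total)]
    for n in range(1, n_max + 1):
        acc = sum(j * rates[j - 1] * p[n - j]
                  for j in range(1, min(n, len(rates)) + 1))
        p.append(acc / n)
    return p


if __name__ == "__main__":
    # Hypothetical rates lambda_1, lambda_2, lambda_3, chosen only for illustration.
    print([round(x, 4) for x in compound_poisson_pmf([0.5, 0.25, 0.1], 10)])
```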
A Luria-Delbrück model with a time- and state-dependent mutation rate is specified in [7] (p. 181). Normal cell numbers grow according to and mutant numbers as . Hence a mutation at time t results in a clone size equal to
Denote a generic value of the right-hand side by j. Hence
So, if , then
where, in the integral, we regard t as a function of .
Thus, the problem reduces to deciding the form of . A standing assumption is that and and, more specifically, that
where and are related by , and , P and Q are positive constants. Here, represents a constant mutation rate per cell per generation and a rate per cell per unit time.
These specifications yield
and hence
Observe that the integral for diverges for . This difficulty is handled by computing the rate required to achieve a specified value of , although this tactic does represent a deviation from the model formulation in [7]. The above log-term equals , and hence partial summation yields
Note that an approximation has been adopted in [7] whereby the zero-valued are replaced by the algebraic values obtained from the integration.
It follows that a necessary condition for DSD is , i.e.,
Theorem 9. The mutant number distribution for the above specification is DSD iff (62) holds, in which case the mass function is non-increasing iff .
Proof. If , then
The coefficient of B equals
For any , the integrand decreases as j increases from to , and the length of the interval of integration decreases too. Hence if . □
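The DSD statements in Theorems 8(b) and 9 can be cross-checked numerically with a general criterion that, as we understand it, is due to Steutel and van Harn. A law with PGF $\exp\{\sum_j\lambda_j(s^j-1)\}$ is DSD iff its canonical sequence $r_k=(k+1)\lambda_{k+1}$, $k\ge 0$, is non-increasing. Since DSD laws are unimodal and $p_1=\lambda_1p_0$, a DSD mass function is non-increasing iff $r_0=\lambda_1\le 1$. The helper names below are hypothetical. A finite prefix of rates can refute DSD but cannot prove it.

```python
from typing import Sequence


def canonical_sequence(rates: Sequence[float]) -> list[float]:
    """r_k = (k + 1) * lambda_{k+1}, k = 0, 1, ..., where rates[j-1] = lambda_j."""
    return [(k + 1) * lam for k, lam in enumerate(rates)]


def dsd_prefix_check(rates: Sequence[float]) -> tuple[bool, bool]:
    """Return (r non-increasing on the prefix, r_0 <= 1).

    A False first entry refutes DSD; True is only consistent with DSD, since
    a finite prefix cannot certify it. When the law is DSD, the second entry
    says whether its mass function is non-increasing (mode at zero)."""
    if not rates:
        raise ValueError("need at least lambda_1")
    r = canonical_sequence(rates)
    non_increasing = all(a >= b for a, b in zip(r, r[1:]))
    return non_increasing, r[0] <= 1.0


if __name__ == "__main__":
    for m in (1.5, 3.0):
        rates = [m / (j * (j + 1)) for j in range(1, 200)]  # Lea-Coulson rates (assumed)
        print(m, dsd_prefix_check(rates))
    print(dsd_prefix_check([0.5, 0.5, 0.0, 0.5]))  # gap-type rates
```

The sketch assumes the standard Lea-Coulson rates $\lambda_j=m/(j(j+1))$. These give $r_k=m/(k+2)$, which is decreasing, with a zero mode iff $m\le 2$. Rates with gaps, like those of the Haldane limit in Section 5, violate monotonicity immediately.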
We know that the sequence of Poisson rates whose terms equal corresponds to a GNBC. A natural question is whether the sequence of rates , together with an admissible value for , can similarly be associated with a GNBC. We shall see below that the answer is no.
Referring to (61), if such that , then the result is the differential equation for logistic growth. Hence (61) itself represents a generalised form of logistic growth. More generally, (61) is a particular case of the relation
where is decreasing in n. We choose the following specific form.
Thus gives logistic growth, and if , then has a quadratic profile approximating the linear logistic profile. We compute
Evaluation of the integral follows from the substitution and resolving the integrand into partial fraction form. Note that the cases and yield the sequence in [7], and that our restriction is required by the context because is increasing if .
Proceeding further, let and define
Lemma 1. (a) If , then the sequence is a Hausdorff moment sequence:
(b) If , then the sequence is a Hausdorff moment sequence:
where
(c) The Poisson rate (63) is and the r-sequence is given by .
Proof. (a) Begin with the following easily checked identity
The integrand term in brackets equals
Thus we obtain a double integral, and the integral with respect to y is
Hence we have the evaluation
Now replace c with to obtain Assertion (a).
It follows from that and, in addition, . The first equality in (65) follows, and a log-differentiation yields the second equality. □
It follows from Lemma 1 that if , then the distribution determined by (63) is a GNBC, and hence that it is unimodal. The mass function is non-increasing iff , i.e., . If , then and the distribution is degenerate at infinity. Observe that, since , we have , and hence Fact 5 shows that the mixing continuous distribution is not in .
Comparing the first member of (64) and (20) with shows that the Bondesson measure of the mixing GGC has the density
Writing
it follows that the integral expression for the Lévy density of the mixing GGC is