2.1. Discrete-Time, Continuous-Time Branching Processes, and Two-Type Branching Processes
Branching processes are commonly used to model an evolving population of microbial cells [2]. In general, cell reproduction can be described by the following Markov chain. Denote the population size of the $n$th cell generation (synchronized or not) by $Z_n$, with $Z_0 = 1$, and the number of offspring of the $i$th cell of the $n$th generation by $\xi_{n,i}$. The Markov chain $\{Z_n\}$ then satisfies the branching (or cell proliferation) rule

$$Z_{n+1} = \sum_{i=1}^{Z_n} \xi_{n,i},$$

where the $\xi_{n,i}$ are non-negative, integer-valued, independent and identically distributed (i.i.d.) random variables following some discrete distribution (called the offspring distribution). Depending on whether the cell lifespan is fixed or random, branching processes can be categorized into discrete-time and continuous-time variants. For the discrete-time branching process (known as the Galton–Watson process, or GWP [3]), the cell lifespan is assumed to be constant, so the cell birth events in each generation are synchronized. This assumption is relaxed in the continuous-time branching process, which allows the cell lifetime to vary as a continuous random variable. The branching process whose cell lifespan follows an arbitrary continuous distribution is called the age-dependent branching process, or the Bellman–Harris process (BHP) [4,5]. In particular, when the cell lifetimes are i.i.d. exponential, the resulting process is called the Markov branching process (MBP) [6]; otherwise, the corresponding continuous-time branching process is non-Markovian. Because of the branching rule, branching processes are natural mathematical models for cell population dynamics.
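As a minimal illustration of the branching rule above, the Python sketch below simulates a Galton–Watson process generation by generation. The function names and the representation of the offspring distribution as a list of relative weights for 0, 1, 2, ... offspring are assumptions made for this example, not part of SimuBP.

```python
import random


def draw_offspring(rng, offd):
    """Draw one offspring count k with probability proportional to offd[k]."""
    return rng.choices(range(len(offd)), weights=offd)[0]


def galton_watson(offd, n_gen, z0=1, seed=None):
    """Simulate Z_0, ..., Z_{n_gen} via Z_{n+1} = sum_{i=1}^{Z_n} xi_{n,i}.

    offd[k] is the (relative) probability that a cell has k offspring;
    the xi_{n,i} are drawn independently, so the process is a GWP.
    """
    rng = random.Random(seed)
    sizes = [z0]
    for _ in range(n_gen):
        z_n = sizes[-1]
        # Each of the Z_n cells independently contributes xi_{n,i} offspring.
        sizes.append(sum(draw_offspring(rng, offd) for _ in range(z_n)))
    return sizes


if __name__ == "__main__":
    # Binary fission: every cell has exactly two offspring, so Z_n = 2^n.
    print(galton_watson([0, 0, 1], n_gen=5))
    # Cell deaths allowed: 0 offspring w.p. 0.2, 2 offspring w.p. 0.8.
    print(galton_watson([0.2, 0.0, 0.8], n_gen=10, seed=42))
```

With binary fission (weights [0, 0, 1]) the output is deterministic, [1, 2, 4, 8, 16, 32]; putting mass on 0 offspring introduces cell deaths and the possibility of extinction.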
Since the Luria–Delbrück experiment concerns random mutagenesis in cell populations, the branching process model should allow distinguishable cells with different probabilistic behavior. For a typical fluctuation analysis involving two types of cells, namely wild-type (non-mutant) and mutant cells, the population sizes of the two types over time can be modeled by a two-type branching process [7]. The quantity of specific interest is the distribution of the number of mutant cells at a given time, from which the cell mutation probability (or mutation rate per cell division) can be inferred. Without loss of generality, we consider a “General Two-Type Branching Process”, hereinafter abbreviated as GTBP, which satisfies two fundamental rules: (i) each cell lives for a certain time (fixed or random) and then splits into a random number of offspring, independently of other cells; in particular, wild-type and mutant cells may have different lifetime distributions and different offspring distributions, whose parameters determine the growth rates of the two cell types; (ii) upon division, each cell mutates with a certain constant probability, independently of the division times. In this general setting, we allow backward mutations and assume that wild-type and mutant cells have mutation probabilities $\mu_1$ and $\mu_2$, respectively, where $0 \le \mu_1, \mu_2 \le 1$. Note that this GTBP should not be confused with the general branching process (also called the Crump–Mode–Jagers, or CMJ, process), which allows multiple birth events from each cell according to a point process [8,9]. We assume that the cell population starts from $N_0$ wild-type and $M_0$ mutant cells at $t = 0$, denote the time of plating (i.e., the time of cell counting) by $t_p$, and denote the numbers of wild-type and mutant cells at $t_p$ by $N(t_p)$ and $M(t_p)$, respectively.
In contrast to the GTBP, we also define a “Simplified Two-Type Branching Process” (STBP), which is a special two-type MBP initiated by wild-type cell(s), with i.i.d. exponential lifetimes for both wild-type and mutant cells (i.e., non-differential growth), binary fission (i.e., a Yule process), and no cell deaths or backward mutations. This model is used throughout the simulation studies described in Section 2.3. The STBP is similar to Kendall’s two-type branching process (KTBP) [10], often known as the stochastic Luria–Delbrück model, with slight differences. The KTBP allows cell deaths and assumes that each wild-type cell either dies, gives birth to two wild-type offspring, or turns into one wild-type and one mutant cell, at certain rates; each mutant cell, on the other hand, either dies or divides into two mutant offspring at certain rates [11]. In the STBP formulation, by contrast, all cells grow by binary fission with i.i.d. exponentially distributed lifetimes, mutant cells always divide into two mutant offspring, and wild-type cells produce mutant offspring through either pre- or post-division mutation. That is, under pre-division mutation, each wild-type cell mutates with probability $\mu$ (say) right before its division, whereas under post-division mutation, each wild-type cell first divides into two wild-type offspring, which then mutate independently with probability $\mu$ right after the division. In other words, for a wild-type cell, the offspring distribution probability generating function (PGF) is

$$f(s_1, s_2) = (1-\mu)s_1^2 + \mu s_2^2$$

for pre-division mutation, and

$$f(s_1, s_2) = \left[(1-\mu)s_1 + \mu s_2\right]^2$$

for post-division mutation, where $s_1$ and $s_2$ correspond to wild-type and mutant offspring, respectively. A schematic plot is shown in Figure 1 to illustrate cell mutations in the KTBP and STBP models.
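To make the difference between the two STBP mutation schemes concrete, the following sketch (hypothetical helper names, Python standard library only) samples the (wild-type, mutant) offspring of a single dividing wild-type cell under pre- and post-division mutation. Both schemes give a mean of $2\mu$ mutant offspring, but pre-division mutation produces mutants only in pairs, so its variance, $4\mu(1-\mu)$, is twice that of post-division mutation, $2\mu(1-\mu)$.

```python
import random


def wild_type_division(rng, mu, mode):
    """Return (wild-type, mutant) offspring of one dividing wild-type cell.

    mode="pre":  mutation w.p. mu right before division -> (0, 2), else (2, 0);
                 bivariate PGF (1 - mu) s1^2 + mu s2^2.
    mode="post": two wild-type daughters, each mutating independently w.p. mu;
                 bivariate PGF ((1 - mu) s1 + mu s2)^2.
    """
    if mode == "pre":
        return (0, 2) if rng.random() < mu else (2, 0)
    if mode == "post":
        n_mut = sum(rng.random() < mu for _ in range(2))
        return (2 - n_mut, n_mut)
    raise ValueError("mode must be 'pre' or 'post'")


def mutant_offspring_moments(mu, mode, n=100_000, seed=0):
    """Monte Carlo mean and variance of the number of mutant offspring."""
    rng = random.Random(seed)
    draws = [wild_type_division(rng, mu, mode)[1] for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    return mean, var


if __name__ == "__main__":
    for mode in ("pre", "post"):
        # Expected: mean 2*mu in both cases; variance 4mu(1-mu) vs 2mu(1-mu).
        print(mode, mutant_offspring_moments(0.1, mode))
```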
2.2. Algorithm for Simulating Population Dynamics and Mutations Based on a GTBP
In the present study, we consider the GTBP defined above. Such a model is flexible enough to cover various branching processes, e.g., the GWP, the MBP, and the BHP, with mutations taken into account. Algorithm 1 shows the simulation procedure of SimuBP based on the GTBP. As described in the algorithm, four input arguments are passed to the R function SimuBP:
SimuBP(bran, mupr, n0, tp),
among which the first one, “bran” (structured as an R list object), determines the branching rule of cell proliferation. In this list object, the “bran$span” component takes a character string, e.g., “fixed”, “exp”, “unif”, or “gam”, to specify the cell lifetime distribution (which may differ between wild-type and mutant cells). The “bran$para” component is a vector or matrix that provides the lifetime distribution parameters for the two cell types; for example, if “bran$span = 'exp'”, then “bran$para = c(1, 2)” means that the exponential rate parameter is 1 for wild-type cells and 2 for mutant cells. The third and last component, “bran$offd”, is a vector specifying the offspring distribution, so “bran$offd = c(0,0,2)” means binary fission (if necessary, wild-type and mutant cells can have different offspring distributions by changing “bran$offd” to a matrix with two rows). The second input of the SimuBP function, “mupr = c(μ1, μ2)”, is a vector specifying the forward and backward mutation probabilities. The third input vector, “n0 = c(N0, M0)”, specifies the initial numbers of wild-type and mutant cells. The last input, “tp”, is a scalar giving the time of plating. In principle, either the time of plating or the population size at the time of plating could serve as input; however, given the stochastic growth assumption of the GTBP, the former is more appropriate for this simulator. For better illustration, these input arguments are shown schematically in Figure 2. The output of SimuBP is simply a vector (M, N + M), where M is the number of mutant cells at the time of plating and N + M is the total number of viable cells at that time.
Algorithm 1 The SimuBP algorithm for simulating a cell population with mutations based on a GTBP
Input: branching rule parameters, including cell lifetime and offspring distributions (wild-type and mutant cells can have different parameters); mutation probabilities $\mu_1$ and $\mu_2$ for forward and backward mutations; initial cell numbers $N_0$ (wild-type) and $M_0$ (mutant); time of plating $t_p$
Output: total number of viable cells and number of mutant cells at $t_p$
Step 1. Initialize the numbers of wild-type and mutant cells at $t = 0$ to $N_0$ and $M_0$.
Step 2. Generate two vectors of lifetimes from the specified lifetime distribution(s), with lengths equal to the current numbers of wild-type and mutant cells; they denote the lifetimes of the wild-type and mutant cell(s) in the first (or current) generation. Generate two binary vectors of the same lengths from Bernoulli($\mu_1$) and Bernoulli($\mu_2$), indicating whether mutation occurs for the wild-type and mutant cell(s) in the first (or current) generation. Based on the lifetime vectors, calculate the accumulated lifetimes (i.e., division times) of the wild-type and mutant cells.
Step 3. Count the wild-type cells whose accumulated lifetimes are less than $t_p$; these wild-type cells will continue to divide. Count the wild-type cells whose accumulated lifetimes are at least $t_p$, and update the number of wild-type cells at $t_p$. Similarly, count the mutant cells that will continue to divide and those that remain undivided at $t_p$, and update the number of mutant cells at $t_p$. Using the mutation indicators, obtain the numbers of dividing wild-type and mutant cells.
while the total number of dividing cells is positive do
(a) Based on the offspring distribution(s), generate the numbers of offspring for the wild-type and mutant cells in the current generation.
(b) Repeat Steps 2∼3, updating the numbers of wild-type and mutant cells with the numbers of offspring in (a). As cell division/mutation continues along generations, the numbers of dividing cells decrease and their sum eventually reaches 0, which terminates the loop.
end while
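The following Python sketch mirrors the generation-by-generation logic of Algorithm 1 under stated assumptions; it is not the R implementation of SimuBP. The dictionary `bran` imitates the R list described above, but the per-type parameterizations (e.g., `"gam"` as shape and rate), the treatment of `offd` as relative weights for 0, 1, 2, ... offspring, and the rule that a mutating cell passes its new type to all of its offspring while the parent's original type sets the offspring number are all assumptions of this sketch.

```python
import random


def draw_lifetime(rng, span, para):
    """Draw one cell lifetime (hypothetical parameterisation of each family)."""
    if span == "fixed":
        return para[0]                                   # constant lifetime
    if span == "exp":
        return rng.expovariate(para[0])                  # para = (rate,)
    if span == "unif":
        return rng.uniform(para[0], para[1])             # para = (lower, upper)
    if span == "gam":
        return rng.gammavariate(para[0], 1.0 / para[1])  # para = (shape, rate)
    raise ValueError(f"unknown lifetime distribution: {span!r}")


def simubp_sketch(bran, mupr, n0, tp, seed=None):
    """Generation-by-generation GTBP simulation up to the plating time tp.

    Index 0 refers to wild-type cells and index 1 to mutant cells.
    bran: dict of per-type "span", "para" and "offd" (offspring weights)
    mupr: (forward, backward) mutation probabilities
    n0:   (wild-type, mutant) cells present at time 0
    Returns (mutant cells at tp, total viable cells at tp).
    """
    rng = random.Random(seed)
    births = [[0.0] * n0[0], [0.0] * n0[1]]  # birth times of the current generation
    viable = [0, 0]                           # cells still undivided at tp
    while births[0] or births[1]:
        next_gen = [[], []]
        for typ in (0, 1):
            for born in births[typ]:
                # accumulated lifetime = birth time + own lifetime
                divide_at = born + draw_lifetime(rng, bran["span"][typ], bran["para"][typ])
                if divide_at > tp:
                    viable[typ] += 1          # has not divided by the time of plating
                    continue
                # ASSUMPTION: a mutation switches the type of all offspring of this division
                new_typ = 1 - typ if rng.random() < mupr[typ] else typ
                offd = bran["offd"][typ]      # ASSUMPTION: parent's type sets offspring number
                k = rng.choices(range(len(offd)), weights=offd)[0]
                next_gen[new_typ].extend([divide_at] * k)  # k = 0 means cell death
        births = next_gen
    return viable[1], viable[0] + viable[1]


if __name__ == "__main__":
    bran = {"span": ["exp", "exp"], "para": [(1.0,), (1.0,)],
            "offd": [[0, 0, 1], [0, 0, 1]]}
    print(simubp_sketch(bran, mupr=(1e-3, 0.0), n0=(1, 0), tp=8.0, seed=1))
```

As a quick sanity check, fixed unit lifetimes with binary fission and tp = 3.5 give three synchronized rounds of division, so the sketch returns (0, 8) without mutation and (8, 8) when every wild-type division mutates.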
2.3. Simulation Studies for Validation, Comparison, and Demonstration
We perform simulation studies based on an STBP to evaluate the performance of SimuBP, including three components S1∼S3 with the following specific aims:
- S1: To check goodness-of-fit (GoF) of the STBP model to the data generated by SimuBP.
- S2: To compare the data generated by SimuBP with those by two alternative simulators.
- S3: To demonstrate mutation rate estimation based on the data generated by SimuBP.
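Simulation S1 focuses on validating the data simulated by SimuBP against the STBP model. Suppose that, in the STBP, the exponential rate of the cell lifetime distribution is $a$ and the mutation probability of the wild-type cell is $\mu$. Two cases, S1a and S1b, are considered, depending on the initial number of wild-type cells: $N_0 = 1$ and $N_0 > 1$. Denote by $X$ the random variable of the total number of viable cells at the time of plating $t_p$, and by $Y$ the random variable of the number of wild-type cells at $t_p$. For S1a ($N_0 = 1$), the distributions of $X$ and $Y$ can be obtained explicitly by using the properties of the binary-fission MBP [12] (for convenience, a brief derivation is provided in Appendix A.1). In particular, since every cell in the STBP divides by binary fission after an exponential lifetime with rate $a$, regardless of its type, $X$ follows the geometric distribution

$$P(X = k) = e^{-a t_p}\left(1 - e^{-a t_p}\right)^{k-1}, \quad k = 1, 2, \ldots$$

When $\mu$ is small, as in typical fluctuation experiments, Formula (2) admits a simpler approximation. Consequently, for S1b ($N_0 > 1$), the distributions of $X$ and $Y$ are the $N_0$-fold convolutions of their S1a counterparts, because the $N_0$ initial cells evolve independently; for example,

$$P(X = k) = \binom{k-1}{N_0-1} e^{-a N_0 t_p}\left(1 - e^{-a t_p}\right)^{k-N_0}, \quad k = N_0, N_0 + 1, \ldots$$

We then use SimuBP with properly specified input arguments to generate realizations of $X$ and $Y$ based on the STBP, and check the GoF of these data to the above theoretical distributions. Note that, because forward simulation is generally inefficient, SimuBP avoids slow computation by not simply superposing (via looping) the $X$ and $Y$ counts initiated by single cells; instead, it generates $X$ and $Y$ samples directly from a non-unit $N_0$ (and a non-unit $M_0$ as well, in a generalized setting).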
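The GoF check of S1a can be sketched in Python for the total count $X$, whose geometric law for $N_0 = 1$ follows from the Yule property. The sketch below uses a simple event-by-event Yule simulator as a stand-in for SimuBP output (an assumption for illustration) and a Pearson chi-square statistic with a pooled upper tail; the bin cut-off `k_max` is a hypothetical choice.

```python
import math
import random


def yule_size(rng, a, t):
    """Size at time t of a binary-fission MBP (Yule process) started by one cell."""
    k, clock = 1, 0.0
    while True:
        clock += rng.expovariate(k * a)  # next division among k cells: Exp(k * a)
        if clock > t:
            return k
        k += 1


def geometric_pmf(k, a, t):
    """P(X = k) = e^{-at} (1 - e^{-at})^{k-1}, k = 1, 2, ..."""
    p = math.exp(-a * t)
    return p * (1.0 - p) ** (k - 1)


def chi_square_gof(samples, a, t, k_max):
    """Pearson statistic over bins {1}, ..., {k_max} and the tail {> k_max}."""
    n = len(samples)
    observed = [0] * (k_max + 1)
    for x in samples:
        observed[min(x, k_max + 1) - 1] += 1
    probs = [geometric_pmf(k, a, t) for k in range(1, k_max + 1)]
    probs.append(1.0 - sum(probs))  # pooled tail probability
    stat = sum((o - n * q) ** 2 / (n * q) for o, q in zip(observed, probs))
    return stat, k_max              # degrees of freedom = number of bins - 1


if __name__ == "__main__":
    rng = random.Random(2024)
    a, t = 1.0, 1.0
    xs = [yule_size(rng, a, t) for _ in range(20_000)]
    print(chi_square_gof(xs, a, t, k_max=5))
```

The statistic can be compared with a chi-square quantile with `k_max` degrees of freedom (about 11.07 at the 5% level for 5 degrees of freedom); the same pattern applies to $Y$ once its theoretical pmf is substituted for `geometric_pmf`.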
It is worth noting that this STBP differs from the traditional Luria–Delbrück (LD) or Lea–Coulson model because it assumes stochastic growth for both wild-type and mutant cells. To illustrate this point, we perform an additional simulation study, S1c, in which the cell counts are generated by SimuBP under the same STBP as above. The distribution of the number of mutants is then calculated and compared with the corresponding LD distribution to check the GoF.
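For S1c, the LD distribution under the Lea–Coulson formulation can be computed with the well-known Ma–Sandri–Sarkar recursion. The sketch below assumes that the expected number of mutations `m` is supplied by the user; how `m` is matched to the STBP parameters is not specified here, and the helper names are illustrative.

```python
import math


def ld_pmf(m, r_max):
    """Luria–Delbrück (Lea–Coulson) pmf p_0, ..., p_{r_max} for expected mutations m.

    Ma–Sandri–Sarkar recursion:
        p_0 = e^{-m},  p_r = (m / r) * sum_{i=0}^{r-1} p_i / (r - i + 1).
    """
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append(m / r * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def empirical_vs_ld(mutant_counts, m, r_max):
    """Rows (r, empirical frequency, LD probability) for r = 0..r_max."""
    n = len(mutant_counts)
    p = ld_pmf(m, r_max)
    return [(r, sum(c == r for c in mutant_counts) / n, p[r]) for r in range(r_max + 1)]


if __name__ == "__main__":
    for r, prob in enumerate(ld_pmf(1.0, 5)):
        print(r, round(prob, 5))
```

For m = 1, the first two probabilities are e^{-1} ≈ 0.36788 and e^{-1}/2 ≈ 0.18394. Because the LD distribution is heavy-tailed, the truncation point `r_max` should be large enough for the tail beyond it to be negligible before binning.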
Simulation S2 is conducted to compare SimuBP with two other simulation algorithms. Both Algorithms 2 and 3 simulate the total number of viable cells and the number of mutant cells at the time of plating based on the STBP model. Algorithm 2 comprises four steps. First, obtain the occurrence time of each cell division event prior to plating, using the distribution of the interarrival times of the binary-fission MBP (see Proposition 1 in [13]). Second, count the population size resulting from each initial cell and sum across all initial cells to obtain the total number of viable cells. Third, determine which of the cell division events are mutation events, and calculate for each mutation event its excess time until plating. Last, generate the number of mutant cells resulting from each mutation and sum across the mutation events to obtain the total number of mutant cells. We denote the simulation study comparing Algorithms 1 and 2 by S2a.
Algorithm 2 Alternative simulator based on an STBP
Input: exponential rates $a$ and $b$ for wild-type and mutant cell lifetimes, initial number of (wild-type) cells $N_0$, mutation probability $\mu$, time of plating $t_p$
Output: total number of viable cells and number of mutant cells at $t_p$
Step 1. For each initial wild-type cell, calculate the occurrence times of the successive division events along its genealogy until $t_p$ by generating and summing exponentially distributed interarrival times, with rate $ka$ while the lineage contains $k$ cells. Denote the set of occurrence times starting from the $i$th initial cell by $T_i$.
Step 2. Count the number of elements in $T_i$, denoted $|T_i|$. Because of the binary-fission property, the population size at $t_p$ initiated by the $i$th cell is $|T_i| + 1$; hence, the total number of viable cells is $\sum_{i=1}^{N_0} (|T_i| + 1)$.
Step 3. Determine whether each cell division incurs a mutation by generating random numbers from Bernoulli($\mu$). Denote the number of mutations by $m$, which is two times the sum of the Bernoulli random numbers (this number may vary depending on the assumption of pre- or post-division mutation). Denote the occurrence time of the $i$th mutation event by $\tau_i$, so its excess time until plating is $t_p - \tau_i$.
Step 4. For each mutation, generate the resulting mutant cell count at $t_p$ from the shifted geometric distribution with success probability $e^{-b(t_p - \tau_i)}$, and finally sum across all $m$ mutant cell counts to get the total number of mutant cells at $t_p$.
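A compact Python sketch of Algorithm 2 follows. It implements the four steps as stated, assuming pre-division mutation, so that each mutation event contributes two mutant cells born at the division time; the function and argument names are hypothetical, and, as in the algorithm, every recorded division is tested for mutation.

```python
import math
import random


def shifted_geometric(rng, p):
    """Geometric count on {1, 2, ...} with success probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        return 1
    u = 1.0 - rng.random()                        # u in (0, 1]
    return 1 + int(math.log(u) / math.log1p(-p))  # P(count > k) = (1 - p)^k


def algorithm2_sketch(a, b, n0, mu, tp, seed=None):
    """Return (total viable cells, mutant cells) at tp, following Algorithm 2."""
    rng = random.Random(seed)
    total, division_times = 0, []
    for _ in range(n0):
        # Step 1: interarrival times are Exp(k * a) while the lineage has k cells.
        k, clock = 1, 0.0
        while True:
            clock += rng.expovariate(k * a)
            if clock > tp:
                break
            division_times.append(clock)
            k += 1
        total += k      # Step 2: binary fission -> size = 1 + number of divisions
    mutants = 0
    for tau in division_times:
        if rng.random() < mu:           # Step 3: this division incurs a mutation
            # ASSUMPTION (pre-division): two mutant cells are born at tau.
            for _ in range(2):
                # Step 4: a mutant clone grown for tp - tau is shifted geometric.
                mutants += shifted_geometric(rng, math.exp(-b * (tp - tau)))
    return total, mutants


if __name__ == "__main__":
    print(algorithm2_sketch(a=1.0, b=1.0, n0=10, mu=1e-3, tp=6.0, seed=1))
```

Note that Step 1 uses the rate $a$ for every division, so the total count is exact only under non-differential growth ($a = b$), as in the STBP.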
In the second part of Simulation S2, denoted by S2b, we compare SimuBP with another simulator (Algorithm 3) adapted from the software SALVADOR [14]. Algorithm 3 differs from SALVADOR mainly in two respects: it generates the number of wild-type cells at the time of plating from geometric growth rather than treating it as an input, and it replaces the Poisson-distributed number of mutations based on deterministic growth with the actual number of mutations based on stochastic growth. These adaptations make Algorithm 3 easier to compare with Algorithms 1 and 2. Algorithms 2 and 3 are closely related: both rely on the exponential lifetime assumption, so that once the number of mutations and the time from each mutation to plating are determined, the number of mutant cells at the time of plating can be obtained by generating shifted geometric random numbers.
Algorithm 3 Alternative simulator adapted from SALVADOR [14]
Input: exponential rates $a$ and $b$ for wild-type and mutant cell lifetimes, initial number of (wild-type) cells $N_0$, mutation probability $\mu$, time of plating $t_p$
Output: total number of viable cells and number of mutant cells at $t_p$
Step 1. Generate the population size at $t_p$ by summing $N_0$ random numbers, each drawn from the shifted geometric distribution with success probability $e^{-a t_p}$.
Step 2. Calculate the total number of mutations $m$, and generate the occurrence time of the $i$th mutation event, $\tau_i$, from a truncated, flipped exponential distribution with range $[0, t_p]$, i.e., from the CDF $F(\tau) = (e^{a\tau} - 1)/(e^{a t_p} - 1)$. This is done by simulating $\tau_i = a^{-1}\log\left[1 + U(e^{a t_p} - 1)\right]$, where the random number $U \sim \mathrm{Uniform}(0, 1)$.
Step 3. For each mutation, generate the resulting mutant cell count at $t_p$ from the shifted geometric distribution with success probability $e^{-b(t_p - \tau_i)}$, and finally sum across all $m$ mutant cell counts to get the total number of mutant cells at $t_p$.
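Algorithm 3 can be sketched in the same way. For illustration, the sketch assumes (hypothetically) that the number of mutation events is Binomial(total − $N_0$, $\mu$), i.e., each of the total − $N_0$ divisions mutates independently, with each event producing two mutant cells (pre-division). Mutation times are drawn through the excess time $t_p - \tau$, whose exponential law with rate $a$ truncated to $[0, t_p)$ is equivalent to a density proportional to $e^{a\tau}$ on $[0, t_p]$.

```python
import math
import random


def shifted_geometric(rng, p):
    """Geometric count on {1, 2, ...} with success probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        return 1
    u = 1.0 - rng.random()                        # u in (0, 1]
    return 1 + int(math.log(u) / math.log1p(-p))


def algorithm3_sketch(a, b, n0, mu, tp, seed=None):
    """Return (total viable cells, mutant cells) at tp, following Algorithm 3."""
    rng = random.Random(seed)
    # Step 1: each initial cell grows into a shifted-geometric clone, p = e^{-a tp}.
    total = sum(shifted_geometric(rng, math.exp(-a * tp)) for _ in range(n0))
    # Step 2 (ASSUMPTION): each of the total - n0 divisions mutates w.p. mu.
    events = sum(rng.random() < mu for _ in range(total - n0))
    scale = 1.0 - math.exp(-a * tp)
    mutants = 0
    for _ in range(events):
        # Excess time tp - tau ~ Exp(a) truncated to [0, tp), by inverse CDF.
        excess = -math.log(1.0 - rng.random() * scale) / a
        # Step 3: two mutant cells per event, each growing into a shifted-geometric clone.
        mutants += sum(shifted_geometric(rng, math.exp(-b * excess)) for _ in range(2))
    return total, mutants


if __name__ == "__main__":
    print(algorithm3_sketch(a=1.0, b=1.0, n0=10, mu=1e-3, tp=6.0, seed=1))
```

Because Step 1 draws the total count from the Yule (geometric) law directly, this sketch avoids tracking individual divisions, which is what makes Algorithm 3 cheaper than Algorithm 2.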
It should be emphasized that SimuBP can generate more general fluctuation experimental data than most other simulators, including Algorithms 2 and 3, for instance, by allowing
- (1) the cell lifetime to follow an arbitrary continuous distribution, or to be constant,
- (2) the offspring distribution to be any discrete distribution, not just binary fission,
- (3) cell deaths and backward mutations,
- (4) the initial cell population to contain both wild-type and mutant cells.
Moreover, SimuBP can be further extended to simulate other complex mutation processes governed by a non-constant (e.g., piecewise constant or even time-varying) mutation rate, as seen in the second example of the following demonstrations.
Lastly, we demonstrate the application of SimuBP through Simulation S3, which estimates mutation rates in a two-type MBP via two examples, S3a and S3b. In Simulation S3a, we first generate data from SimuBP based on the STBP model and then perform point estimation of the mutation probability using the MOM/MLE estimator proposed in [12]. Example S3b considers the case of two-stage mutations; that is, during cell proliferation, mutations occur at one constant rate in stage 1 and switch to another constant rate upon entering stage 2. Such data may be observed in fluctuation experiments involving abrupt changes in external conditions. A typical example can be found in the protocol of a mutagenesis experiment on E. coli under sub-inhibitory antibiotic stress [15], which introduces a cell recovery step prior to plating. The mutation rate in this two-stage process is a piecewise constant function, which can be easily incorporated by SimuBP but not by the other simulators considered here. We then estimate the three unknown parameters of the piecewise constant mutation rate function using an estimator based on approximate Bayesian computation proposed in [16]. The estimation results for the three parameters are shown as a heatmap of the joint posterior samples obtained by Markov chain Monte Carlo (MCMC).
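To illustrate how a piecewise constant mutation rate can be incorporated, the sketch below simulates the STBP with an exact event-driven (Gillespie-type) algorithm rather than SimuBP's generation-by-generation scheme. The mutation probability applied at each wild-type division switches from `mu1` to `mu2` at a hypothetical switch time `t_switch`; pre-division mutation is assumed, and the parameter names are illustrative only, not necessarily the three parameters estimated in S3b.

```python
import random


def stbp_two_stage(a, n0, mu1, mu2, t_switch, tp, seed=None):
    """Exact event-driven STBP simulation with a two-stage mutation probability.

    All cells divide by binary fission after Exp(a) lifetimes. A dividing
    wild-type cell mutates (pre-division) with probability mu1 before the
    hypothetical switch time t_switch and with probability mu2 afterwards.
    Returns (mutant cells at tp, total viable cells at tp).
    """
    rng = random.Random(seed)
    w, m, clock = n0, 0, 0.0
    while w + m > 0:
        clock += rng.expovariate(a * (w + m))     # waiting time to the next division
        if clock > tp:
            break
        mu = mu1 if clock < t_switch else mu2     # piecewise constant mutation probability
        if rng.random() * (w + m) < w:            # the dividing cell is wild-type
            if rng.random() < mu:                 # it becomes two mutant daughters
                w, m = w - 1, m + 2
            else:                                 # it becomes two wild-type daughters
                w += 1
        else:                                     # a mutant divides into two mutants
            m += 1
    return m, w + m


if __name__ == "__main__":
    # Stage 1 (t < 3): mu = 1e-4; stage 2 (3 <= t <= 6): mu = 1e-3.
    print(stbp_two_stage(a=1.0, n0=10, mu1=1e-4, mu2=1e-3, t_switch=3.0, tp=6.0, seed=1))
```

An event-driven loop is used here because the STBP has exponential lifetimes, so the time-dependent mutation probability can be evaluated exactly at each division; with non-exponential lifetimes, a generation-based scheme like Algorithm 1 would be needed instead.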