Are Adaptive Chemotherapy Schedules Robust? A Three-Strategy Stochastic Evolutionary Game Theory Model

Simple Summary We describe an evolutionary game theory mathematical model to investigate the robustness and accuracy of adaptive chemotherapy schedules in a stochastic environment in a tumor. The model assumes tumors are made up of a finite cell population of chemo-sensitive cells, and two populations of resistant cells, each resistant to one of two separate drugs that can be administered on different schedules. The goal of the adaptive schedules are to delay chemoresistance in the tumor by keeping the cell populations in competition with each other without any of the populations reaching fixation leading to treatment failure. Abstract We investigate the robustness of adaptive chemotherapy schedules over repeated cycles and a wide range of tumor sizes. Using a non-stationary stochastic three-component fitness-dependent Moran process model (to track frequencies), we quantify the variance of the response to treatment associated with multidrug adaptive schedules that are designed to mitigate chemotherapeutic resistance in an idealized (well-mixed) setting. The finite cell (N tumor cells) stochastic process consists of populations of chemosensitive cells, chemoresistant cells to drug 1, and chemoresistant cells to drug 2, and the drug interactions can be synergistic, additive, or antagonistic. Tumor growth rates in this model are proportional to the average fitness of the tumor as measured by the three populations of cancer cells compared to a background microenvironment average value. An adaptive chemoschedule is determined by using the N→∞ limit of the finite-cell process (i.e., the adjusted replicator equations) which is constructed by finding closed treatment response loops (which we call evolutionary cycles) in the three component phase-space. The schedules that give rise to these cycles are designed to manage chemoresistance by avoiding competitive release of the resistant cell populations. To address the question of how these cycles perform in practice over large patient populations with tumors across a range of sizes, we consider the variances associated with the approximate stochastic cycles for finite N, repeating the idealized adaptive schedule over multiple periods. For finite cell populations, the distributions remain approximately multi-Gaussian in the principal component coordinates through the first three cycles, with variances increasing exponentially with each cycle. As the number of cycles increases, the multi-Gaussian nature of the distribution breaks down due to the fact that one of the three sub-populations typically saturates the tumor (competitive release) resulting in treatment failure. This suggests that to design an effective and repeatable adaptive chemoschedule in practice will require a highly accurate tumor model and accurate measurements of the sub-population frequencies or the errors will quickly (exponentially) degrade its effectiveness, particularly when the drug interactions are synergistic. Possible ways to extend the efficacy of the stochastic cycles in light of the computational simulations are discussed.


Introduction
The design of adaptive chemotherapy schedules [1], motivated and aided by mathematical/computational models [2], is a rapidly developing field that has great potential for mitigating chemoresistance in tumors [3]. The advantage of an adaptive schedule over a more widely accepted pre-determined schedule, such as a maximum tolerated dose (MTD) schedule, or a low-dose metronomic (LDM) schedule [4,5], is that adaptive schedules are able to evolve and change along with the tumor [6]. However, as has been observed [7], the efficacy of sequential adaptive cycles depends crucially on the ability to monitor and track the balance of heterogeneous sub-populations of cells that comprise the tumor [8], or at least an accurate surrogate biomarker [9,10], which can be challenging. Additionally, the fact that any complex finite population of cells will evolve stochastically [11][12][13] makes it particularly important to assess the effectiveness of adaptive schedules under a range of diverse conditions, such as different patient populations and tumor sizes and over many therapy cycles. We present a mathematical model that addresses these issues.
Standard pre-scheduled chemotherapy dose-delivery schedules suffer from the common occurrence of chemoresistance of the tumor [14], leading to treatment failure [15] after multiple cycles. To overcome this, adaptive schedules are designed with the goal of managing resistance [1]. Adaptive therapies typically leverage tumor heterogeneity [8] by exploiting evolution via competition of the tumor sub-populations [16] to steer evolution and designing an advantageous chemoschedule [17][18][19] that delays chemoresistance by maintaining a sufficient fraction of the sensitive population in order to suppress the resistant population, a strategy used in the context of avoiding antibiotic resistance [20] as well.
Aside from schedule design (dose and timing), the use of two or more drugs is helpful in mitigating resistance, if combined strategically [2,21]. A key challenge associated with dose scheduling and the optimal design of multidrug combinations is that both the drug schedules and the multidrug mixture rely on an accurate dynamical assessment of the relative balance and mixture of the evolving cell types in the tumor, which is hard to obtain in clinical practice [7]. Quantitative assessments are, generally speaking, much easier to obtain in a mathematical model.
Even within the framework of a mathematical model, how exactly to quantify the effectiveness of the schedule and multidrug mixture when the schedule is adaptive (i.e., not fixed and repeatable) is not at all straightforward. We address this important issue of uncertainty quantification and robustness [22][23][24] of tumor response to adaptive therapy schedules and synergystic vs. antagonistic multidrug interactions [25,26] by using a stochastic finite-cell fitness-dependent Moran process evolutionary game theory tumor model with an adaptive schedule designed from the deterministic adjusted replicator dynamical system [27], which is the large cell limit (N → ∞) of the finite-cell stochastic process [28,29]. We describe the main features of our model as well as the connections between the finite cell stochastic model and infinite cell deterministic model in the next section.

Model Description
The model we use to investigate the potential efficacy of designing closed evolutionary cycles to avoid chemoresistance in a stochastic setting consists of the following basic elements: We implement a version of the adjusted replicator dynamics equations to track the frequency evolution of the three sub-populations of cancer cells that make up the tumor. Our model consists of a chemo-sensitive population (S), a population resistant to drug 1 (R 1 ), and a population resistant to drug 2 (R 2 ). The competition among the three populations is determined by a 3 × 3 payoff matrix A that builds in a fitness cost of resistance. In the absence of chemotherapy, the sensitive population is most fit and will outcompete the resistant populations, saturating the tumor.

2.
We use two time-dependent control functions C 1 (t) and C 2 (t) to model the chemotherapy dosing schedules. These chemotherapy dosing functions control three selection pressure parameters w 1 (t), w 2 (t), and w 3 (t) by altering the relative fitness values of the three sub-populations. A dose of chemotherapy (C i (t) > 0; t 0 ≤ t ≤ t 1 ) lowers the relative fitness of the targeted cell population by altering the selection pressures on the sub-populations to effectively favor ones that are not targeted. This mechanism allows us to 'design' favorable fitness landscapes indirectly by adaptively monitoring the sub-population frequencies and altering our dosing schedule in response. Our goals when we design chemotherapy schedules are to: (i) avoid fixation of the sensitive cell population; (ii) avoid chemoresistance (fixation of either of the resistant populations) by keeping the three populations of cells in competition without allowing any of them to saturate the tumor. We implement this by designing schedules that keep us confined to a closed 'evolutionary cycle' which keep the sub-populations in competition forever (ideally) for the deterministic (N = ∞) model.

3.
We then test the performance of these designed schedules on a finite cell (N < ∞) stochastic Moran process model to track the sub-population frequencies during these cycles, where the limit N → ∞ corresponds to our adjusted replicator model from which the cycle was designed.

4.
Since the Moran process model uses a fixed value of N (hence, it cannot be used directly to determine tumor growth where the total cancer cell population increases), we add a tumor growth equation with growth rate determined as a function of the average fitness of the three cancer cell sub-population frequencies comprising the tumor. When the average fitness of the cancer cell populations is above a fixed microenvironmental average, the tumor grows (exponentially), and when it is below, it shrinks (exponentially).
Details of each of these elements are described in the next section.

Three-Component Two-Drug Adjusted Replicator Dynamics Model
The deterministic model we use to determine the sub-population frequencies (x 1 , x 2 , x 3 ) ≡ (S, R 1 , R 2 ) and the dosing schedules (C 1 (t), C 2 (t)) to determine the adaptive therapy are the adjusted replicator equations: Here, f α denotes the fitness of sub-population α: while f denotes the average fitness of the entire population (tumor): We restrict 0 ≤ w α ≤ 1 and note that for w α = 0, fitness is constant ( f α = 1), while for w α = 1, fitness is determined by the payoff matrix A. The growth or decay of population α in Equation (1) is determined by the deviation of the fitness of population α from the average, normalized by the average fitness of the tumor.
The relative fitness values of the three sub-populations are controlled by the selection pressure parameters w α : where C(t) = (C 1 (t), C 2 (t)) is the chemotherapy delivery function, with C 1 controlling drug 1, C 2 controlling drug 2, and e is a parameter that determines whether the two drugs act synergistically (e > 0), antagonistically (e < 0), or additively (e = 0). We take w 0 = 0.1, which sets the timescale in our simulations. Notice that when drug 1 is applied (C 1 (t) > 0), it acts to reduce the selection pressure parameters w 1 , w 3 , but leaves w 2 (the parameter controlling the R 1 population) unchanged, so the R 1 population is resistant to drug 1. When drug 2 is applied (C 2 (t) > 0), it acts to reduce the selection pressure parameters w 1 , w 2 , leaving w 3 unchanged (the parameter controlling the R 2 population), so the R 2 population is resistant to drug 2. Thus, our model steers evolution by exploiting the effect that chemotherapy has on the selection pressure applied to different sub-populations. The total dose delivered over time-period t is denoted D ∈ R 2 : Then:˙ and: D T denotes the total dose delivered in fixed time T. The system (1) is a nonlinear non-constant coefficient dynamical system governing the frequency distributions.
To fix the evolutionary game being played by the population of cells, we consider the 3 × 3 payoff matrix A: In order to model tumor kinetics, we take our payoff matrix to be of Prisoner's Dilemma (PD) type [30]. The reasons why the PD matrix is useful as a cancer model are discussed in more detail in [19,30], but arise from the fact that this choice allows the sensitive cell population (the defectors) to reach fixation with an S-shaped growth curve in the absence of chemotherapy (C 1 (t), C 2 (t)) = (0, 0) and enforces a 'cost-of-resistance' on the two resistant sub-populations (the cooperators) [31]. The PD inequality bounds for the entries of the payoff matrix are: For definiteness, we choose the specific values: Our chemotherapy delivery functions are constrained so that 0 ≤ C 1 (t) ≤ 1, 0 ≤ C 2 (t) ≤ 1, 0 ≤ C 1 (t) + C 2 (t) ≤ 1, and from Equations (5)-(7) reduce the fitness of the relevant sub-populations by altering the relative selection pressures.
Since the Moran process model tracks frequencies while keeping N fixed, we need an additional tumor growth equation to complete our model. For this, we use the following equation to track the tumor volume V(t): Here, g denotes a positive constant we take as generally representing the fitness of the average microenvironment surrounding the tumor. Hence, the growth rate (or decay) of the tumor, ( f − g), is given by the deviation of the average tumor fitness from the average fitness of the local microenvironment, as represented by the gray region surrounding the tumor cells in Figure 1.

Figure 1.
A sequence of adaptive chemoschedules that lock the tumor in a closed evolutionary cycle is difficult to achieve for finite cell (shown for N = 12) populations since sub-population frequencies fluctuate stochastically and are difficult to measure with precision. Middle inset shows a discrete tri-linear plot of a stochastic realization in the (S, R 1 , R 2 ) plane, with n = 24 steps, starting at point A, using the chemoschedule designed from the deterministic (N → ∞) model.

Three-Component Two-Drug Discrete Stochastic Moran Process
In more detail, our finite cell model is a three component stochastic birth-death process [32,33], with frequency-dependent fitness governed by a payoff matrix, and population size N comprised of a fluctuating mixture of cells of type S (sensitive), cells of type R 1 (resistant to drug 1), and cells of type R 2 (resistant to drug 2), with S + R 1 + R 2 = N. At every step n in the process, a cell is randomly selected for reproduction with probability proportional to its fitness. The offspring produced replaces a randomly chosen cell in the population. The fitness of cells in the sub-populations S, R 1 , and R 2 are given respectively as: Here, i denotes the number of cells of type S, j denotes the number of cells of type R 1 , and N − i − j denotes the number of cells of type R 2 , with 0 ≤ i ≤ N; 0 ≤ j ≤ N, i + j ≤ N. The state of the system is given by (i, j) (visualized as a grid point on a triangular grid as in Figure 1) with probability of residing in that state after n steps given by p (n) ij .
As in the continuous model, the parameter w α ∈ [0, 1] (α = 1, 2, 3) controls the selection pressure and allows us to independently adjust the fitness landscape. At the two extremes, when w α = 0, the payoff matrix makes no contribution to fitness, so only random drift governs the dynamics. At the other extreme, when w α = 1, the payoff matrix makes a large contribution to the fitness, with selection pressure driving the dynamics. We define the average tumor fitness, f N , as the discrete analogue of (4): where: The transition probabilities at each step of the birth-death (Markov) process are written as: To implement chemotherapy in the discrete Markov process, we discretize time 0 ≤ t ≤ T in (8)- (10) so that at each time-step n in the process (with n = tN), we enforce the timedependent chemotherapy schedules (5)-(10) by adjusting the selection pressures where: This renders the Markov process non-stationary [34]. The state-vector P n ∈ R (N+2)(N+1)/2 represents the discrete probability distribution at each triangular grid point (i, j), p Along with a 1 2 (N + 2)(N + 1) × 1 2 (N + 2)(N + 1) stochastic (non-stationary) transition matrix M ≡ {p ij } (whose entries are Equations (22)-(31)) with rows adding to 1 driving the dynamical system: We initiate the process with an initial discrete distribution P 0 . This framework allows us to carry out Monte Carlo simulations to generate probability distributions (using Equations (22)-(27) more practically) of the three cell types for any initial distribution.

Continuous Limit N → ∞ Which Relates the Moran Process to the Adjusted Replicator System
The deterministic and stochastic models are connected by the limit N → ∞ in which the discrete finite cell frequency-dependent model (35) converges to the adjusted replicator system (1). This connection was first made explicit in [28,29] with an earlier approach in [35]. These papers accomplished two things. First, the authors of [28] showed that different microscopic stochastic processes for two-strategy games lead to either the replicator system or the adjusted replicator system in the limit N → ∞. They also derived an explicit mean-field Fokker-Planck equation describing the probability density function for the different sub-populations. This work was extended in [29] to an arbitrary number of strategies (sub-populations), with the possible inclusion of mutations leading to the adjusted replicator dynamics as N → ∞. Our method of quantifying the uncertainty associated with adaptive chemotherapy schedules exploits this stochastic-deterministic connection. The deterministic adjusted replicator Equation (1) allows us to unambiguously design a beneficial schedule giving rise to the closed evolutionary cycle, while the stochastic fitness-dependent three-strategy Moran process allows us to quantify the variance associate with repeated cycling of the schedule for finite N.

Results
To understand our results, we start with an example in Figure 2a of one (of many) closed deterministic evolutionary cycle ABCA in the (S, R 1 , R 2 ) tri-linear plane, along with two stochastic realizations starting at the same point A but not closing the loop. The deterministic cycle is exactly closed and repeatable, but the probability of a stochastic cycle being closed is vanishingly small. Figure 2b depicts the chemoschedule that produces the closed ABCA cycle, which we use as our adaptive chemoschedule for the finite N model, repeating it in a series of cycles. The key to designing a deterministic closed cycle (see [19,26] for more discussion) is to realize that (i) in the absence of chemotherapy, all trajectories converge to the S corner; (ii) with C 1 on (above threshold levels) and C 2 off, all trajectories converge to the R 1 corner, and (iii) with C 2 on (above threshold levels) and C 1 off, all trajectories converge to the R 2 corner. These three facts make it straightforward to identify trajectories that cross, and then by turning off and on the drug schedules at precisely the times when they cross (points A, B, and C), one can create a closed cycle. These closed cycles are plentiful and can be designed so that they are located throughout the triangular domain, with a large range in the sizes of the enclosed areas [19]. The results described in this section do not depend, in any crucial way, on which closed cycle we choose in the tri-linear plane and can be viewed as general.  What is the size of the error made in using the stochastic (finite N) model with the adaptive chemoschedule that puts us on the closed ABCA cycle in the deterministic problem? In Figure 3a, we show the spread of the endpoints in the (S, R 1 , R 2 ) tri-linear plane after one evolutionary cycle, for the case e = 0 when the two drugs act additively on the population. The cycle is initiated at (S, R 1 , Figure 2), for 10,000 trials, with N = 100,000 cells. Additionally shown are the two orthogonal principal components associated with the spread of the data, which we then map, in Figure 3b, to the two principal axes. The kernel density estimates (KDE) for the distributions are shown in Figure 3c,d. The distributions, after one loop, closely resemble multivariate Gaussian distributions in the principal components. After repeated loops of the cycle, the multivariate Gaussian distribution starts to break down, as shown clearly in the principal component coordinates (Figure 4), which shows the KDE after 2 loops (Figure 4a  To examine why the multi-Gaussian nature of the distribution breaks down, we show in Figure 5 histogrammed distributions of the three sub-population frequencies. By the third loop, the distributions start to impinge on 0 (one of the sides of the tri-linear plane) or 1 (one of the corners of the tri-linear plane), indicating that a sub-population either vanishes or saturates, distorting the multi-Gaussian spread. By the fourth and fifth loops, the distributions start to converge at these endpoints and the distribution no longer resembles a multi-Gaussian.
The histogrammed distributions for synergistic (e > 0) and antagonistic (e < 0) drug interactions are shown in Figure 6a,b. In both cases, the distribution begins to distort after roughly three loops, with the synergistic interactions breaking down slightly sooner in the third loop. This is shown nicely in Figure 7 which depicts the two principal components (on a log-linear plot) as a function of cycle number for additive, synergistic, and antagonistic interactions. In all three cases, the variance of the distribution grows exponentially with loop number. Antagonistic drug interactions outperform additive interactions, which both outperform synergistic interactions. One could argue this is because antagonistic interactions are able to suppress competitive release of the resistant populations more efficiently than synergistic interactions, as discussed in [26]. As N → ∞, the variances will go to zero. What effect does this breakdown have on the ability of the model to suppress competitive release and control tumor growth? For this, we use the tumor growth Equation (16) as shown in Figure 8 where we plot treated vs. untreated volumetric tumor growth using adaptive scheduling. The solid lines depict the deterministic growth curves (tumor recurrence occurs around t ≈ 50), while the error bars show the spread of the adaptive stochastic runs. Our basic message from this plot is that while the stochastic spread (associated with tumor growth) increases as time progresses, it is perhaps not as much as one might expect from the fact that the adaptive schedule is not producing closed evolutionary cycles.

Discussion
The exponential increase in the distribution variance with each adaptive cycle degrades the precision of each subsequent round of chemotherapy since the cycles were designed to stay in a closed evolutionary loop only in the deterministic limit N → ∞. Ideally, the time-dependent chemoschedule keeps the three sub-populations in long-term competition with each other to avoid chemoresistance indefinitely, although delaying chemoresistance through several cycles may be all that is practically achievable due to stochastic fluctuations. When examining the tumor growth curves (Figure 8), we can see that the mismatch between the deterministic chemoschedule, when run in a finite-cell stochastic model, gives results that are nonetheless better than what could be expected given the exponential increase in variance. Nonetheless, we point to several concrete strategies that could help extend the timeframe over which the schedules could remain precise, even if the evolutionary loops do not close. Three strategies are discussed in [36]. The first is to explore other possibilities than the commonly used and accepted 50% rule which identifies a nominal baseline tumor size and advocates applying chemotherapy until the size shrinks by half. Then, when the tumor reaches the nominal baseline size, chemotherapy is again implemented for another cycle, etc. It is easy to imagine that there could be better, more optimized options, but none have been clinically tested as far as we are aware. The second is to explore the role of tumor size in deciding when to apply chemotherapy. Typically, chemotherapy commences as soon as a tumor is discovered, but it is conceivable that this may not be optimal in all cases. In [5], we explored using different chemotherapy schedules for different tumor growth rates, but these ideas have not yet been tested in the clinic. The third idea discussed in [36] is to explore the optimal frequency distribution of the cell populations in which to begin adaptive therapy. From the point of view of our mathematical model, these ideas generally relate to the size and location of the ABCA cycle in the tri-linear phase plane that would be optimal for an adaptive schedule. We, of course, have discretion over both where in the tri-linear plane the cycle starts (location of point A in Figure 2a), and how large an area the evolutionary loop ABCA should enclose, both of which are tied to the suggestions made in [36]. Optimizing these decisions could very well have medical implications that could and should be explored further.

Conclusions
The lessons learned in probing our finite-cell stochastic model using a chemoschedule design from a deterministic model are both sobering and encouraging. On the one hand, it is clear that designing an adaptive schedule using the deterministic model is far more practical than designing a tailored one for each individual stochastic run. The simplicity of this approach, however, comes at the price of not having exactly the right timing and dosing levels for any single realization, which leads to a mismatch between the designed closed loop and the stochastic sub-population levels. This, in turn, leads to an exponential increase in the error variances associated with repeated evolutionary loops meant to keep the sub-populations in perpetual competition. The good news is that the effect that this mismatch has on tumor growth curves seems small-it is possible that the chemotherapy cycles will still be good enough to delay resistance in a wide range of patient populations and tumor sizes. Of course. this is based on the results of a mathematical model and computer simulations, not clinical trials in real-world settings, and should be interpreted in that context.

Conflicts of Interest:
The authors declare no conflicts of interest.