1. Introduction
Optimal therapy strategies for cancer patients can be formulated as a control theory problem [1,2,3,4,5,6,7,8], to determine the optimal dose and schedule of chemotherapy, or optimal combination and timing of treatments to control tumour size and achieve the best outcome. Available treatments, the controls, include chemotherapy, resection surgery (tumour removal), radiotherapy, and transplant (blood cancers). For chemotherapy, the optimal concentration of the drug(s) at each point in time must be determined. Control theory formulations require specification of a model of cancer growth under therapy (incorporating the mechanism of action and efficacy of a drug), and a mathematical function (the objective) to be optimised, potentially a function of the tumour state, patient health and therapy strength/impact, both throughout the treatment interval and at the end of therapy. This control problem can be formulated in a deterministic framework (ODEs, PDEs), allowing Pontryagin’s Maximum Principle (PMP) to be used [9], or as a stochastic control problem. Deterministic control theory frameworks have substantial advantages, being both analytically and numerically more tractable. There is also a far more extensive literature on deterministic models of cancer growth. Tumour growth is complex and notoriously difficult to model and parametrise, although with recent technological innovations there is an increasing number of validated chemotherapy cancer models of varying complexity [10,11,12,13].
The optimisation criterion is a fundamental part of the optimisation problem and has a significant impact on the nature of the optimal solutions [5,14]. Optimisation criteria are needed since there are competing objectives for therapy: the control of the tumour (for instance, minimising tumour size at the end of therapy) and minimising the risk of therapy to the patient. A chemotherapy control problem is typically set up as follows: treatment occurs up to a time
T (the time horizon) using a time dependent control
that is assumed bounded,
, where
is, for instance, the maximum tolerated dose (MTD) of a chemotherapy drug. The optimal dosing protocol,
, is then determined by optimising an objective function, for instance,
comprising a terminal cost proportional to the tumour size
at the end of the treatment horizon
T and a running cost that accounts for factors such as drug toxicity during treatment. Here
is the tumour size at time
t. The positive constant
r determines the relative weight of the two competing objectives, specifically balancing the need to minimise the tumour burden at the end of treatment and reducing toxicity to the patient. The
norm, with
, is more amenable to mathematical analysis, whilst the
norm is more biologically justifiable. A variety of other objectives have been used, but all typically have the same qualitative form and incorporate a weight to balance the competing objectives.
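For orientation, a weighted objective of the kind described above is often written in the form below. This is only a sketch in assumed notation ($J$ for the objective, $u_{\max}$ for the dose bound, $p$ for the norm exponent); it is not necessarily the exact expression used in the cited works.

```latex
\min_{0 \le u(t) \le u_{\max}} \; J[u] \;=\; N(T) \;+\; r \int_0^T u(t)^{p} \, dt ,
\qquad p \in \{1, 2\}
```

With $p = 1$ the running cost is the total dose delivered, and with $p = 2$ it is quadratic in the dose, which makes the Hamiltonian strictly convex in $u$ and is the usual reason for its analytical convenience.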
Such objectives have no medical justification or foundation; they capture only qualitatively the competing demands of treatment, which limits their application to decision making in the clinic. Further, this formulation assumes a continuum of benefits and costs, whilst cancer therapy is punctuated by key events that substantially affect outcome and post-treatment prognosis. In particular, in these deterministic models the cancer regrows in the absence of therapy and, therefore, tumour elimination (cure) is ignored; these optimisation problems in fact restrict focus to the period
and ignore outcomes beyond
T. There are also other events of key importance to patient prognosis: for instance, treatment can have severe adverse effects (SAEs) on patient health [15], drug resistant (mutant) cells can appear [16], and the cancer may undergo metastasis, i.e., spreading to and colonising other tissues and sites in the body [17]. SAEs can result in cessation of treatment or patient death in extreme cases, drug resistance limits possible therapies [18], whilst the prognosis for metastatic cancers is poor [19].
Here we propose a new objective function that incorporates these key events,
Figure 1, including complete remission (cure), failure to eliminate the tumour, SAE, metastasis, and generation of mutants (both treatable and untreatable) during treatment. It utilises the rate of these events, so that at the end of therapy we can calculate the probability of each event having occurred, events that impact patient outcome and future prognosis. Thus, our formulation implicitly incorporates considerations post-treatment. To formulate an objective we observe that these events impact patient lifespan, thereby giving a common measure to quantify the impact of these events. We therefore propose to maximise the patient’s expected lifespan, averaged over possible events. This idea was proposed and explored in an earlier work using a simple approximation to the tumour elimination probability [
20]. Consider a patient with a tumour of size
, that is treated over a (horizon) time
T, treatment administered as a control
,
that directly affects the tumour’s growth dynamic. For instance, this could model a chemotherapy drug administered at concentration
. The outcome of therapy, and the expected lifespan of the patient, is determined by a set of countable events indexed by
. We assume that events occur at time
t with rate
, which depends on the tumour size
, the control
, and possibly explicitly on time. Events, thus, follow a Poisson process; the probability of event
j occurring by time
t is
, which is a function of the tumour history and control schedule. The rate of tumour elimination is dependent on fluctuations in the number of tumour cells; thus, we draw on branching processes to determine the probability of tumour elimination. The occurrence of an event may affect the rates of other events, thus, multiple event sequences need to be considered. Let
k index all possible (allowable) sequences of events or
futures (there are a maximum of
, assuming an event happens only once); event combinations/futures are distributed as a multiple event Poisson process, giving the probability
of event combination
k occurring by time
t. Under event combination
k let the expected lifespan be
. Therefore, under control schedule
u the expected lifetime, or payoff, is,
More generally this could also incorporate an integration over futures parametrised by a continuum. The optimal control schedule
, is then the schedule that maximises this expected lifetime, i.e.,
maximises (
2) subject to the growth dynamics. The outcome at the horizon time can then be classified into which events occurred, event
k occurring with probability
. This objective has to satisfy a consistency constraint; if the optimal control is zero at the horizon time,
, then the expected lifespan is invariant to a reduction of
T as the therapy is unchanged—the expected lifetime only changes by application of the control. In particular, if
is zero at
T the probability of tumour elimination must also be invariant to changes in
T.
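The construction above can be sketched numerically. The snippet below is a minimal illustration, assuming each event first occurs according to an inhomogeneous Poisson process, so that the probability of occurrence by time t is one minus the exponential of the integrated rate. The function names `event_probability` and `expected_lifetime` are hypothetical and do not come from the paper.

```python
import math


def event_probability(rates, dt):
    """Probability that an event has occurred by the end of the time grid.

    `rates` holds the event rate sampled at equally spaced times (spacing dt).
    Assumes a Poisson process: P = 1 - exp(-integral of the rate), with the
    integral approximated by the trapezoidal rule.
    """
    integral = sum(0.5 * (a + b) * dt for a, b in zip(rates, rates[1:]))
    return 1.0 - math.exp(-integral)


def expected_lifetime(future_probs, future_lifespans):
    """Expected lifespan averaged over mutually exclusive futures.

    `future_probs[k]` is the probability of future k at the horizon and
    `future_lifespans[k]` the expected lifespan under that future.
    """
    if abs(sum(future_probs) - 1.0) > 1e-9:
        raise ValueError("future probabilities must sum to 1")
    return sum(p * life for p, life in zip(future_probs, future_lifespans))


if __name__ == "__main__":
    # Constant rate of 0.1 per day over 10 days (illustrative numbers only).
    p = event_probability([0.1] * 101, 0.1)
    print(f"event probability: {p:.4f}")
    # Two futures, e.g. cure (40 years) and relapse (2 years).
    print(f"expected lifetime: {expected_lifetime([p, 1 - p], [40.0, 2.0]):.2f} years")
```

Treating the futures as mutually exclusive and exhaustive is what allows the payoff to be a plain probability-weighted sum.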
We develop this therapy optimisation framework within a deterministic context, which has substantial advantages over stochastic models in terms of model flexibility (with a range of nonlinear and multi-compartment ODE tumour growth models in the literature [1,2,3,4,5,6,7,8]) and powerful optimisation tools such as PMP [1,21]. The competing objectives of therapy, tumour control/elimination and minimising treatment related harm to the patient, are incorporated as impacts on the expected lifespan. We develop two payoff functionals. Firstly, we formulate a model where the drug impacts patient quality of life, giving the discounted expected lifespan payoff (DLP),
Section 2, similar to the objective (
1). Secondly, we propose a model where the negative effects of therapy are incorporated as SAEs. Specifically, we model treatment related mortality (TRM) events giving the severe adverse effects payoff (SAP),
Section 5. For both of these payoffs, the optimal solutions separate patients into an untreatable class and a treatable class based on patient demographics, tumour size at detection, and the patient’s susceptibility to the drug’s side effects.
Maximising patient lifespan is not a new concept; it has been used as an optimisation criterion in a control theory analysis of patients with lethal/incurable forms of cancer. Originally analysed in [
22] for the cases where resistant mutant cells pre-exist, it has been extended with a number of analyses since, including [
23] which maximises the time to remain within a safe (or viability) region. Typically, these models incorporate tumour and normal cells, the safe region being defined by thresholds on both cell types. Maximising life expectancy is also similar to maximisation of overall survival (OS) used in sequentially adaptive medical decision-making where treatment is considered in multiple stages (drawn from a set of defined therapies), also known as dynamic treatment regime models, [
24]; for instance, induction therapy and salvage therapy if induction fails. These problems are formulated in the Markov decision making framework. Such methods have been used to optimise sequential drug use, for instance, for acute myeloid leukaemia (AML) [
25]. These models are based on probabilistic outcomes of events to determine optimal strategies and do not include explicit tumour dynamics, and are, thus, distinct from our approach.
This paper is organised as follows. In
Section 2, we formulate the DLP. In
Section 3, the functional form of the control probability of tumour lineage elimination is derived. In
Section 4, PMP is applied to determine the optimal solutions of the DLP, subject to non-linear tumour evolution models. In
Section 4, optimal drug administration strategies of the DLP subject to a logistic tumour evolution model are presented. In
Section 5 the SAP is introduced and optimal drug administration strategies are determined. In
Section 6, we discuss our framework and its limitations, and provide future directions.
2. The Discounted Life Expectancy Pay-Off (DLP)
The cost of treatment to the patient needs to be incorporated into the objective
, (
2); for instance, the impact of drug toxicity on the patient. The simplest toxicity model is to deduct from the payoff (
2) a cost of treatment measured in days,
, assuming toxicity is linear in the drug concentration. This simple model of toxicity is often used in control theory analysis of tumours [
1,
9]. In the lifetime context, this can be interpreted as a poor quality of life during treatment, e.g., nausea caused by the drug. Duration of treatment is in fact a significant covariate of a patient’s quality of life [
26]. Assuming expected lifetimes are linear in age, the expected lifetime of outcome
k at time
T in the future is
. We then obtain the payoff,
where both the payoff and the cost
are quantified in days. Then,
is a measure of the quality of life during treatment, effectively the days during treatment ‘worth living’, which can possibly be determined from quality-of-life (QoL) aggregate measures derived from patient-reported outcomes [
26]. If drug concentration is rescaled to the maximum tolerated dose (MTD), i.e.,
, then this interpretation imposes
, and is expected to be of order 1.
Consider two possible outcomes—cure (tumour elimination), and a failure to eliminate the tumour, so the patient will subsequently relapse. Define
as the probability that the treatment achieves tumour elimination within time
T, a functional of the tumour history. Define the expected lifetimes post-treatment,
for tumour elimination and no elimination. The expected lifetime
could be parametrised by tumour size,
if dependence is known. The pay-off function then reads,
assuming a linear drug toxicity effect. Rearranging, we get
the second line following under the assumption of linearity of expected lifetime estimates over the period
T. Here we are assuming that the treatment horizon is shorter than the lifetime
, otherwise,
is negative and death during treatment would need to be separated out as a possible outcome, see
Section 5.
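The two-outcome structure can be made explicit with a short worked rearrangement. This is a sketch in assumed notation ($P_E$ for the elimination probability at $T$, $L_E$ and $L_R$ for the expected lifetimes with and without elimination, $c$ for the toxicity cost per unit of dose and time); the further step that uses linearity of the lifetime estimates over $T$ is not reproduced here.

```latex
\begin{aligned}
J[u] &= P_E \, L_E + (1 - P_E)\, L_R - c \int_0^T u(t)\, dt \\
     &= L_R + (L_E - L_R)\, P_E - c \int_0^T u(t)\, dt
\end{aligned}
```

Written this way, treatment is only worthwhile when the lifespan gain $(L_E - L_R)$, weighted by the probability of achieving elimination, exceeds the accumulated toxicity cost.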
4. Optimal Therapy Solutions of the Discounted Expected Lifespan Payoff with Cure or Relapse Outcomes
We can now formulate the optimisation problem. We want to determine the optimal chemotherapy protocol by maximising the discounted expected lifespan pay-off (DLP) (using the TEP probability (
23) with (
22)),
which incorporates two outcomes, cure versus a failure to eliminate the tumour. Here,
, the spontaneous cure probability, with
, (
22). We use the approximate general form, (
21),
We want to determine the optimal drug regimen that maximises this pay-off.
We consider a generic one compartment tumour growth model, (
14), with a linear dependence of the death rate on drug concentration
u (the control), specifically,
Here, the drug is assumed to not affect the birth rate, i.e.,
. We define
for convenience. The tumour capacity
K in the absence of drugs, assumed large (10^8–10^10 cells), satisfies
, i.e.,
. We assume
is concave on
. We assume
, and
, so the net growth rate per cell
decreases with
N,
. Here, prime denotes differentiation with respect to
N. The drug efficacy likely falls with population size; we assume
is not increasing (so monotonically decreases, but not necessarily strictly),
. For tumour control at the MTD (
) we need
,
. For small
,
is the mean of a branching process with birth, death rates
,
in the absence of drugs.
Our optimisation problem is to determine the optimal drug concentration at each point in time,
, that maximises the DLP. PMP reformulates this infinite-dimensional optimisation problem into a finite-dimensional two-point boundary value problem (TPBVP), thus offering both analytical and numerical methods of solution. Typically, there are two types of optimal solutions for bounded controls [
28,
29]: (i) bang-bang solutions, where the control function can only take values on the boundary of the allowed control range (i.e.,
and
), switching abruptly between these extremes; (ii) singular solutions, where the control function takes intermediate values within the allowed range, allowing the administered drug dose to vary continuously over time. Singular and bang-bang solutions can be combined. PMP is a necessary but not sufficient condition for optimality. Therefore, further analysis may be required to confirm that PMP solutions are in fact optimal [
30]. By using PMP, the optimal solutions are given as follows:
Theorem 1. For sufficiently large K, the optimal solutions of (26) that maximise objective (24) are bang-bang with at most one switch, from the MTD to zero dose, so either no treatment, treat-and-stop, or the MTD. For sufficiently small horizon time T, solutions are no treatment.
Proof. We use PMP, which gives necessary conditions for an optimal solution [
1,
21]. The Hamiltonian is given by:
where
are the costates of
N and
W, respectively, and we have introduced
. The costate dynamics are given by:
suppressing the explicit dependence of functions on
N for simplicity. The transversality conditions are given by:
Thus,
for all time and
. As a function of
, we have
; thus,
has a maximum at
, so
.
We have,
since
. Thus,
for all
since it terminates with
. However, we can improve on this bound. We define
then
, and
Expressing this in terms of
s, we obtain
Hence,
provided
and
. If
, then
is an
s-nullcline. Therefore,
for
and, thus,
, for
.
Since the Hamiltonian is linear with respect to the control
u there exists a switching function:
Since the
dynamics decouple from those of
, we can consider dynamics in the
phase plane, with
fixed. The switching curve separates the phase plane into two regions,
Figure 2, effectively stitching two dynamical systems with
and
; there may be dynamics on
if there are trajectories on this curve, i.e., singular solutions with
and
. The flow and optimal trajectories can be determined as follows. We define three curves:
The switching line
is given by
Since
and finite for
, we have
as
. There is a unique zero,
such that
, i.e., it satisfies
. Since
K is assumed large, and
, we have that
. It is unique since
is monotonically decreasing,
. For
, we have
. Since
, we have for
that
so there is a local minimum at
. We have
for
; the drug induced death rate may have a zero,
for some
, so
may also have a local maximum in
.
The s-nullcline, .
- -
In the region
, we have, using (
30), that
. Since
is concave on
, and zero at
, there exists a unique turning point,
. The
s-nullclines are, thus,
and
where
at
.
- -
In the region
, we have (
, on interval
),
which has a pole at
. Thus, this nullcline exists in
region where
. If
in
, i.e., the population decay under the MTD slows down with
N, there may be a second pole.
The level sets
. We have the following expressions for the Hamiltonian, which is a constant on optimal solutions (PMP).
since
. Thus, a trajectory on
is given by
The N-nullclines are if and if . There is, thus, a fixed point in the region. Clearly, since on if , the FP is a saddle. There is no fixed point in the region, as as .
Optimal solutions satisfy PMP and, thus, must be a trajectory in our phase space. They start on the line , and terminate on . Since optimal trajectories terminate on in finite time, trajectories can only reach with , i.e., they intersect the N-axis with , when , is a nullcline and cannot be reached in finite time. In fact the flow suggests trajectories leave the region by crossing the switching curve . Optimal solutions are, thus, trajectories that reach and terminate with . We define the trajectory that passes through , i.e., reaches the intersection of and . We prove that separates the phase space into trajectories, only trajectories below can terminate.
First,
approaches
from the
region. Consider the trajectories in
(
) when they cross
. We have, using (
31),
Using the dynamics (
26), (
30), we, thus, have, setting
,
This is strictly
for
. Thus, the level set
reaches
for
with
.
Consider the expression for the Hamiltonian,
, then the trajectory
that intersects
at
lies in the level set
. We, therefore, have on this trajectory in the
,
region, (using
),
At the extremes,
and
, we have
, i.e., it intersects the switching curve twice, and only twice. Since optimal trajectories must terminate by crossing
in the interval
, the only optimal trajectories, besides
, are those that lie below
, or are already on
.
The trajectory in fact defines a bang-bang solution with a single switch from to . The optimal solution continues in the region on and can spend an arbitrarily long time on . These are treat-and-stop solutions, and can be constructed to have arbitrarily high horizon times T. The fact that is related to the invariance of the payoff J to extending the horizon time T. Trajectories below terminate in , i.e., are MTD solutions; these solutions have a longer horizon time since . Finally, there is a branch of no treatment solutions with for all time that start on, and stay on . This dynamic on is related to the invariance of the objective to T once drug administration ends.
The case with is a special case (, so no lineage die-out). In this case, can have a pole at , for instance if for a cell cycle drug. The proof still holds by considering only the open region .
Therefore, we have only three cases for the solutions to PMP:
No treatment solutions (with u(t) = 0 for all t).
Treat-and-stop solutions (with u(t) = 1 for t ≤ x and u(t) = 0 for all t > x; x is the period of drug administration).
Continuous MTD throughout the entire therapy (with u(t) = 1 for all t).
Above we considered
fixed. For each
we have proved that there are three possible branches of solutions: Firstly, the no treatment branch with unrestricted horizon time (
). Secondly, we have treat-and-stop solutions with
, where
These solutions exist if
. Finally, we have the MTD solutions with horizon time
(if
positive) and
when
.
is determined by the value of
G which itself is a function of the dynamics (
25) and, thus, on the trajectory
. Thus, for a given horizon time
T we need to solve for the initial value
that gives the correct horizon time for each value of
, and then determine
by solving the constraint on
.
We can also prove that for sufficiently small
T , and, thus,
for
. This follows using the bound
. We have a sufficiency condition for
(
) if
where
is given by (
29). Since
is proportional to the TCP,
, it is small for cases where the tumour size cannot be reduced to
since only then does the TCP increase rapidly. Define
assumed small, (under the MTD
]). Then,
,
, the former following since
, (28). Thus,
for small horizons
T, and the optimal solution is no treatment (
). When the horizon is sufficiently large, i.e., of order, or larger than
, then regimens with non-zero
are possible. The expression (
35) then gives a lower bound on the time of the MTD for those solutions with treatment. □
Optimal Controls for Logistic Growth: Numerical Study
We analyse the optimal solutions of the DLP for the specific case of a tumour following the logistic model and treated with a cell cycle drug,
Dividing cells are killed by the drug with efficacy
, hereafter referred to as the killing fraction of the drug. The lower bound ensures that the tumour size decays at the MTD.
g is the per-capita growth rate of the tumour and
K is the tumour’s carrying capacity in the absence of the drug. This has the form (
26) with
. We set
for simplicity, i.e., the natural death rate of cancer cells when the tumour size is small is negligible. The birth rate is drug independent by the assumptions above, i.e.,
, so at low cell density we have
. The death rate is solely due to the drug,
, and is density dependent because only cells in cell cycle are killed by the drug, and the cell division rate is density dependent.
The optimal control problem consists of maximising the payoff, (
24),
with
, subject to the tumour growth given by Equation (
36) and the dynamic of
W given by, (
20),
From Theorem 1, we know that all solutions to PMP have
(MTD) for an interval
,
, so the optimal control problem reduces to a one-dimensional optimisation over the drug application duration
. By expressing
N and
W as functions of
x, the payoff can be written as:
with
, and
See
Appendix C for derivation. We maximise
using a sequential quadratic programming method (the fmincon solver in MATLAB). The model parameters were chosen as follows: the growth rate (
g) and the carrying capacity (
K) of the tumour, were estimated by [
31],
cells/day and
cells, respectively. It is assumed that the drug is 100% efficient, so
. For reference,
Table 1 outlines the key parameters from the tumour evolution model and the DLP.
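The one-dimensional search over the administration period can be sketched as follows. This is a toy illustration under loudly stated assumptions, not the paper's computation: the tumour is assumed to obey dN/dt = -g N (1 - N/K) at the MTD (a fully effective cell cycle drug), the elimination probability is replaced by the stand-in exp(-N) rather than the TEP of (23), a plain grid search replaces SQP, and all parameter values are illustrative.

```python
import math

# Illustrative values only (assumptions, not the paper's fitted parameters).
G = 0.5             # per-capita growth rate, 1/day
K = 1e10            # carrying capacity, cells
N0 = 1e9            # tumour size at the start of therapy, cells
L_CURE = 40 * 365   # expected lifespan if the tumour is eliminated, days
L_RELAPSE = 60      # expected lifespan if it is not eliminated, days
C_TOX = 1.0         # toxicity cost, days lost per day at the MTD


def tumour_size_after_mtd(x):
    """Closed-form tumour size after x days at the MTD for dN/dt = -G*N*(1 - N/K)."""
    return K * N0 / (N0 + (K - N0) * math.exp(G * x))


def elimination_probability(n_cells):
    """Stand-in for the TEP: chance that a Poisson count with mean n_cells is zero."""
    return math.exp(-n_cells)


def payoff(x):
    """Discounted expected lifespan (days) for x days at the MTD, then no drug."""
    p_cure = elimination_probability(tumour_size_after_mtd(x))
    return L_RELAPSE + (L_CURE - L_RELAPSE) * p_cure - C_TOX * x


def optimal_duration(horizon, n_grid=1000):
    """Grid search for the administration period in [0, horizon] with the best payoff."""
    candidates = [horizon * i / n_grid for i in range(n_grid + 1)]
    return max(candidates, key=payoff)


if __name__ == "__main__":
    for horizon in (10, 50, 100):
        x = optimal_duration(horizon)
        print(f"T = {horizon:>3} days: treat for {x:.1f} days, payoff {payoff(x):.0f} days")
```

Even in this toy setting the three regimes appear as the horizon grows: no treatment, then MTD for the whole horizon, then a treat-and-stop period that no longer depends on the horizon.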
The type of optimal solution (for a fixed initial tumour size
) is dependent on the degree to which the tumour can be reduced over the horizon time
T,
Figure 3. Thus, for short horizons, only no treatment solutions exist since the TEP is negligible and toxicity dominates, as proved in the general case,
Section 4. At horizon time
, the optimal solution switches to full MTD, the drug infusion period jumping from 0 to
T at
,
Figure 3B. This is because the pay-off for a continuous MTD regimen is smaller than that for no treatment when
, and crosses it at
,
Figure 4, i.e., the TEP is sufficiently high for the expected gain in lifespan to exceed the toxicity costs. At
, continuing at MTD beyond an infusion period
is counterproductive: toxicity costs again outweigh gains in the TEP, which is close to 1 and, thus, cannot be increased much further,
Figure 3C. Thus, the optimal solution switches from continuous MTD to treat-and-stop. The optimal infusion period (
) is then independent of the horizon time
T as required,
Section 3. Thus, for
the optimal drug regimens are independent of
T and the pay-off is constant. Therefore, optimal solutions are limited by too short a time horizon when
, and for
the pay-off (and expected lifespan) can be increased by increasing the time horizon.
Optimal treatment is explored in
Figure 5 against drug toxicity
and the patient lifespan
if they are not treated; trends are identical for the cases of a short and long lifespan if the tumour is eliminated,
years. The drug infusion interval decreases as toxicity
increases, and the lifespan gain
decreases, with a critical boundary (in
space) between untreatable (no treatment is optimal) and treatable patients. The drug infusion period jumps from zero to over 40 days in this case as therapy switches from no treatment to treat-and-stop therapy,
Figure 5A,D, with a corresponding jump in the
from 0 to near 1 (in this case over 0.95),
Figure 5C,F. Since the drug infusion period is below the time horizon,
days, patients in the untreatable category are untreatable for all time horizons. The gain in (discounted) lifespan is approximately linear throughout the treatment range in both
and
,
Figure 5B,E.
The discounted expected lifespan payoff (DLP) is appropriate for modelling tolerable toxicity such as nausea; severe toxicity costs of therapy are better modelled as SAEs, see
Section 5. Under the quality of life interpretation of the DLP we have
order 1. We illustrate optimal solutions for
. We consider a patient with a short life expectancy under no treatment,
month, examining therapy for various initial tumour sizes
and patient age (assuming an average life expectancy of 85 years, patient age is
years),
Figure 6. For an efficacious drug, the gain in life expectancy is substantial, with life expectancy approximately equal to the life expectancy of a healthy individual (achieving over
of the life expectancy of a healthy individual), regardless of initial tumour size and
. The time horizon is
days whilst the drug infusion period is 28–49 days, increasing with both the initial tumour size and
. Thus, all optimal solutions are treat-and-stop and independent of the time horizon. The drug infusion period is only weakly dependent on
after a rapid rise, whilst tumour size is the main determinant. All patients are treatable for these parameters (
years,
month,
,
).
5. Severe Adverse Effects Model: Incorporating Treatment Related Mortality Events
The DLP of
Section 4 is only appropriate when toxicity is tolerable. Treatment toxicity can, however, be debilitating, leading to treatment cessation or even death (treatment-related mortality, or TRM) [32]. Adverse events are graded 1–5 under the Common Terminology Criteria for Adverse Events (CTCAE), a classification managed by the Cancer Therapy Evaluation Program [33]. Grades 1 and 2 are mild to moderate and grades 3–5 are serious: grade 3 events may require hospitalisation, grade 4 is life threatening and in need of urgent medical care, and grade 5 is death through an adverse event. Thus, the negative effects of treatment can be incorporated as an SAE event, causing cessation of treatment and/or death.
Here we consider the event of patient death through TRM, alongside other events such as tumour elimination. TRM is modelled as a Poisson process with rate
, assumed a function of time and the current drug concentration. The expected patient lifetime is then given by
comprising two terms corresponding to the events of surviving therapy, with probability
, and TRM, with probability
, respectively. The last term is the expected lifetime conditioned on TRM. Here
is the probability of death caused by treatment over time
t. Generally we can have multiple possible futures
with expected lifetimes
. Only if the patient survives treatment are these events relevant. There is, thus, a dependence hierarchy; survival depends on
only (and is independent of the other events), and the other events (can) depend on the state history
.
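The two-term structure of this expected lifetime can be sketched numerically. The snippet assumes the TRM rate is supplied on a uniform time grid and that a single number summarises the expected lifetime of a patient who survives therapy; the function name `trm_expected_lifetime` and its arguments are hypothetical.

```python
import math


def trm_expected_lifetime(hazard, dt, lifetime_if_survive):
    """Return (survival probability, expected lifetime) under a TRM hazard.

    `hazard[i]` is the TRM rate at time i*dt. On each interval the survival
    probability decays by exp(-rate*dt), and the probability mass lost to TRM
    is weighted by the interval midpoint to build the TRM lifetime term.
    """
    survival = 1.0
    lifetime_trm_term = 0.0
    for i in range(len(hazard) - 1):
        mean_rate = 0.5 * (hazard[i] + hazard[i + 1])
        next_survival = survival * math.exp(-mean_rate * dt)
        lifetime_trm_term += (i + 0.5) * dt * (survival - next_survival)
        survival = next_survival
    return survival, survival * lifetime_if_survive + lifetime_trm_term


if __name__ == "__main__":
    # Constant hazard 1/20 per day over a 40-day course (illustrative numbers only).
    surv, life = trm_expected_lifetime([1 / 20] * 4001, 0.01, 1000.0)
    print(f"survival probability {surv:.4f}, expected lifetime {life:.1f} days")
```

For a constant hazard 1/τ over a course of length T, the TRM term has the closed form τ - (T + τ) exp(-T/τ), which provides a convenient check on the numerical sum.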
5.1. Logistic Tumour Growth with TRM Treated with a Cell Cycle Targeting Drug
Here we explore optimal solutions for the logistic tumour evolution model treated with cell cycle targeting drugs, i.e., tumour dynamics (
36) supplemented with the BP birth rate
a as in
Section 4 (i.e., independent of the drug). We assume that the patient death rate due to the drug is proportional to the amount of drug administered and is given by
, with
, and
the expected period a patient can survive under continuous MTD drug administration, hereafter referred to as
the drug tolerance interval. We consider three outcomes at the time horizon: the patient does not survive treatment (TRM), cure (tumour clearance and survival to
T), and failure to eliminate the tumour, so the patient will eventually relapse. As before, we assume
. The payoff with these three outcomes is:
where
is the treatment dependent elimination probability, TEP, (
23). The parameter values for the tumour dynamics (ODE model and BP model) are as in
Section 4. The parameters used for the SAP are outlined in
Table 2, whereas the tumour evolution parameters are outlined in
Table 1.
Applying PMP, the optimal solutions are bang-bang, and as with the DLP there can only be three types of solutions: no treatment, the MTD for the full horizon, and treat-and-stop solutions (at the MTD); the proof is given in
Appendix D. We numerically determined optimal solutions using a direct method for bang-bang solutions, see
Appendix E; specifically, the control is piece-wise constant taking only the values 0 or 1 on a partition of
. A sequential quadratic programming (SQP) method (MATLAB fmincon solver) was utilised to determine the optimal switching times.
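A direct search of this kind can be sketched for the single-switch case. This is a toy illustration rather than the method of Appendix E, under the following assumptions: the control is the MTD for x days and zero afterwards, the TRM rate at the MTD is 1/τ with τ the drug tolerance interval, the tumour obeys dN/dt = -g N (1 - N/K) at the MTD, exp(-N) stands in for the TEP, a grid search replaces SQP, and all parameter values are illustrative.

```python
import math

# Illustrative values only (assumptions, not the parameters of Tables 1 and 2).
G, K, N0 = 0.5, 1e10, 1e9            # growth rate (1/day), capacity and initial size (cells)
L_CURE, L_RELAPSE = 40 * 365, 60     # expected lifespans with and without elimination, days
HORIZON = 100.0                      # time horizon, days


def tumour_size_after_mtd(x):
    """Closed-form tumour size after x days at the MTD for dN/dt = -G*N*(1 - N/K)."""
    return K * N0 / (N0 + (K - N0) * math.exp(G * x))


def payoff(x, tau):
    """Expected lifespan (days) for x days at the MTD with drug tolerance interval tau."""
    survive = math.exp(-x / tau)                      # probability of no TRM by day x
    p_cure = math.exp(-tumour_size_after_mtd(x))      # stand-in elimination probability
    life_if_survive = L_RELAPSE + (L_CURE - L_RELAPSE) * p_cure
    # Integral of t * (1/tau) * exp(-t/tau) over [0, x]: lifetime term for TRM deaths.
    life_trm_term = tau - (x + tau) * survive
    return survive * life_if_survive + life_trm_term


def optimal_duration(tau, n_grid=1000):
    """Grid search for the administration period in [0, HORIZON] with the best payoff."""
    candidates = [HORIZON * i / n_grid for i in range(n_grid + 1)]
    return max(candidates, key=lambda x: payoff(x, tau))


if __name__ == "__main__":
    for tau in (5.0, 30.0):
        x = optimal_duration(tau)
        print(f"tau = {tau:>4} days: treat for {x:.1f} days, payoff {payoff(x, tau):.0f} days")
```

In this toy setting a short tolerance interval makes no treatment optimal while a longer one gives a treat-and-stop schedule, which mirrors the treatable and untreatable classes discussed in the text.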
Similar to
Section 4, for small time horizons optimal solutions are no treatment,
, continuous MTD for intermediate horizons,
, and treat-and-stop solutions for large horizons,
. For large enough time horizons,
T, the optimal solutions are independent of
T and all optimal treat-and-stop solutions have an identical drug regimen,
Figure 7.
We illustrate the TRM model with
years and
months,
Figure 8. The patient’s life expectancy, the optimal drug administration period, the patient’s survival probability and the cure probability increase with
,
Figure 8. For
days, the probability that a patient survives therapy is close to
,
Figure 8B with a TEP post-treatment in excess of
,
Figure 9. Patients can be divided into a treatable and an untreatable category based on the value of the drug tolerance interval,
; patients are treatable only if they can survive continuous MTD drug administration for 7 days or longer,
Figure 8C; these patients are treated with the MTD for a minimum of 38 days,
Figure 8C. Thus, there exists a critical value of the drug tolerance interval,
, where the optimal drug administration period jumps from 0 to a non-zero value. Despite the abrupt change in the optimal drug administration period, the patient’s life expectancy under optimal treatment remains continuous at the critical threshold,
Figure 8A,C. A comparison of no treatment to the MTD is shown in
Figure 10 for
and
, the switch of solution causing a jump in the drug infusion period whilst retaining continuity in the pay-off. MTD drug administration has a local maximum in the payoff with drug infusion period that becomes the global maximum at
,
Figure 10, having a payoff
.
Patients in the treatable category with low drug tolerance intervals
receive treatment for a duration exceeding their expected survival time under continuous MTD administration,
Figure 9. The optimal drug administration period increases with
, but eventually becomes shorter than
. Thus, at low values of
, optimal drug strategies focus on achieving tumour elimination despite high TRM risks. In contrast, at high values of
, tumour control is more readily achieved, and drug toxicity becomes the primary factor limiting the infusion period.
The optimal drug administration period,
x, increases with both the initial tumour size (
) and the drug tolerance interval (
),
Figure 11A; the strong dependence of
on
x is not surprising as the tumour needs to be reduced to the order of a few cells before elimination is feasible. The TEP is almost independent of
for fixed
and is strictly increasing with
, taking values larger than
,
Figure 11C; this is achieved by the total amount of drug increasing with the initial tumour size. The cure probability increases with
with a shallow decrease with the initial tumour size,
Figure 11D; this is because of the decreasing probability of the patient survival under a longer drug administration period. The cure probability and the relative life expectancy under optimal treatment have similar dependence on
, taking similar values,
Figure 11B,D. Dependence of the optimal solutions on
, or age, can also be explored,
Appendix F. Dependence on age is weak.
Patients can be subdivided into a treatable and an untreatable class, based on the drug tolerance interval and the size of the tumour prior to therapy,
Figure 12. Treatable patients, who are characterised by
values that are close to the boundary of the two classes, are treated with long periods of MTD drug administration (25–50 days),
Figure 12A, whereas the cure probability is of the order of
,
Figure 12B. Optimal drug regimens favour the objective of eliminating the tumour over the objective of retaining low levels of drug toxicity. Because
is equal to 40 years and
is 2 months, the potential gain in life expectancy under drug administration can be much larger than the life expectancy under no treatment, even when the cure probability is small.
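The last point can be checked with a rough lower bound. The symbols below are assumed notation, not necessarily the paper's: $L_c$ for the expected lifespan following cure (read here as the 40-year figure), $L_0$ for the life expectancy without treatment (read here as the 2-month figure), and $P_c$ for the cure probability. Even if every non-cured outcome is counted as contributing no lifetime at all, treatment still gives the larger expected lifetime whenever

```latex
P_c \, L_c > L_0
\quad\Longleftrightarrow\quad
P_c > \frac{L_0}{L_c} = \frac{2\ \text{months}}{480\ \text{months}} \approx 0.42\%.
```

Under this deliberately pessimistic simplification, a cure probability of only 1% gives at least $0.01 \times 480 = 4.8$ months, more than double the untreated figure.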
5.2. Intermediate SAE and Iterative Therapy Under LEP Optimisation
SAEs can lead to cessation of treatment for a period of time or a shift to another treatment, for instance the maximum drug dose could be reduced, or a less intensive drug used. Treatment with cessation events, thus, occurs in phases, phase j having previous SAE events. The objective is formulated as follows. We define the expected patient lifespan for each phase of treatment, i.e., is the expected lifespan after SAEs, with the last occurring at time ; phase j, thus, starts at time s with tumour size determined by the treatment before s. Here the drug concentration is a vector, , with the patient being treated with drug concentration (component j) in the th treatment phase, which could in fact be the same drug but with a reduced MTD.
Consider the case with a potential SAE event with dose dependent rate
, that causes the current therapy to cease and a switch to an alternative therapy, drug concentration
. We assume this alternative treatment is not subject to the possibility of an SAE. Similar to the case of TRM, (
40), we have the expected lifespan,
where
is the treatment dependent elimination probability, TEP, (
23), and
is the expected lifespan conditioned on switching to treatment 2 at time
t. This is an adaptive therapy, since an SAE is an observation event that changes the therapy if it occurs.
To compute the expected lifetime (
41), we, therefore, need to solve for the optimal treatment
, for any SAE time
t, initial tumour size
, by maximising the conditional LEP,
where
is the expected lifespan on tumour elimination,
is the expected lifespan if the tumour is not eliminated, and
is the probability of tumour (lineage) elimination conditioned on an SAE at
t. In practice, the SAE may affect these lifetimes. We, thus, have a complex iterative optimisation problem. To accommodate SAE grade, rates for each grade of SAE would need to be defined, whilst allowing for multiple phases of treatment (multiple SAEs) would further extend the iterative depth.
6. Conclusions
We have proposed a new optimal control criterion for determining optimal chemotherapy scheduling and doses within the context of key events affecting outcome, such as tumour elimination (cure, if patient survives treatment) and patient death through TRM. We propose that maximising the patient’s life expectancy by averaging over all possible outcome events is a realistic interpretation of the objective of therapy, i.e., is a quantifiable measure of the best patient outcome. We developed the proposal within a deterministic modelling framework based on the rate of event occurrences, such as tumour elimination. The life expectancy pay-off is very general; here, we illustrated its use on chemotherapy with three possible events (cure, failure to clear tumour cells, and TRM) whilst it can be generalised to include surgery, transplant, combination therapy, adaptive therapy, and multiple stage therapy. Tumour growth dynamics (both with and without therapy), and the event rates need to be parametrised, specifically the life expectancy and SAE rates in terms of therapy parameters. Thus, for chemotherapy we need to incorporate drug mechanism and drug efficacy into the tumour dynamics, and the rate of TRM. Parametrised tumour models have been developed, [
11,
34,
35,
36], whilst TRM rates, [
32,
37,
38] and survival data (overall survival and progression free survival) may allow these additional parameters to be estimated. Incorporating other events will need their rates and dependence on tumour characteristics, such as size and replication rate and therapy to be determined. If parametrisation includes patient demographics then this is a natural framework for personalised therapy optimisation.
The optimisation criterion we propose addresses various challenges in the field of applied optimal control for cancer chemotherapeutics, and offers a number of advantages over previous formulations,
Table 3. To our knowledge, there has been no previous attempt in the literature to formulate an optimal control problem that considers tumour characteristics after therapy; typically, the performance criterion for evaluating a therapy’s anti-tumour efficacy is based on the tumour’s state at the end of treatment. Our lifetime expectancy pay-off addresses this issue by modelling stochastic events during therapy, including tumour elimination, that determine patient prognosis. Our formulation therefore inherently accounts for outcomes beyond the treatment horizon, and in fact quantifies the probability of each outcome (set of events). Being based on the expected lifespan, the lifetime payoff has two additional attractive properties. Firstly, the optimal solutions are independent of the horizon time
T once drug application ceases. In effect, provided a sufficiently large horizon time is used, all solutions are then independent of
T. To our knowledge, no other optimal cancer therapy (finite time horizon) control problem has this property. Secondly, our objective has no subjective free parameters, since every parameter has a direct interpretation. This contrasts with traditional approaches to cancer therapy optimisation, which frame it as a multiple objective optimisation problem. For instance, the two competing objectives of controlling the tumour size and limiting the impact of toxicity to the patient are often combined as a weighted sum [
1,
9]. A drawback of this method is that the weights are arbitrary, and it is hard to give a biological justification for a given choice of the weights. Balancing multiple competing objectives can also be analysed using Pareto optimality [
39,
40], which delays making subjective choices; a solution is Pareto optimal if no objective can be improved without compromising another. Solving the Pareto optimisation problem involves identifying all Pareto-optimal solutions, which together define the Pareto front. A Pareto optimal solution is then subjectively selected, either based on the decision maker’s preferences or utilising additional factors that could include patient input. However, as with traditional objective weighting methods, solutions determined by Pareto optimisation depend on the optimisation parameters, [
41], and, therefore, depend on subjective choices. In the LEP formulation, all competing therapy objectives are inherently weighted by the event-probabilities (e.g., tumour elimination) that have direct biological interpretations; there is, thus, a single objective: the patient’s expected lifetime averaged over all therapy outcomes to be maximised. The LEP’s advantages are summarised in
Table 3.
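The contrast between weighted sums and Pareto optimisation can be illustrated with a minimal sketch. The candidate schedules and their scores below are hypothetical, as are the function names; both objectives (residual tumour burden and cumulative toxicity, in arbitrary units) are to be minimised. The sketch only shows that the selected schedule changes with the chosen weight, which is the subjectivity discussed above.

```python
from typing import List, Tuple

# Hypothetical candidate schedules, each scored as
# (residual tumour burden, cumulative toxicity); both are minimised.
Point = Tuple[float, float]
CANDIDATES: List[Point] = [(1.0, 9.0), (3.0, 4.0), (4.0, 5.0), (8.0, 1.0), (9.0, 9.0)]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_choice(points: List[Point], w_tumour: float) -> Point:
    """Pick the minimiser of w * tumour + (1 - w) * toxicity for a chosen weight w."""
    return min(points, key=lambda p: w_tumour * p[0] + (1.0 - w_tumour) * p[1])


if __name__ == "__main__":
    print("Pareto front:", pareto_front(CANDIDATES))
    for w in (0.1, 0.5, 0.9):
        print("weight", w, "->", weighted_sum_choice(CANDIDATES, w))
```

With these illustrative scores, three of the five candidates lie on the Pareto front, and each of the three weights selects a different one of them.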
Two life expectancy pay-off models were presented and analysed, where DLP can be considered a tolerable toxicity model and SAP a model of severe toxicity. Both models incorporate the event of tumour elimination, i.e., patient cure, and are illustrated on a one-compartment tumour model with a cell cycle targeting drug. In DLP, chemotherapy can significantly reduce the patient’s quality of life, and, thus, we proposed a discounted pay-off to model that poor quality of life during therapy, [
26]. In SAP, drug-induced death is modelled as a Poisson process with rates proportional to the amount of drug administered at any time point of the treatment. For both optimisation problems we proved that (1) optimal solutions are bang-bang, and (2) every solution that satisfies PMP has at most one switch and can only switch from continuous MTD to no drug at some time point
x. Since PMP is a necessary condition for optimality, finding the optimal solution reduces to determining the optimal switching time
x, where the control transitions from the MTD to no drug. This reduces the original optimal control problem to a one-dimensional NLP problem.
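The reduction to a one-dimensional search over the switching time can be sketched as follows. This is a toy stand-in, not the paper's model: it assumes exponential tumour growth and kill, a TRM hazard equal to the reciprocal of the drug tolerance interval while the MTD is applied, and a Poisson-type elimination probability exp(-N) evaluated at the switch. All parameter values and function names are hypothetical. A grid search is used rather than a local optimiser because the payoff can have more than one local maximum.

```python
import math

# All values below are hypothetical, chosen only for illustration.
N0 = 1e9              # initial tumour size (cells)
GROWTH = 0.05         # net tumour growth rate without drug (per day)
KILL = 0.50           # drug-induced kill rate at MTD (per day)
L_CURE = 40 * 365.0   # expected lifespan after cure (days)
L_FAIL = 60.0         # expected lifespan if the tumour is not eliminated (days)


def payoff(x: float, tolerance_days: float) -> float:
    """Expected lifetime (days) for MTD on [0, x] followed by no drug."""
    hazard = 1.0 / tolerance_days                # assumed TRM rate under MTD
    survive = math.exp(-hazard * x)              # P(no TRM during therapy)
    n_end = N0 * math.exp((GROWTH - KILL) * x)   # tumour size at the switch
    p_elim = math.exp(-n_end)                    # assumed elimination probability
    # Lifetime accrued by patients who die of TRM at some time t < x:
    # integral of t * hazard * exp(-hazard * t) over [0, x].
    died_early = (1.0 - survive * (1.0 + hazard * x)) / hazard
    after = p_elim * L_CURE + (1.0 - p_elim) * L_FAIL
    return died_early + survive * (x + after)


def best_switch(tolerance_days: float, x_max: float = 120.0, step: float = 0.1):
    """Grid search over the switching time x; returns (x_best, payoff at x_best)."""
    grid = [i * step for i in range(int(round(x_max / step)) + 1)]
    x_best = max(grid, key=lambda x: payoff(x, tolerance_days))
    return x_best, payoff(x_best, tolerance_days)


if __name__ == "__main__":
    for tol in (5.0, 200.0):
        print("tolerance", tol, "days ->", best_switch(tol))
```

With these hypothetical values, a tolerance of 5 days gives an optimal switching time of zero (no treatment), whereas a tolerance of 200 days gives a switching time between 40 and 80 days, mirroring qualitatively the jump between untreatable and treatable patients.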
In both of these models, patients are either treatable or untreatable. In our analysis, treatability was dependent on tumour characteristics, such as size, and patient characteristics, such as their expected lifetime without treatment (
) and tolerability to drug toxicity,
Figure 5 and
Figure 12. More generally, we would expect that treatability would be dependent on tumour type and stage. A personalised LEP approach would, thus, allow this information to be incorporated and patient-specific therapy to be designed. Provided the time horizon is large enough, treatability is independent of the time horizon and treatment switches from no treatment to treat-and-stop.
Optimal solutions of both payoffs prioritise tumour elimination over low total drug toxicity. At the untreatable boundary the optimal drug administration period jumps from 0 days to a period large enough to achieve a large TEP (for the parameters used in this study, treatable patients have a
); this is because the potential gain in life expectancy, when the tumour is eliminated, is much greater than the life expectancy when treatment does not eliminate the tumour. In particular, in SAP treatable patients close to the untreatable boundary can have a cure probability smaller than
, with drug administration exceeding 20 days (
Figure 12). For the parameters used here, treatable patients under optimal treatment have an expected lifetime that spans 10–83% of the life expectancy of a healthy individual,
Figure 11B. This high variability depends primarily on the initial tumour size and the drug tolerance interval
, since patients who are more susceptible to the drug are more likely to die during therapy. In the DLP, the life expectancy of a treatable patient is nearly identical to that of a healthy individual, exceeding 99.4% of the healthy value,
Figure 6B.
In general, SAEs significantly disrupt the course of treatment, potentially leading to an interruption of treatment, [
42], a permanent cessation of treatment with a shift to palliative care [
43] or TRM [
44]. We presented a SAP model based on TRM, but it can be modified to account for an SAE that causes treatment to be ceased or modified. Since treatment is modified based on an observation (the SAE), this comprises an adaptive therapy. This SAE interruption of therapy then gives an iterative optimisation problem involving conditional subproblems with an SAE at time
, see
Section 5.2.
We illustrated the lifetime expectancy pay-off for chemotherapy optimisation of simple tumour growth models with only two events (cure, TRM). This has the advantage of avoiding the need to describe and justify detailed cancer models, thus allowing the pay-off formulation to take centre stage. Further, the models are sufficiently simple to allow formal analysis; in particular, we proved that optimal solutions are treat-and-stop. Other types of solutions, such as delayed treatment [
1,
20,
45] or administering the highest drug dose at the end of the treatment period [
46,
47], have been found to be optimal in a number of theoretical studies, although such solutions are clinically ill-advised. It is therefore reassuring that our pay-off gives clinically suitable solutions; how robust this is for more realistic tumour growth models and additional events is unclear and needs to be examined.
The LEP framework is very flexible, and can be generalised in multiple ways, including incorporating other treatments, such as radiotherapy and surgery, more complex cancer growth models [
8,
48], including multi-compartment models, additional events such as mutation and metastasis, and more realistic drug mechanism models with more realistic application schedules (discrete injections) and pharmacokinetics/dynamics. To incorporate additional events, the event rates
need to be parametrised (with dependence on the tumour size and drug). However, because we classify outcomes by tumour elimination, we need to also allow for the time of elimination during therapy, since if eliminated the event rate changes to
. Thus, we need to have the joint event of tumour elimination and the event; this will be pursued elsewhere. These generalisations will likely pose optimisation problems where the PMP equations (the TPBVP) cannot be solved analytically, and, thus, bang-bang solutions are not assured. Numerical methods are then needed [
49,
50]: either indirect methods, which produce solutions to PMP, or direct methods, which approximate the control function using a linear combination of basis functions (e.g., Lagrange polynomials or piece-wise constant functions), giving a finite-dimensional non-linear programming (NLP) problem. There are efficient NLP-solver algorithms in the literature, many of which are open source [
51,
52].
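As a sketch of the direct approach, using notation that is assumed here rather than taken from the paper ($u$ for the drug concentration, $u_{\max}$ for the MTD, $\phi_i$ for the basis functions, $c_i$ for their coefficients, $J$ for the pay-off, and $M$ for the number of basis functions), a piece-wise constant parametrisation on a grid $0 = t_0 < t_1 < \dots < t_M = T$ reads

```latex
u(t) \approx \sum_{i=1}^{M} c_i \, \phi_i(t),
\qquad
\phi_i(t) =
\begin{cases}
1, & t_{i-1} \le t < t_i,\\
0, & \text{otherwise},
\end{cases}
\qquad
\max_{c \in \mathbb{R}^M} J(c)
\quad \text{subject to} \quad 0 \le c_i \le u_{\max}.
```

The state equations are integrated numerically for each trial vector $c$, so the NLP solver only sees a finite-dimensional problem with simple bound constraints.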
The LEP framework has the potential for application in real-world cancer therapy, for instance, in personalised therapy, supporting clinical decision making by identifying how different patient groups might benefit most effectively from therapy, and determining the most effective treatment in fragile patients, thereby reducing overtreatment. AML, for example, is typically treated with intensive therapy, but older patients may be treated with less intense, less effective therapy [
53,
54,
55,
56]. Having a clinical diagnostic tool to ascertain the best therapy based on patient health and other factors would potentially transform geriatric AML treatment. However, although the LEP has a key advantage over standard cancer treatment cost functions in that it is objective and the parameters have direct interpretations, there remains the difficulty of determining those parameters from real-world data. Three types of data are required. Most accessible is life expectancy data, such as from life tables that typically allow for multiple covariates, including race, socio-economic status [
57], and comorbidities [
58]. Survival (Kaplan–Meier) curves can potentially be used to structure expected lifespan by cancer subtype, grade, and stage, although data to determine average lifespan in the absence of treatment are likely rare. Data on causes of death across different cancers are also available [
59]. Secondly, the rates of the possible events are needed, with dependency on tumour size and drug concentration; tumour size is known to be a prognostic factor in many cancers, playing a key role in cancer stage, although it is likely a prognostic factor in its own right in many cancers (hepatocellular carcinoma [
60], breast cancer [
61,
62], and NSCLC [
63,
64]). Finally, tumour growth and death rates are needed both in the presence and absence of the drug(s). Growth dynamics and drug efficacy are extremely complex for solid tumours, given the genetic and environmental heterogeneity of the tumour [
65], whilst the growth dynamics of leukaemias is better characterised [
10,
11,
66,
67,
68]. Developing predictive models for key events in cancer is a growing field; thus, as these improve, the LEP framework will become far more tractable.
The primary objective of this work was to develop a novel deterministic payoff functional that quantifies a patient’s expected lifetime under different therapy outcomes, incorporating the negative impact of excessive drug toxicity on life expectancy. As an initial step toward studying this objective, simple one-compartment tumour growth models were considered. Future extensions may incorporate detailed tumour models, drugs, and detailed pharmacokinetics/pharmacodynamics. In the case of blood cancers, multi-compartment models are usually employed to describe different levels of hierarchies in the blood maturation process in both cancer and healthy cell lineages. In the case of solid tumours spatial heterogeneity should be addressed, since cells that are in a hypoxic environment tend to be more resistant to the drug compared to non-hypoxic cells. Spatial heterogeneity is usually addressed by PDE models since spatio-temporal characteristics of the tumour should be considered. Our formulation should apply to all these extensions.
Table 3.
Therapy optimisation formulations.
Context | Formulation | Optimisation Approach | Parameters | Post-Horizon Outcomes | Dependence on Horizon |
---|---|---|---|---|---|
Optimise drug therapy over dose, timings, and combinations. | Multiple competing objectives, defined by objective functions . | Minimise weighted sum , arbitrary weights [1,5,9]. | and subjectively chosen. | Not normally considered [5]. | Yes. |
| | Pareto optimisation [40]. | Subjective choice of , and subjective selection of best solution on Pareto front. | | |
Optimise drug therapy over dose, timings, and combinations. | Maximise patient lifespan. | Life expectancy pay-off (LEP). | Event rates and lifespan conditioned on events need to be determined from data. | Classified by events, for instance including cure, relapse, death. | Invariant provided therapy ended. |
Optimise over choice of defined treatments. | Maximise patient lifespan | Dynamic treatment regime models [24,69]. | Treatment outcome statistics required from data. | Various, including cure, relapse, death. | Not relevant |