Optimal Control and Tumour Elimination by Maximisation of Patient Life Expectancy

Tzamarias, Byron D. E.; Ballesta, Annabelle; Burroughs, Nigel John

doi:10.3390/math13193080


by Byron D. E. Tzamarias ¹, Annabelle Ballesta ² and Nigel John Burroughs ³,*

¹ MathSys CDT, Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK

² Cancer Systems Pharmacology, Institute Curie, 75248 Paris, Cedex 05, France

³ SBIDER, Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(19), 3080; https://doi.org/10.3390/math13193080

Submission received: 15 July 2025 / Revised: 5 September 2025 / Accepted: 10 September 2025 / Published: 25 September 2025

(This article belongs to the Section E3: Mathematical Biology)


Abstract

We propose a life-expectancy pay-off function (LEP) for determining optimal cancer treatment within a control theory framework. The LEP averages life expectancy over all future outcomes, outcomes that are determined by key events during therapy such as tumour elimination (cure) and patient death (including treatment related mortality). We analyse this optimisation problem for tumours treated with chemotherapy using tumour growth models based on ordinary differential equations. To incorporate tumour elimination we draw on branching processes to compute the probability distribution of tumour population extinction. To demonstrate the approach, we apply the LEP framework to simplified one-compartment models of tumour growth that include three possible outcomes: cure, relapse, or death during treatment. Using Pontryagin’s maximum principle (PMP) we show that the best treatment strategies fall into three categories: (i) continuous treatment at the maximum tolerated dose (MTD), (ii) no treatment, or (iii) treat-and-stop therapy, where the drug is given at the MTD and then halted before the treatment (time) horizon. Optimal treatment strategies are independent of the time horizon unless the time horizon is too short to accommodate the most effective (treat-and-stop) therapy. For sufficiently long horizons, the optimal solution is either no treatment (when treatment yields no benefit) or treat-and-stop. Patients, thus, split into an untreatable class and a treatable class, with patient demographics, tumour size, tumour response, and drug toxicity determining whether a patient benefits from treatment. The LEP is in principle parametrisable from data, requiring estimation of the rates of each event and the associated life expectancy under that event. This makes the approach suitable for personalising cancer therapy based on tumour characteristics and patient-specific risk profiles.

Keywords:

cancer chemotherapy; optimal control; non autonomous dynamical systems; Pontryagin’s maximum principle

MSC:

34H05; 92-10; 92B99; 37N25

1. Introduction

Optimal therapy strategies for cancer patients can be formulated as a control theory problem, [1,2,3,4,5,6,7,8], to determine the optimal dose and schedule of chemotherapy, or optimal combination and timing of treatments to control tumour size and achieve the best outcome. Available treatments, the controls, include chemotherapy, resection surgery (tumour removal), radiotherapy, and transplant (blood cancers). For chemotherapy, the optimal concentration of the drug(s) at each point in time must be determined. Control theory formulations require specification of a model of cancer growth under therapy (incorporating the mechanism of action and efficacy of a drug), and a mathematical function (the objective) to be optimised, potentially a function of the tumour state, patient health and therapy strength/impact, both throughout the treatment interval and at the end of therapy. This control problem can be formulated in a deterministic framework (ODEs, PDEs), allowing Pontryagin’s Maximum principle (PMP) to be used [9], or as a stochastic control problem. Deterministic control theory frameworks have substantial advantages, being both analytically and numerically more tractable. There is also a far more extensive literature on deterministic models of cancer growth. Tumour growth is complex and notoriously difficult to model and parametrise, although with recent technological innovations there are an increasing number of validated chemotherapy cancer models of varying complexity [10,11,12,13].

The optimisation criterion is a fundamental part of the optimisation problem and has a significant impact on the nature of the optimal solutions [5,14]. Optimisation criteria are needed since there are competing objectives for therapy: the control of the tumour (for instance, minimising tumour size at the end of therapy) and minimising the risk of therapy to the patient. A chemotherapy control problem is typically set up as follows: treatment occurs up to a time $T$ (the time horizon) using a time-dependent control $u(t)$, $t \in [0, T]$, that is assumed bounded, $u(t) \in [0, u_{\max}]$, where $u_{\max}$ is, for instance, the maximum tolerated dose (MTD) of a chemotherapy drug. The optimal dosing protocol, $u(t)$, is then determined by optimising an objective function, for instance,

$$J[N, u] = r N(T) + \int_{0}^{T} u(t)^{\nu} \, dt, \tag{1}$$

comprising a terminal cost proportional to the tumour size $N(T)$ at the end of the treatment horizon $T$ and a running cost that accounts for factors such as drug toxicity during treatment. Here $N(t)$ is the tumour size at time $t$. The positive constant $r$ determines the relative weight of the two competing objectives, specifically balancing the need to minimise the tumour burden at the end of treatment and reducing toxicity to the patient. The $L^{2}$ norm, with $\nu = 2$, is more amenable to mathematical analysis, whilst the $L^{1}$ norm ($\nu = 1$) is more biologically justifiable. A variety of other objectives have been used, but all typically have the same qualitative form and incorporate a weight to balance the competing objectives.

Such objectives do not have any medical justification or foundation; they capture only qualitatively the competing demands of treatment, which limits their application to decision making in the clinic. Further, this formulation assumes a continuum of benefits and costs, whilst cancer therapy is punctuated by key events that substantially affect outcome and post-treatment prognosis. In particular, in these deterministic models the cancer regrows in the absence of therapy and, therefore, tumour elimination (cure) is ignored; these optimisation problems in fact restrict focus to the period $[0, T]$ and ignore outcomes beyond $T$. There are also other events of key importance to patient prognosis: for instance, treatment can have severe adverse effects (SAEs) on patient health [15], drug-resistant (mutant) cells can appear [16], and the cancer may undergo metastasis, i.e., spreading to and colonising other tissues and sites in the body [17]. SAEs can result in cessation of treatment or, in extreme cases, patient death; drug resistance limits possible therapies [18]; and the prognosis for metastatic cancers is poor [19].

Here we propose a new objective function that incorporates these key events, Figure 1, including complete remission (cure), failure to eliminate the tumour, SAE, metastasis, and generation of mutants (both treatable and untreatable) during treatment. It utilises the rate of these events, so that at the end of therapy we can calculate the probability of each event having occurred, events that impact patient outcome and future prognosis. Thus, our formulation implicitly incorporates considerations post-treatment. To formulate an objective we observe that these events impact patient lifespan, thereby giving a common measure to quantify the impact of these events. We therefore propose to maximise the patient’s expected lifespan, averaged over possible events. This idea was proposed and explored in an earlier work using a simple approximation to the tumour elimination probability [20]. Consider a patient with a tumour of size $N(t)$ that is treated over a (horizon) time $T$, with treatment administered as a control $u(t)$, $t \in [0, T]$, that directly affects the tumour’s growth dynamics. For instance, this could model a chemotherapy drug administered at concentration $u(t)$. The outcome of therapy, and the expected lifespan of the patient, is determined by a countable set of events indexed by $j = 1, \dots, n$. We assume that events occur at time $t$ with rate $\mu_{j}(N, u; t)$, which depends on the tumour size $N(t)$, the control $u(t)$, and possibly explicitly on time. Events, thus, follow a Poisson process; the probability of event $j$ occurring by time $t$ is $p_{j}[N, u; t] = 1 - \exp\left(- \int_{0}^{t} \mu_{j}(N, u; s) \, ds\right)$, which is a function of the tumour history and control schedule. The rate of tumour elimination is dependent on fluctuations in the number of tumour cells; thus, we draw on branching processes to determine the probability of tumour elimination. The occurrence of an event may affect the rates of other events; thus, multiple event sequences need to be considered. Let $k$ index all possible (allowable) sequences of events or futures (there are a maximum of $n!$, assuming an event happens only once); event combinations/futures are distributed as a multiple event Poisson process, giving the probability $p_{k}[N, u; t]$ of event combination $k$ occurring by time $t$. Under event combination $k$ let the expected lifespan be $L_{k}$. Therefore, under control schedule $u$ the expected lifetime, or payoff, is

$$J_{0}[N, u; T] = \sum_{k} p_{k}[N, u; T] \, L_{k}. \tag{2}$$

More generally this could also incorporate an integration over futures parametrised by a continuum. The optimal control schedule $u^{*}(t)$, $t \in [0, T]$, is then the schedule that maximises this expected lifetime, i.e., $u^{*}$ maximises (2) subject to the growth dynamics. The outcome at the horizon time can then be classified by which events occurred, event combination $k$ occurring with probability $p_{k}[N, u; T]$. This objective has to satisfy a consistency constraint: if the optimal control is zero at the horizon time, $u^{*}(t) = 0$ for $t \in [T - \epsilon, T]$, $\epsilon > 0$, then the expected lifespan is invariant to a reduction of $T$ as the therapy is unchanged; the expected lifetime only changes by application of the control. In particular, if $u^{*}$ is zero at $T$ the probability of tumour elimination must also be invariant to changes in $T$.
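The event probability and the payoff (2) can be illustrated with a minimal sketch. The function names, the constant rate of 0.01 per day, and the two lifespans below are hypothetical values chosen only for illustration; the outcome probabilities are assumed to be mutually exclusive and to sum to one, which sidesteps the bookkeeping over event sequences described above.

```python
import math

def event_probability(rate, t_end, steps=1000):
    """p = 1 - exp(-integral_0^t_end rate(s) ds), using the trapezoidal rule."""
    h = t_end / steps
    total = 0.5 * (rate(0.0) + rate(t_end))
    for i in range(1, steps):
        total += rate(i * h)
    return 1.0 - math.exp(-h * total)

def expected_lifespan(outcome_probs, lifespans):
    """J_0 = sum_k p_k L_k over mutually exclusive outcomes k, as in (2)."""
    if abs(sum(outcome_probs) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * L for p, L in zip(outcome_probs, lifespans))

# Hypothetical single event with constant rate 0.01/day over a 100-day horizon.
p_event = event_probability(lambda s: 0.01, t_end=100.0)
# Two outcomes: event occurred (lifespan 3650 days) or not (365 days); values are illustrative.
J0 = expected_lifespan([p_event, 1.0 - p_event], [3650.0, 365.0])
```

With a time-dependent rate, the same `event_probability` call applies once the tumour trajectory and control have been substituted into the rate.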

We develop this therapy optimisation framework within a deterministic context, which has substantial advantages over stochastic models in terms of model flexibility (with a range of nonlinear and multi-compartment ODE tumour growth models in the literature [1,2,3,4,5,6,7,8]) and powerful optimisation tools such as PMP [1,21]. The competing objectives of therapy, tumour control/elimination and minimising treatment-related harm to the patient, are incorporated as impacts on the expected lifespan. We develop two payoff functionals. Firstly, we formulate a model where the drug impacts patient quality of life, giving the discounted expected lifespan payoff (DLP), Section 2, similar to the objective (1). Secondly, we propose a model where the negative effects of therapy are incorporated as SAEs. Specifically, we model treatment-related mortality (TRM) events, giving the severe adverse effects payoff (SAP), Section 5. For both of these payoffs, the optimal solutions separate patients into an untreatable class and a treatable class based on patient demographics, tumour size at detection, and the patient’s susceptibility to the drug’s side effects.

Maximising patient lifespan is not a new concept; it has been used as an optimisation criterion in control theory analyses of patients with lethal/incurable forms of cancer. Originally analysed in [22] for the case where resistant mutant cells pre-exist, it has been extended in a number of analyses since, including [23], which maximises the time to remain within a safe (or viability) region. Typically, these models incorporate tumour and normal cells, the safe region being defined by thresholds on both cell types. Maximising life expectancy is also similar to maximisation of overall survival (OS) used in sequentially adaptive medical decision-making, where treatment is considered in multiple stages (drawn from a set of defined therapies), also known as dynamic treatment regime models [24]; for instance, induction therapy and salvage therapy if induction fails. These problems are formulated in the Markov decision-making framework. Such methods have been used to optimise sequential drug use, for instance, for acute myeloid leukaemia (AML) [25]. These models are based on probabilistic outcomes of events to determine optimal strategies and do not include explicit tumour dynamics, and are, thus, distinct from our approach.

This paper is organised as follows. In Section 2, we formulate the DLP. In Section 3, the functional form of the control probability of tumour lineage elimination is derived. In Section 4, PMP is applied to determine the optimal solutions of the DLP subject to non-linear tumour evolution models, and optimal drug administration strategies of the DLP subject to a logistic tumour evolution model are presented. In Section 5, the SAP is introduced and optimal drug administration strategies are determined. In Section 6, we discuss our framework and its limitations, and provide future directions.

2. The Discounted Life Expectancy Pay-Off (DLP)

The cost of treatment to the patient needs to be incorporated into the objective $J_{0}$, (2); for instance, the impact of drug toxicity on the patient. The simplest toxicity model is to deduct from the payoff (2) a cost of treatment measured in days, $R[N, u; T] = \gamma \int_{0}^{T} u(t) \, dt$, assuming toxicity is linear in the drug concentration. This simple model of toxicity is often used in control theory analysis of tumours [1,9]. In the lifetime context, this can be interpreted as a poor quality of life during treatment, e.g., nausea caused by the drug. Duration of treatment is in fact a significant covariate of a patient’s quality of life [26]. Assuming expected lifetimes are linear in age, the expected lifetime of outcome $k$ at time $T$ in the future is $L_{k}(T) = L_{k} - T$. We then obtain the payoff,

$$J[N, u; T] = \sum_{k} p_{k}[N, u; T] \, L_{k}(T) + T - R[N, u; T], \tag{3}$$

where both the payoff and the cost $R[N, u; T]$ are quantified in days. Then, $T - R[N, u; T]$ is a measure of the quality of life during treatment, effectively the days during treatment ‘worth living’, which can possibly be determined from quality-of-life (QoL) aggregate measures derived from patient-reported outcomes [26]. If drug concentration is rescaled to the maximum tolerated dose (MTD), i.e., $u(t) \in [0, 1]$, then this interpretation imposes $\gamma \leq 1$, and $\gamma$ is expected to be of order 1.

Consider two possible outcomes: cure (tumour elimination), and a failure to eliminate the tumour, so the patient will subsequently relapse. Define $p[N, u; T]$ as the probability that the treatment achieves tumour elimination within time $T$, a functional of the tumour history. Define the expected lifetimes post-treatment, $L_{e}(T)$ and $L_{n}(T)$, for tumour elimination and no elimination, respectively. The expected lifetime $L_{n}(T)$ could be parametrised by tumour size, $L_{n}(T, N(T))$, if the dependence is known. The pay-off function then reads,

$$J[N, u; T] = L_{e}(T) \, p[N, u; T] + L_{n}(T) \, (1 - p[N, u; T]) + T - \int_{0}^{T} \gamma u(t) \, dt,$$

assuming a linear drug toxicity effect. Rearranging, we get

$$\begin{aligned} J[N, u; T] &= (L_{e}(T) - L_{n}(T)) \, p[N, u; T] + L_{n}(T) + T - \int_{0}^{T} \gamma u(t) \, dt \\ &= (L_{e} - L_{n}) \, p[N, u; T] + L_{n} - \int_{0}^{T} \gamma u(t) \, dt, \end{aligned} \tag{4}$$

the second line following under the assumption of linearity of expected lifetime estimates over the period $T$. Here we are assuming that the treatment horizon is shorter than the lifetime $L_{n}$; otherwise, $L_{n}(T)$ is negative and death during treatment would need to be separated out as a possible outcome, see Section 5.
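As a quick numerical check of the rearrangement in (4), the sketch below evaluates both lines for a hypothetical patient and schedule. The helper names and all parameter values (lifetimes, horizon, $\gamma$, elimination probability, and the 30-day dosing schedule) are illustrative assumptions, not values from the paper.

```python
def toxicity_cost(u, T, gamma, steps=1000):
    """gamma * integral_0^T u(t) dt by the midpoint rule (cost in days)."""
    h = T / steps
    return gamma * h * sum(u((i + 0.5) * h) for i in range(steps))

def payoff_line1(p, L_e, L_n, T, cost):
    """First line of (4), with remaining lifetimes L_k(T) = L_k - T."""
    return ((L_e - T) - (L_n - T)) * p + (L_n - T) + T - cost

def payoff_line2(p, L_e, L_n, cost):
    """Second line of (4): the horizon T has cancelled."""
    return (L_e - L_n) * p + L_n - cost

# Hypothetical values: lifetimes in days, dose rescaled to the MTD (u in [0, 1]).
L_e, L_n, T, gamma, p = 3650.0, 365.0, 90.0, 0.5, 0.3

def u(t):
    """Full dose for the first 30 days, then stop."""
    return 1.0 if t < 30.0 else 0.0

cost = toxicity_cost(u, T, gamma)
J1 = payoff_line1(p, L_e, L_n, T, cost)
J2 = payoff_line2(p, L_e, L_n, cost)
```

The two values agree to rounding error because $T$ enters the first line only through the linear shifts $L_k(T) = L_k - T$.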

3. Computation of the Tumour Elimination Probability

Here, we draw on branching processes to determine the probability of tumour elimination for exponentially growing tumours, and then generalise to non-linear dynamics. For the case of a tumour capacity, which is typical for non-linear dynamics, the probability of die-out is 1 as $t \to \infty$, i.e., given sufficient time there will be a sufficiently large fluctuation that will eliminate the population of tumour cells. To handle this, and determine the treatment-related tumour extinction, we utilise the conditional probability of tumour elimination by therapy.

3.1. Probability of Tumour Elimination: Exponential Growth

To compute the probability of tumour elimination we consider a branching process (BP) with $N(t)$ independent cells and time-dependent birth and death rates, $a(t)$, $b(t)$, reflecting the effect of treatment through time; in the absence of a drug, we assume the birth and death rates are constant, $a_{0}$, $b_{0}$, respectively, with $a_{0} > b_{0}$. The tumour control probability (TCP) is the probability of tumour cell elimination, $P(N(T) = 0 \mid N(0) = M)$, which can be solved using the method of characteristics, see Appendix A [27]. The TCP is given by

$$\mathrm{TCP}(t) = \left(1 - \left(\Lambda(t) + \int_{0}^{t} a(t') \Lambda(t') \, dt'\right)^{-1}\right)^{M}, \tag{5}$$

where $\Lambda(t)$ satisfies the ODE,

$$\frac{d\Lambda}{dt} = (b(t) - a(t)) \Lambda, \quad \text{with solution} \quad \Lambda(t) = \exp \int_{0}^{t} (b(t') - a(t')) \, dt', \tag{6}$$

satisfying $\Lambda(0) = 1$.

The mean tumour size, $N(t) = E[Z(t)]$, satisfies the ODE,

$$\frac{dN}{dt} = (a(t) - b(t)) N, \quad \text{with} \quad N(0) = M, \tag{7}$$

i.e., tumour growth is linear in tumour size, and tumour size is exponential for constant $u$. We have $\frac{d}{dt}(N \Lambda) = 0$, giving $\Lambda(t) = N(0) / N(t)$. Thus, we have the TCP expression,

$$\mathrm{TCP}(t) = \left(1 - (N(0) G(t))^{-1}\right)^{N(0)} \quad \text{with} \quad G(t) = \frac{\Lambda(t)}{N(0)} + \int_{0}^{t} a(t') \frac{\Lambda(t')}{N(0)} \, dt' = \frac{1}{N(t)} + \int_{0}^{t} \frac{a(t')}{N(t')} \, dt'. \tag{8}$$

For large $N(0)$, and assuming $N(t) < N(0)$ throughout the horizon, this is well approximated by

$$\mathrm{TCP}(t) = \exp\left(- G(t)^{-1}\right), \tag{9}$$

which is valid provided $N(0)$ is sufficiently large (always expected to be the case). The condition $N(t) < N(0)$ can in fact be ignored, since the kernel in $G(t)$ is negligible when this does not hold. Therefore, we use this approximation for all histories. This approximation simplifies the LEP and, thus, the following optimisation analysis; however, the LEP can be formulated without using this approximation, i.e., using the TCP in Equation (8), giving a more complex expression for the objective. This would allow the LEP framework to be used with a small initial tumour size. Observe that explicit dependence on $N(0)$ has cancelled, so the initial condition is irrelevant, provided it is large enough. Observe that at $t = 0$, this expression is not zero, although its value, $e^{- N_{0}}$ (with $N_{0} = N(0)$), is exponentially small in $N_{0}$.
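The quality of approximation (9) relative to the exact expression (8) can be checked numerically. The sketch below assumes constant birth and death rates, for which $N(t) = N(0) e^{(a-b)t}$ and the integral in $G(t)$ has a closed form; the function names and the parameter values ($a = 0.1$, $b = 0.5$ per day, $N(0) = 10^{6}$, $t = 40$ days) are hypothetical.

```python
import math

def G_constant_rates(t, a, b, N0):
    """G(t) from (8) when a and b are constant, so N(t) = N0 * exp((a - b) t)."""
    lam = a - b                      # net growth rate; must be non-zero here
    decay = math.exp(-lam * t)       # equals N0 / N(t)
    return (decay + a * (1.0 - decay) / lam) / N0

def tcp_exact(G, N0):
    """Equation (8): (1 - 1/(N0 G))^N0, evaluated via log1p for stability."""
    x = 1.0 / (N0 * G)
    if x >= 1.0:
        return 0.0
    return math.exp(N0 * math.log1p(-x))

def tcp_approx(G):
    """Equation (9): exp(-1/G)."""
    return math.exp(-1.0 / G)

# Hypothetical treated tumour: death rate exceeds birth rate for 40 days.
N0 = 1.0e6
G = G_constant_rates(t=40.0, a=0.1, b=0.5, N0=N0)
exact, approx = tcp_exact(G, N0), tcp_approx(G)
```

For these values the two expressions differ only at the level of the neglected $O(1/N(0))$ correction, which is the sense in which (9) is adequate for large initial tumours.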

We note that for the BP,

$$\frac{dG}{dt} = b(t) \frac{\Lambda(t)}{N(0)} \geq 0 \tag{10}$$

since $b(t)$, the death rate, is non-negative. Both $(1 - x^{-1})^{M}$ and $e^{- x^{-1}}$ are monotonically increasing with $x$ and, thus, the TCP does not decrease in time, as expected, with die-out event probabilities accumulating over time. Thus, even if the deterministic solution (7) for the mean tumour size $N(t)$ increases, for instance, after cessation of treatment, the die-out probability from any interval where die-out was likely is retained in the TCP. We also note that post-treatment, when $b(t) = b_{0}$ (the drug-free death rate), $G(t)$ continues to increase if $b_{0} > 0$. This is because in the branching process lineages may die out by chance. Thus, treatment up to $T$ may not eliminate the tumour by $T$, but lead to elimination post-treatment, e.g., if the tumour burden at $t = T$ is only a few cells, these may die out. This die-out after the treatment horizon needs to be incorporated, as tumour elimination during or after treatment gives the same expected lifetime. Thus, we want to compute the probability of eliminating the tumour at $t = \infty$ by treatment within the interval of time $[0, T]$; specifically, only lineages that survive to $t \to \infty$ contribute to the failure to eliminate the tumour with non-negligible probability. Thus, we compute $G(t \to \infty)$ to give the probability of tumour elimination,

$$p[N, u; T] = \exp\left(- G(t \to \infty)^{-1}\right).$$

Of course, $G(t \to \infty)$ depends on the drug concentration during the time interval $[0, T]$. After the horizon time $T$, we have $N(T + s) = N(T) e^{\lambda_{0} s}$ for the mean of the BP, giving

$$p[N, u; T] = \lim_{s \to \infty} \exp\left(- \frac{N(T) e^{\lambda_{0} s}}{1 + N(T) e^{\lambda_{0} s} \left(W(T) + \int_{T}^{T + s} \frac{a_{0}}{N(t)} \, dt\right)}\right) \quad \text{with} \quad W(T) = \int_{0}^{T} \frac{a(t)}{N(t)} \, dt,$$

where the birth and death rates are $a_{0}$, $b_{0}$ in the absence of drugs, with a net growth rate $\lambda_{0} = a_{0} - b_{0}$. This gives

$$p[N, u; T] = \lim_{s \to \infty} \exp\left(- \frac{N(T) e^{\lambda_{0} s}}{1 + N(T) e^{\lambda_{0} s} \left(W(T) + \frac{a_{0}}{N(T)} \int_{0}^{s} e^{- \lambda_{0} s'} \, ds'\right)}\right).$$

Taking the limit $s \to \infty$, we obtain the tumour lineage elimination probability by treatment within time $T$,

$$p[N, u; T] = \exp\left(- \frac{N(T)}{\frac{a_{0}}{\lambda_{0}} + N(T) W(T)}\right). \tag{11}$$

This is equivalent to modifying the expression for $G$, (8), (9),

$$p[N, u; T] = \exp\left(- G[N, u; T]^{-1}\right), \quad G[N, u; T] = \frac{a_{0}}{\lambda_{0} N(T)} + \int_{0}^{T} \frac{a(s)}{N(s)} \, ds, \tag{12}$$

where we explicitly incorporate dependence of the birth and death rates on the drug concentration. The mechanism of drug action will determine how the drug affects the birth and death rates.

By definition, the tumour lineage elimination probability $p[N, u; T]$ is independent of $T$ if drug infusion has stopped by $T$, i.e.,

$$\left. \frac{dG}{dT} \right|_{u(T) = 0} = 0, \tag{13}$$

which is also obvious by differentiation, $\frac{dG}{dT} = - \frac{a_{0}}{\lambda_{0}} \frac{1}{N(T)^{2}} \left. \frac{dN}{dt} \right|_{t = T} + \frac{a(T)}{N(T)}$, the result following if $u(T) = 0$ since then $\frac{dN}{dt} = \lambda_{0} N$ and $a(T) = a_{0}$. Thus, the elimination probability is independent of the horizon time $T$ once drug infusion has ended and, therefore, the expected lifetime $J[N, u; T]$ is also independent of $T$. This is an important consistency condition of the pay-off: if drug infusion stops before the end of the horizon, the expected lifespan should be independent of $T$.

Therefore, the pay-off function for the exponential growth model with two outcomes is given by

$$J[N, u; T] = (L_{e} - L_{n}) \, e^{- G[N, u; T]^{-1}} + L_{n} - \int_{0}^{T} \gamma u(t) \, dt,$$

with $G[N, u; T]$ from (12), where we emphasise that it is a functional of the tumour history $N(t)$ and drug schedule $u(t)$.

3.2. Probability of Tumour Elimination: Nonlinear Growth Dynamics

Observe that our derivation of the TCP (8) allows for generic time dependence and, therefore, also applies to the case of non-linear growth dynamics. Consider a tumour with deterministic non-linear growth dynamics,

$$\frac{dN}{dt} = f(N, u) N = (f_{b}(N, u) - f_{d}(N, u)) N, \tag{14}$$

where the birth and death rates, $f_{b}(N, u)$ and $f_{d}(N, u)$, respectively, are determined from a mechanistic model, for example, a cell cycle model [1,4]. In general, both rates can be drug and tumour size dependent. The TCP then follows using $a(t) = f_{b}(N(t), u(t))$ in (8), to obtain

$$\mathrm{TCP}(t) = \exp\left(- G(t)^{-1}\right), \quad G(t) = \frac{1}{N(t)} + \int_{0}^{t} \frac{f_{b}(N(t'), u(t'))}{N(t')} \, dt'. \tag{15}$$

The derivation of the tumour lineage elimination probability, however, needs modifying. Non-linear dynamics will typically give rise to a stable capacity $K$ (in the absence of drugs), satisfying $f(K, 0) = 0$. Then, the probability of tumour elimination as $t \to \infty$ is 1, since, eventually, there will be a sufficiently large fluctuation. We are not interested in elimination over such long time scales. Thus, to remove these events we modify the tumour dynamics such that $N \to \infty$ as $t \to \infty$, but satisfies the non-linear dynamics (14) on the treatment time scale, i.e., for $t < T$. This can be implemented in a number of ways. Here, we change the dynamic model; specifically, we use the dynamics

$$\frac{dN}{dt} = \begin{cases} f(N, u) N, & N < \zeta K, \\ (a(u) - b(u)) N, & N > \zeta K, \end{cases} \tag{16}$$

i.e., we use the birth and death rates of the BP, $f_{b}(N, u) = a(u)$, $f_{d}(N, u) = b(u)$, when $N > \zeta K$. The threshold $\zeta \in (0, 1)$ is chosen such that $N(t) < \zeta K$ $\forall t \in [0, T']$ in the absence of drugs, for arbitrary $T' > T$. Thus, the dynamics only switch to a linear regime after any possible treatment has ended. Therefore, $u = 0$ for $N > \zeta K$ and $N(t) \sim e^{\lambda_{0} t} \to \infty$ as $t \to \infty$. The tumour lineage elimination probability for non-linear dynamics then follows using a similar calculation as in the case of exponential growth dynamics,

$$p[N, u; T] = \lim_{s \to \infty} \exp\left(- \frac{N(T + s)}{1 + N(T + s) \left(W(T) + \int_{0}^{s} \frac{a(T + s')}{N(T + s')} \, ds'\right)}\right) = \exp\left(- \left(W(T) + \int_{0}^{\infty} \frac{a(T + s')}{N(T + s')} \, ds'\right)^{-1}\right),$$

where $W(T) = \int_{0}^{T} \frac{a(t)}{N(t)} \, dt$, as before. Since $a(t) = f_{b}(N, u)$, we have the expression

$$G[N, u; T] = W(T) + \int_{T}^{\infty} \frac{f_{b}(N(t), 0)}{N(t)} \, dt. \tag{17}$$

Since $u = 0$ $\forall t > T$, $N$ is monotonically increasing and, therefore, at some time $t_{*} > T$ we have $N(t_{*}) = \zeta K$, and for $t > t_{*}$ growth is exponential. We split the last integral in (17) at $t_{*}$, giving

$$\begin{aligned} \int_{T}^{\infty} \frac{f_{b}(N(t), 0)}{N(t)} \, dt &= \int_{T}^{t_{*}} \frac{f_{b}(N(t), 0)}{N(t)} \, dt + \int_{t_{*}}^{\infty} \frac{f_{b}(N(t), 0)}{N(t)} \, dt \\ &= \int_{T}^{t_{*}} \frac{f_{b}(N, 0)}{f(N, 0)} \frac{1}{N^{2}} \frac{dN}{dt} \, dt + \frac{a_{0}}{\zeta K} \int_{0}^{\infty} e^{- \lambda_{0} t} \, dt \\ &= \int_{N(T)}^{\zeta K} \frac{f_{b}(N, 0)}{f(N, 0)} \frac{1}{N^{2}} \, dN + \frac{a_{0}}{\lambda_{0}} \frac{1}{\zeta K}. \end{aligned} \tag{18}$$

Thus, for non-linear dynamics (14) we have

$$G[N, u; T] = W(T) + \int_{N(T)}^{\zeta K} \frac{f_{b}(N, 0)}{f(N, 0)} \frac{1}{N^{2}} \, dN + \frac{a_{0}}{\lambda_{0}} \frac{1}{\zeta K}, \tag{19}$$

with

$$\frac{dW}{dt} = \frac{f_{b}(N, u)}{N}, \quad W(0) = 0. \tag{20}$$

The expression (19) is analytically tractable for some growth models such as the logistic model.

Expression (19) gives a finite $G[N, u; T]$ and, thus, the tumour lineage elimination probability $p[N, u; T] = e^{- G[N, u; T]^{-1}} < 1$. It is independent of $T$ by construction (since $T$ is arbitrary), and, therefore, satisfies (13). This also follows by direct differentiation,

$$\frac{dG}{dT} = \frac{f_{b}(N(T), u(T))}{N(T)} - \left. \frac{f_{b}(N, 0)}{f(N, 0)} \frac{1}{N^{2}} \frac{dN}{dt} \right|_{t = T},$$

which is zero if $u(T) = 0$.

If the non-linear dynamic model does not have a mechanistic basis for the birth and death rates, then there is a choice that simplifies these expressions. We note that only the birth and death rates at low numbers of cells are actually important, since the kernel in (8) is negligible otherwise; the probability of tumour lineage die-out is dominated by the periods when the tumour is small, see Appendix B. We can set the birth rate to

$$f_{b}(N, u) = a(u) \frac{f(N, 0)}{f(0, 0)}, \tag{21}$$

which gives the correct birth rate $a(u)$ as $N \to 0$. Therefore, this is also a suitable general approximation. Substituting in (19), we obtain

$$G[N, u; T] = W(T) + \frac{a_{0}}{f(0, 0)} \int_{N(T)}^{\zeta K} \frac{1}{N^{2}} \, dN + \frac{a_{0}}{\lambda_{0}} \frac{1}{\zeta K} = W(T) + \frac{a_{0}}{\lambda_{0}} \frac{1}{N(T)}, \tag{22}$$

since $f(0, 0) = a_{0} - b_{0} = \lambda_{0}$. Dependence on $\zeta$ in fact cancels, giving an expression identical to that of the exponential growth model. The difference from (19) is in fact negligible because the rate of elimination (extinction) is negligible unless $N$ is small, when the kernels agree. Thus, this expression is generally applicable.
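The cancellation of $\zeta$ in (22) can be written out explicitly. The short derivation below uses only (19), the choice (21) evaluated at $u = 0$ (so that $f_{b}(N, 0)/f(N, 0) = a_{0}/f(0, 0)$), and $f(0, 0) = \lambda_{0}$; it restates the step rather than adding a new result.

```latex
\begin{aligned}
\int_{N(T)}^{\zeta K} \frac{dN}{N^{2}} &= \frac{1}{N(T)} - \frac{1}{\zeta K}, \\
G[N, u; T] &= W(T) + \frac{a_{0}}{\lambda_{0}} \left( \frac{1}{N(T)} - \frac{1}{\zeta K} \right) + \frac{a_{0}}{\lambda_{0}} \frac{1}{\zeta K}
            = W(T) + \frac{a_{0}}{\lambda_{0} N(T)} .
\end{aligned}
```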

3.3. The Treatment Dependent Elimination Probability (TEP)

Consider the case when no drug is applied over the horizon $T$. Then $G[N, u = 0; T] > 0$ in (12) and (22) and, thus, the probability of tumour elimination is non-zero despite no treatment. This arises because stochastic fluctuations can lead to spontaneous tumour elimination in the model, even if with negligibly small probability. For the untreated branching process, for any $T$, we have $G[N, u = 0; T] = a_{0} / (\lambda_{0} N_{0})$ as $\frac{dG}{dT} = 0$, so the lineage elimination probability is $e^{- \lambda_{0} N_{0} / a_{0}}$. This leads to an interpretation issue since under no treatment the expected lifespan is $(L_{e} - L_{n}) \, e^{- \lambda_{0} N_{0} / a_{0}} + L_{n}$ and not $L_{n}$ as originally defined. Thus, $L_{n}$ is the expected lifespan in the absence of treatment conditional on the tumour not naturally dying out. In practice, because $N_{0}$ is large in the absence of treatment, the difference is negligible.

To account for this natural die-out we define the treatment dependent elimination probability (TEP) as the conditional probability of tumour lineage elimination,

$$p[N, u; T] = \frac{e^{- G[N, u; T]^{-1}} - e^{- G[N, u = 0; T]^{-1}}}{1 - e^{- G[N, u = 0; T]^{-1}}}, \tag{23}$$

with $G[N, u; T]$ given by (12). This is the elimination probability conditioned on the fact that the tumour would not spontaneously resolve if left untreated and, therefore, the elimination probability due to (drug) treatment alone. We, thus, have $p[N, u = 0; T] = 0$ in the absence of treatment ($u(t) = 0$ $\forall t \in [0, T]$), and by definition the lifetime payoff is $J[N, u = 0; T] = L_{n}$. Then, $L_{n}$ and $L_{e}$ are, respectively, the expected patient lifetime under no treatment and the expected lifetime if the tumour were successfully (instantly) removed at $t = 0$, the latter being equivalent to the expected lifetime for people without a tumour. This is simply a linear reparametrisation of the objective functional.
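A minimal sketch of the TEP (23) follows. The function name `tep` and all numerical values are hypothetical; in particular, the initial size $N_{0} = 5$ is far outside the large-tumour regime assumed for (9) and is chosen only so that the spontaneous elimination probability is large enough to see in floating point.

```python
import math

def tep(G_treated, G_untreated):
    """Equation (23): elimination probability conditional on no spontaneous die-out."""
    p_treated = math.exp(-1.0 / G_treated)
    p_spont = math.exp(-1.0 / G_untreated)     # spontaneous elimination probability
    return (p_treated - p_spont) / (1.0 - p_spont)

# Illustrative values only; N0 is deliberately tiny so p_spont is not negligible.
a0, lam0, N0 = 0.1, 0.08, 5.0
G_untreated = a0 / (lam0 * N0)                  # untreated value of G for any horizon
tep_none = tep(G_untreated, G_untreated)        # no treatment
tep_some = tep(2.0, G_untreated)                # G = 2.0 is an arbitrary treated value
```

With no treatment the TEP vanishes, so the payoff reduces to $L_{n}$; with treatment it is slightly smaller than the unconditioned probability $e^{-1/G}$.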

4. Optimal Therapy Solutions of the Discounted Expected Lifespan Payoff with Cure or Relapse Outcomes

We can now formulate the optimisation problem. We want to determine the optimal chemotherapy protocol by maximising the discounted expected lifespan pay-off (DLP) (using the TEP probability (23) with (22)),

$$J[N, u; T] = (L_{e} - L_{n}) \frac{e^{- \left(\frac{a_{0}}{\lambda_{0} N(T)} + W(T)\right)^{-1}} - p_{0}}{1 - p_{0}} + L_{n} - \gamma \int_{0}^{T} u(t) \, dt, \tag{24}$$

which incorporates two outcomes, cure versus a failure to eliminate the tumour. Here, $p_{0} = e^{- G[N, u = 0; T]^{-1}}$ is the spontaneous cure probability, with $G[N, u; T] = \frac{a_{0}}{\lambda_{0} N(T)} + W(T)$, (22). We use the approximate general form, (21),

$$\frac{dW}{dt} = \frac{a_{0}}{\lambda_{0}} \frac{f(N, 0)}{N}, \quad \text{with} \quad W(0) = 0. \tag{25}$$

We want to determine the optimal drug regimen that maximises this pay-off.

We consider a generic one compartment tumour growth model, (14), with a linear dependence of the death rate on drug concentration u (the control), specifically,

\frac{d N}{d t} = f (N, u) N = (f_{b} (N) - f_{d 0} (N) - f_{d 1} (N) u) N, N (0) = N_{0} .

(26)

Here, the drug is assumed to not affect the birth rate, i.e.,

a_{0} (u) = a_{0}

. We define

f_{0} (N) = f (N, 0) = f_{b} (N) - f_{d 0} (N)

for convenience. The tumour capacity K in the absence of drugs, assuming large (10⁸–10¹⁰ cells), satisfies

f_{0} (K, 0) = 0

, i.e.,

f_{b} (K) = f_{d 0} (K)

. We assume

N f_{0}

is concave on

[0, K]

. We assume

f_{b}, f_{d 0}, f_{d 1} \geq 0

, and

f_{b}^{'} < min (0, f_{d 0}^{'})

, so the net growth rate per cell

f_{0}

decreases with N,

f_{0}^{'} \leq 0

. Here, prime denotes differentiation with respect to N. The drug efficacy likely falls with population size; we assume

f_{d 1}

is not increasing (so monotonically decreases, but not necessarily strictly),

f_{d 1}^{'} (N) \leq 0

. For tumour control at the MTD (

u = 1

) we need

f_{d 1} > f_{0}

,

\forall N \leq K

. For small

N (t)

,

N (t)

is the mean of a branching process with birth, death rates

a_{0} = f_{b} (0), b_{0} = f_{d 0} (0)

,

λ_{0} = a_{0} - b_{0}

in the absence of drugs.

Our optimisation problem is to determine the optimal drug concentration at each point in time,

u (t), t \in [0, T]

, that maximises the LEP. PMP reformulates this infinite-dimensional optimisation problem into a finite-dimensional two-point boundary value problem (TPBVP), thus, offering both analytical and numerical methods of solutions. Typically, there are two types of optimal solutions for bounded controls [28,29]: (i) bang-bang solutions, where the control function can only take values on the boundary of the allowed control range (i.e.,

u = 0

and

u = 1

), switching abruptly between these extremes; (ii) singular solutions, where the control function takes intermediate values within the allowed range, allowing the administered drug dose to vary continuously over time. Singular and bang-bang solutions can be combined. PMP is a necessary but not sufficient condition for optimality. Therefore, further analysis may be required to confirm that PMP solutions are in fact optimal [30]. By using PMP, the optimal solutions are given as follows:

Theorem 1.

For sufficiently large K, the optimal solutions of (26) that maximise objective (24) are bang-bang with at most 1 switch from

u = 1

to

u = 0

, so either no treatment, treat-and-stop, or the MTD. For sufficiently small horizon time T, solutions are no treatment.

Proof.

We use PMP, which gives necessary conditions for an optimal solution [1,21]. The Hamiltonian is given by:

H = p_{N} f (N, u) N + p_{W} \frac{a_{0}}{N} \frac{f_{0} (N)}{λ_{0}} - γ u,

where

p_{N}, p_{W}

are the costates of N and W, respectively, and we have introduced

f_{0} (N) = f (N, 0) = f_{b} (N) - f_{d 0} (N)

. The costate dynamics are given by:

\begin{matrix} \frac{d p_{N}}{d t} = - p_{N} {(N f)}^{'} - p_{W} \frac{a_{0}}{λ_{0}} {(\frac{f_{0}}{N})}^{'} \end{matrix}

(27)

\begin{matrix} \frac{d p_{W}}{d t} = 0, \end{matrix}

(28)

suspending explicit dependence of functions on N for simplicity. The transversality conditions are given by:

\begin{matrix} p_{N} (T) = \frac{L_{e} - L_{n}}{1 - p_{0}} \frac{- \frac{a_{0}}{λ_{0} N {(T)}^{2}}}{{(\frac{a_{0}}{λ_{0} N (T)} + W (T))}^{2}} e^{- {(\frac{a_{0}}{λ_{0} N (T)} + W (T))}^{- 1}} \\ p_{W} (T) = \frac{L_{e} - L_{n}}{1 - p_{0}} \frac{1}{{(\frac{a_{0}}{λ_{0} N (T)} + W (T))}^{2}} e^{- {(\frac{a_{0}}{λ_{0} N (T)} + W (T))}^{- 1}} \end{matrix}

(29)

Thus,

p_{W} (t) = p_{W} (T) > 0

for all time and

p_{N} (T) = - \frac{a_{0}}{λ_{0} N {(T)}^{2}} p_{W} < 0

. As a function of

G (T)

, we have

p_{W} = \frac{L_{e} - L_{n}}{1 - p_{0}} \frac{1}{G {(T)}^{2}} e^{- G {(T)}^{- 1}}

; thus,

p_{W}

has a maximum at

G^{- 1} = 2

, so

p_{W} \leq 4 e^{- 2} \frac{L_{e} - L_{n}}{1 - p_{0}}

.
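The bound on p_W can be checked numerically. The following is a minimal Python sketch, assuming only the shape factor G^{-2} e^{-G^{-1}} from the expression above; the helper name `pw_shape` and the grid are illustrative choices, not part of the source. It scans x = G^{-1} and locates the maximiser, which should sit at x = 2 with value 4 e^{-2}.

```python
import math

def pw_shape(x: float) -> float:
    """Factor G^{-2} exp(-G^{-1}) of p_W, written in terms of x = 1/G (hypothetical helper)."""
    return x * x * math.exp(-x)

# Scan x = 1/G on a uniform grid over (0, 20]; the grid is an illustrative choice.
grid = [i / 1000 for i in range(1, 20001)]
x_star = max(grid, key=pw_shape)      # grid point with the largest factor
bound = 4.0 * math.exp(-2.0)          # analytic maximum quoted in the text

print(x_star, pw_shape(x_star), bound)
```

Multiplying `bound` by (L_e - L_n)/(1 - p_0) then gives the upper limit on p_W used later in the proof.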

We have,

\frac{d p_{N}}{d t} |_{p_{N} = 0} = - p_{W} \frac{a_{0}}{λ_{0}} {(\frac{f_{0}}{N})}^{'} > 0

since

f_{0}^{'} < 0

. Thus,

p_{N} (t) < 0

for all

t \in [0, T]

since it terminates with

p_{N} (T) < 0

. However, we can improve on this bound. We define

s = \frac{p_{N}}{p_{W}} + \frac{a_{0}}{λ_{0} N^{2}},

then

s (T) = 0

, and

\frac{d s}{d t} = \frac{1}{p_{W}} \frac{d p_{N}}{d t} - \frac{2 a_{0}}{λ_{0}} \frac{1}{N^{3}} \frac{d N}{d t} = - \frac{p_{N}}{p_{W}} {(N f)}^{'} - \frac{a_{0}}{λ_{0}} {(\frac{f_{0}}{N})}^{'} - \frac{2 a_{0}}{λ_{0}} \frac{f}{N^{2}} .

Expressing this in terms of s, we obtain

\begin{matrix} \frac{d s}{d t} & = & - s {(N f)}^{'} + \frac{a_{0}}{λ_{0}} (\frac{f^{'}}{N} - \frac{f}{N^{2}} - {(\frac{f_{0}}{N})}^{'}) \\ & = & - s {(N f)}^{'} + \frac{a_{0}}{λ_{0}} ({(\frac{f}{N})}^{'} - {(\frac{f_{0}}{N})}^{'}) \\ & = & - s {(N f)}^{'} - \frac{a_{0} u}{λ_{0}} {(\frac{f_{d 1}}{N})}^{'} . \end{matrix}

(30)

Hence,

\frac{d s}{d t} |_{s = 0} = - \frac{a_{0} u}{λ_{0}} ({(\frac{f_{d 1}}{N})}^{'}) > 0

provided

N f_{d 1}^{'} (N) < f_{d 1} (N)

and

u > 0

. If

u = 0

, then

s = 0

is an s-nullcline. Therefore,

s \leq 0

for

t \in [0, T]

and, thus,

p_{N} (t) \leq - \frac{a_{0} p_{W}}{λ_{0} N {(t)}^{2}}

, for

t \in [0, T]

.

Since the Hamiltonian is linear with respect to the control u there exists a switching function:

Φ = - p_{N} f_{d 1} N - γ \equiv - (p_{W} f_{d 1} N) s + \frac{a_{0}}{λ_{0}} \frac{p_{W} f_{d 1}}{N} - γ .

(31)

Since the

N, s

dynamics decouple from those of

W, p_{W}

, we can consider dynamics in the

(N, s)

phase plane, with

p_{W}

fixed. The switching curve separates the phase plane into two regions, Figure 2, effectively stitching two dynamical systems with

u = 0, Φ < 0

and

u = 1, Φ > 0

; there may be dynamics on

Φ = 0

if there are trajectories on this curve, i.e., singular solutions with

\frac{d Φ}{d t} = 0

and

Φ = 0

. The flow and optimal trajectories can be determined as follows. We define three curves:

The switching line $Φ = 0$ is given by

$s_{Φ} = \frac{a_{0}}{λ_{0} N^{2}} - \frac{γ}{p_{W} f_{d 1} N} .$

(32)

Since $f_{d 1} (0) > 0$ and $f_{d 1}$ is finite for $N \in [0, K]$, we have $s_{Φ} \to \infty$ as $N \to 0$. There is a unique zero, $N_{Φ}$, such that $s_{Φ} (N_{Φ}) = 0$, i.e., it satisfies $N = \frac{a_{0}}{λ_{0} γ} p_{W} f_{d 1} (N)$. Since K is assumed large, and $p_{W} \leq 4 e^{- 2} \frac{L_{e} - L_{n}}{1 - p_{0}}$, we have that $N_{Φ} ≪ K$. The zero is unique since $f_{d 1}$ is monotonically decreasing, $f_{d 1}^{'} (N) \leq 0$. For $N > N_{Φ}$, we have $s_{Φ} (N) < 0$. Since $f_{d 1} (N) = f_{d 1} (0) + O (\frac{N}{K})$, we have for $N ≪ K$ that

$s_{Φ} = \frac{a_{0}}{λ_{0} N^{2}} - \frac{γ}{p_{W} f_{d 1} (0) N},$

(33)

so there is a local minimum at $N_{\min Φ} = \frac{2 a_{0} p_{W} f_{d 1} (0)}{λ_{0} γ} + O (K^{- 1})$. We have $f_{d 1} (N) > 0$ for $N \in [0, K)$; the drug induced death rate may have a zero, $f_{d 1} (N) = 0$ for some $N \geq K$, so $s_{Φ}$ may also have a local maximum in $[N_{Φ}, K]$.
The s-nullcline, $s_{n c}$.
- In the region $u = 0$, we have, using (30), that $\frac{d s}{d t} = - s {(N f_{0})}^{'}$. Since $N f_{0}$ is concave on $[0, K]$, and zero at $N = 0, K$, there exists a unique turning point, $N_{T P}$. The s-nullclines are, thus, $s = 0$ and $N = N_{T P}$, where $\frac{d (N f_{0})}{d N} = 0$ at $N_{T P}$.
- In the region $u = 1$, we have (${(\frac{f_{d 1}}{N})}^{'} < 0$ on the interval $[0, K]$),

$s_{n c} = - \frac{a_{0}}{λ_{0} {(N f)}^{'}} {(\frac{f_{d 1}}{N})}^{'}$

which has a pole at $N = 0$. Thus, this nullcline exists in the $u = 1$ region where ${(N f)}^{'} < 0$. If ${(N f)}^{'} = 0$ somewhere in $[0, K]$, i.e., the population decay under the MTD slows down with N, there may be a second pole.
The level sets $H (N, s) = h$ . We have the following expressions for the Hamiltonian, which is a constant on optimal solutions (PMP).

$H = p_{W} s f N + (p_{W} \frac{a_{0}}{λ_{0}} \frac{f_{d 1}}{N} - γ) u \equiv p_{W} s f_{0} N + u Φ (N, s)$

(34)

since $Φ = \frac{\partial H}{\partial u}$ . Thus, a trajectory on $H = h$ is given by

$s_{H} = \frac{(h + γ - p_{W} \frac{a_{0}}{λ_{0}} \frac{f_{d 1}}{N})}{p_{W} f N} .$
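To make the switching curve (32) and its zero N_Φ concrete, the following is a minimal Python sketch for the logistic cell cycle drug example, where a_0 = λ_0 = g and f_{d1}(N) = 2 d g (1 - N/K). The function names and the value p_W = 50 are hypothetical choices for illustration only; in the actual problem p_W is fixed by the transversality condition (29). Bisection is used here because N - (a_0 p_W / (λ_0 γ)) f_{d1}(N) is increasing in N when f_{d1} is non-increasing.

```python
def s_phi(N, p_W, gamma, a0, lam0, f_d1):
    """Switching curve (32): the value of s at which Phi = 0 for tumour size N."""
    return a0 / (lam0 * N * N) - gamma / (p_W * f_d1(N) * N)

def n_phi(p_W, gamma, a0, lam0, f_d1, K, iters=200):
    """Zero of the switching curve: root of N - a0 p_W f_d1(N) / (lam0 gamma) on (0, K)."""
    c = a0 * p_W / (lam0 * gamma)
    lo, hi = 0.0, K                      # root is bracketed when f_d1(0) > 0 and c f_d1(K) <= K
    for _ in range(iters):               # plain bisection on an increasing function
        mid = 0.5 * (lo + hi)
        if mid - c * f_d1(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Logistic example with a cell cycle drug: f_d1(N) = 2 d g (1 - N/K), a0 = lam0 = g.
g, d, K = 0.502, 1.0, 1.297e9
gamma, p_W = 1.0, 50.0                   # p_W = 50 is a made-up value for illustration
f_d1 = lambda N: 2.0 * d * g * (1.0 - N / K)

N_phi = n_phi(p_W, gamma, g, g, f_d1, K)
# For this linear f_d1 the root is also available in closed form.
N_phi_exact = (2.0 * d * g * p_W / gamma) / (1.0 + 2.0 * d * g * p_W / (gamma * K))
print(N_phi, N_phi_exact, s_phi(N_phi, p_W, gamma, g, g, f_d1))
```

With these illustrative numbers N_Φ is many orders of magnitude below K, in line with the statement that N_Φ ≪ K for large K.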

The N-nullclines are

N = 0, N = K

if

u = 0

and

N = 0

if

u = 1

. There is, thus, a fixed point

(K, 0)

in the

u = 0

region. Clearly, since

\frac{d s}{d t} < 0

for

s < 0

if

N > N_{T P}

, the FP is a saddle. There is no fixed point in the

u = 1

region, as

\frac{d s}{d t} \to \infty

as

N \to 0

.

Optimal solutions satisfy PMP and, thus, must be a trajectory in our phase space. They start on the line

N = N_{0}

, and terminate on

s = 0

. Since optimal trajectories terminate on

s = 0

in finite time, trajectories can only reach

s = 0

with

u = 1

, i.e., they intersect the N-axis with

N \in (0, N_{Φ}]

. When

u = 0

,

s = 0

is a nullcline and cannot be reached in finite time. In fact the flow suggests trajectories leave the

u = 0

region by crossing the switching curve

Φ = 0

. Optimal solutions are, thus, trajectories that reach

s = 0

and terminate with

u = 1

. We define the trajectory

Γ

that passes through

(N_{Φ}, 0)

, i.e., reaches the intersection of

s = 0

and

Φ = 0

. We prove that

Γ

separates the phase space, in that only trajectories below

Γ

can terminate.

First,

Γ

approaches

(N_{Φ}, 0)

from the

u = 1

region. Consider the trajectories in

u = 1

(

Φ > 0

) when they cross

s = 0

. We have, using (31),

\frac{d Φ}{d t} = \frac{a_{0}}{λ_{0}} p_{W} {(\frac{f_{d 1}}{N})}^{'} \frac{d N}{d t} - p_{W} f_{d 1} N \frac{d s}{d t} - s p_{W} {(f_{d 1} N)}^{'} \frac{d N}{d t} .

Using the dynamics (26), (30), we, thus, have, setting

s = 0

,

\frac{d Φ}{d t} |_{s = 0} = \frac{a_{0}}{λ_{0}} p_{W} {(\frac{f_{d 1}}{N})}^{'} (f N + f_{d 1} N) \equiv \frac{a_{0}}{λ_{0}} p_{W} {(\frac{f_{d 1}}{N})}^{'} f_{0} N .

This is strictly

< 0

for

N \in (0, N_{Φ}]

. Thus, the level set

H = 0

reaches

(N_{Φ}, 0)

for

u = 1

with

s < 0

.

Consider the expression for the Hamiltonian,

H = p_{W} s f_{0} N + u Φ (N, s)

, then the trajectory

Γ

that intersects

Φ = 0

at

s = 0

lies in the level set

H = 0

. We, therefore, have on this trajectory in the

u = 1

,

s < 0

region, (using

Φ = H - p_{W} s f_{0} N

),

Φ = - p_{W} s f_{0} N > 0 \text{ for } N \in (N_{Φ}, K) .

At the extremes,

N = N_{Φ}

and

N = K

, we have

Φ = 0

, i.e., it intersects the switching curve twice, and only twice. Since optimal trajectories must terminate by crossing

s = 0

in the interval

N \in (0, N_{Φ}]

, the only optimal trajectories, besides

Γ

, are those that lie below

Γ

, or are already on

s = 0

.

The trajectory

Γ

in fact defines a bang-bang solution with a single switch from

u = 1

to

u = 0

. The optimal solution continues in the

u = 0

region on

s = 0

and can spend an arbitrarily long time on

s = 0

. These are treat-and-stop solutions, and can be constructed to have arbitrarily long horizon times T. The fact that

H = 0

is related to the invariance of the payoff J to extending the horizon time T. Trajectories below

Γ

terminate in

s = 0

, i.e., are MTD solutions; these solutions have a longer horizon time since

N (T) < N_{Φ}

. Finally, there is a branch of no treatment solutions with

u = 0

for all time that start on, and stay on

s = 0

. This dynamic on

s = 0

is related to the invariance of the objective to T once drug administration ends.

The case with

b_{0} = 0

is a special case (

a_{0} = λ_{0}

, so no lineage die-out). In this case,

Φ = 0

can have a pole at

N = K

, for instance if

f_{d 1} = 2 d f_{0}

for a cell cycle drug. The proof still holds by considering only the open region

(0, K)

.

Therefore, we have only three cases for the solutions to PMP:

No treatment solutions (with $Φ (t) < 0$ for all t).
Treat-and-stop solutions (with $Φ (t) > 0$ for $t < x$ and $Φ (t) < 0$ for all $t > x$; $x \in (0, T)$ is the period of drug administration).
Continuous MTD, $u = 1$ throughout the entire therapy (with $Φ (t) > 0$ for all t).

Above we considered

p_{W}

fixed. For each

p_{W}

we have proved that there are three possible branches of solutions: Firstly, the no treatment branch with unrestricted horizon time (

T \geq 0

). Secondly, we have treat-and-stop solutions with

T \geq T_{Γ}

, where

\int_{N_{0}}^{N_{Φ}} \frac{d N}{f (N, u = 1) N} = T_{Γ} .

These solutions exist if

N_{Φ} \leq N_{0}

. Finally, we have the MTD solutions with horizon time

T > T_{Γ}

(if

T_{Γ}

positive) and

T \geq 0

when

N_{Φ} > N_{0}

.
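For the logistic model with a cell cycle drug, f(N, 1) = g (1 - 2d)(1 - N/K), the integral defining T_Γ has a closed form in terms of ln(N / (K - N)). The following is a minimal Python sketch that evaluates this closed form and cross-checks it against a Simpson quadrature in the variable y = ln N. The function names, N_0 = 10^6 and N_Φ = 50.2 are hypothetical values chosen for illustration.

```python
import math

def t_gamma_closed(N0, N_phi, g, d, K):
    """Closed form of T_Gamma for logistic growth at the MTD (u = 1)."""
    rate = g * (1.0 - 2.0 * d)           # per-capita rate at low density under the MTD (< 0)
    logit = lambda N: math.log(N / (K - N))
    return (logit(N_phi) - logit(N0)) / rate

def t_gamma_numeric(N0, N_phi, g, d, K, n=2000):
    """Composite Simpson rule for the integral of dN / (f(N, 1) N), using y = ln N."""
    rate = g * (1.0 - 2.0 * d)
    integrand = lambda y: 1.0 / (rate * (1.0 - math.exp(y) / K))
    a, b = math.log(N0), math.log(N_phi)
    h = (b - a) / n                      # n must be even; h < 0 because N_phi < N0
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(a + i * h)
    return total * h / 3.0

# Illustrative values only: N0 and N_phi are not taken from the source.
T_closed = t_gamma_closed(1.0e6, 50.2, 0.502, 1.0, 1.297e9)
T_numeric = t_gamma_numeric(1.0e6, 50.2, 0.502, 1.0, 1.297e9)
print(T_closed, T_numeric)
```

T_Γ is the shortest horizon that can accommodate the treat-and-stop branch for a given p_W, so it scales roughly with ln(N_0 / N_Φ) divided by the net decay rate at the MTD.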

p_{W}

is determined by the value of G, which itself is a function of the dynamics (25) and, thus, depends on the trajectory

N (t)

. Thus, for a given horizon time T we need to solve for the initial value

s_{0} (p_{W})

that gives the correct horizon time for each value of

p_{W}

, and then determine

p_{W}

by solving the constraint on

p_{W}

.

We can also prove that, for sufficiently small T,

Φ < 0

, and, thus,

u = 0

for

t \in [0, T]

. This follows using the bound

p_{N} (t) \leq - \frac{a_{0} p_{W}}{λ_{0} N {(t)}^{2}}

. We have a sufficiency condition for

Φ > 0

(

u = 1

) if

N < f_{d 1} (N) \frac{a_{0} p_{W}}{λ_{0} γ},

(35)

where

p_{W}

is given by (29). Since

p_{W}

is proportional to the TEP,

e^{- {(\frac{a_{0}}{λ_{0} N (T)} + W (T))}^{- 1}}

, it is small for cases where the tumour size cannot be reduced to

O (1)

, since only then does the TEP increase rapidly. Define

ϵ = e^{- {(\frac{a_{0}}{λ_{0} N (T)} + W (T))}^{- 1}} |_{u = 1}

, assumed small (under the MTD

\forall t \in [0, T]

). Then,

p_{N} = O (ϵ)

,

p_{W} = O (ϵ)

, the former following since

\frac{d p_{N}}{d t} = O (ϵ)

, (27). Thus,

Φ = - γ + O (ϵ) < 0

for small horizons T, and the optimal solution is no treatment (

u (t) = 0

). When the horizon is sufficiently large, i.e., of order, or larger than

\int_{N_{0}}^{1} \frac{d N}{f (N, 1) N}

, then regimens with non-zero

u (t)

are possible. The expression (35) then gives a lower bound on the time of the MTD for those solutions with treatment. □

Optimal Controls for Logistic Growth: Numerical Study

We analyse the optimal solutions of the DLP for the specific case of a logistic growth model treated with a cell cycle drug,

\frac{d N}{d t} = g (1 - 2 d u (t)) (1 - \frac{N}{K}) N, N (0) = N_{0} .

(36)

Dividing cells are killed by the drug with efficacy

d \in (0.5, 1]

, hereafter referred to as the killing fraction of the drug. The lower bound ensures that the tumour size decays at the MTD. g is the per-capita growth rate of the tumour and K is the tumour’s carrying capacity in the absence of the drug. This has the form (26) with

f_{0} (N) = g (1 - \frac{N}{K}), f_{d 1} (N) \equiv f_{1} (N) = 2 d g (1 - \frac{N}{K})

. We set

b_{0} = 0

for simplicity, i.e., the natural death rate of cancer cells when the tumour size is small is negligible. The birth rate is drug independent by the assumptions above, i.e.,

a (N) = f_{0} (N)

, so at low cell density we have

a_{0} = g, λ_{0} = g

. The death rate is solely due to the drug,

b (N, u) = u f_{1} (N)

, and is density dependent because only cells in cell cycle are killed by the drug, and the cell division rate is density dependent.

The optimal control problem consists of maximising the payoff, (24),

J [N, u; T] = (L_{e} - L_{n}) \frac{e^{- {(\frac{1}{N (T)} + W (T))}^{- 1}} - p_{0}}{1 - p_{0}} + L_{n} - γ \int_{0}^{T} u (t) d t

with

p_{0} = e^{- G {[N, u = 0; T]}^{- 1}}

, subject to the tumour growth given by Equation (36) and the dynamic of W given by, (20),

\frac{d W}{d t} = \frac{g}{N} (1 - \frac{N}{K}), W (0) = 0 .

(37)
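The dynamics (36) and (37) can be integrated directly to check that G = 1/N + W (the form taken by G when a_0 = λ_0 = g) stops changing once the drug is withdrawn, which is the property behind the horizon independence of the payoff. The following is a minimal Python sketch using a classical fourth-order Runge-Kutta step; the function name `simulate`, the step size, N_0 = 10^6 and the treatment duration of 20 days are assumptions made for illustration.

```python
import math

def simulate(x, T, N0, g, d, K, dt=0.01):
    """Integrate (36) and (37) by classical RK4 with u = 1 on [0, x) and u = 0 on [x, T]."""
    def rhs(N, u):
        growth = g * (1.0 - N / K)
        return growth * (1.0 - 2.0 * d * u) * N, growth / N     # dN/dt, dW/dt

    N, W = N0, 0.0
    n_on, n_total = round(x / dt), round(T / dt)
    for i in range(n_total):
        u = 1.0 if i < n_on else 0.0
        k1 = rhs(N, u)
        k2 = rhs(N + 0.5 * dt * k1[0], u)
        k3 = rhs(N + 0.5 * dt * k2[0], u)
        k4 = rhs(N + dt * k3[0], u)
        N += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        W += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return N, W

g, d, K, N0 = 0.502, 1.0, 1.297e9, 1.0e6      # N0 is an illustrative choice
x = 20.0                                      # days at the MTD (illustrative)
N_x, W_x = simulate(x, x, N0, g, d, K)        # state when the drug is stopped
N_T, W_T = simulate(x, 30.0, N0, g, d, K)     # state ten days later, untreated
G_x = 1.0 / N_x + W_x
G_T = 1.0 / N_T + W_T
# Analytic value of G after x days at the MTD, for comparison.
G_closed = (1.0 / N0 - 1.0 / K) * (1.0 - 2.0 * d * math.exp(-g * (1.0 - 2.0 * d) * x)) / (1.0 - 2.0 * d) + 1.0 / K
print(G_x, G_T, G_closed)
```

Although N regrows substantially over the untreated interval, the increase in W compensates exactly for the decrease in 1/N, so the elimination probability is fixed at the moment treatment stops.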

Since, by Theorem 1, all solutions to PMP have

u = 1

(MTD) for an interval

t \in [0, x]

,

x \leq T

, the optimal control problem reduces to a one-dimensional optimisation problem of the drug application duration

x \in [0, T]

. By expressing N and W as functions of x, the payoff can be written as:

J (x) = (L_{e} - L_{n}) \frac{e^{- G {(χ (x))}^{- 1}} - p_{0}}{1 - p_{0}} + L_{n} - γ x

(38)

with

χ = e^{x}

, and

G (χ) = \frac{1}{(1 - 2 d)} (\frac{1}{N_{0}} - \frac{1}{K}) (1 - 2 d χ^{- g (1 - 2 d)}) + \frac{1}{K} .

See Appendix C for derivation. We maximise

J (x)

using a sequential quadratic programming method (fmincon solver in MATLAB). The model parameters were chosen as follows: the growth rate (g) and the carrying capacity (K) of the tumour were estimated by [31],

g = 0.502

per day and

K = 1.297 \times 10^{9}

cells, respectively. It is assumed that the drug is 100% efficient, so

d = 1

. For reference, Table 1 outlines the key parameters from the tumour evolution model and the DLP.
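The one-dimensional problem (38) is simple enough to reproduce with a few lines of code. The following is a minimal Python sketch, not the authors' implementation: it swaps the SQP solver for a plain grid search over x, because J(x) first decreases and then rises and so is not unimodal on [0, T]. All times are taken in days, which is an assumption about units, and the values N_0 = 10^6, L_e = 20 years, L_n = 30 days, γ = 1 and T = 60 days are illustrative picks from the ranges quoted in the text. The function names are hypothetical.

```python
import math

def G_of_x(x, N0, g, d, K):
    """G after x days at the MTD (the closed form quoted above, with chi = e^x)."""
    growth = math.exp(-g * (1.0 - 2.0 * d) * x)       # chi^{-g (1 - 2 d)}
    return (1.0 / N0 - 1.0 / K) * (1.0 - 2.0 * d * growth) / (1.0 - 2.0 * d) + 1.0 / K

def payoff(x, N0, g, d, K, L_e, L_n, gamma):
    """Discounted expected lifespan J(x), equation (38); all times in days (an assumption)."""
    p0 = math.exp(-1.0 / G_of_x(0.0, N0, g, d, K))    # spontaneous cure probability
    tep = (math.exp(-1.0 / G_of_x(x, N0, g, d, K)) - p0) / (1.0 - p0)
    return (L_e - L_n) * tep + L_n - gamma * x

def best_duration(T, step=0.01, **params):
    """Grid search over x in [0, T]; no local optimiser is used because J is not unimodal."""
    n = round(T / step)
    grid = [i * step for i in range(n + 1)]
    return max(grid, key=lambda x: payoff(x, **params))

params = dict(N0=1.0e6, g=0.502, d=1.0, K=1.297e9,
              L_e=20 * 365.0, L_n=30.0, gamma=1.0)    # illustrative values
x_opt = best_duration(T=60.0, **params)
print(x_opt, payoff(x_opt, **params))
```

Because the optimum found this way lies strictly inside [0, T], enlarging T leaves it unchanged, which is the treat-and-stop behaviour described in the results.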

The type of optimal solution (for a fixed initial tumour size

N_{0}

) is dependent on the degree to which the tumour can be reduced over the horizon time T, Figure 3. Thus, for short horizons, only no treatment solutions exist since the TEP is negligible and toxicity dominates, as proved in the general case, Section 4. At horizon time

T = T_{0}

, the optimal solution switches to full MTD, the drug infusion period jumping from 0 to T at

T = T_{0}

, Figure 3B. This is because the pay-off for a continuous MTD regimen is smaller than that for no treatment when

T < T_{0}

, and crosses it at

T = T_{0}

, Figure 4, i.e., the TEP is sufficiently high for the expected gain in lifespan to exceed the toxicity costs. At

T = T_{1}

, continuing at MTD beyond an infusion period

T_{1}

is counterproductive: toxicity costs again outweigh gains in the TEP, which is close to 1 and, thus, cannot be increased much further, Figure 3C. Thus, the optimal solution switches from continuous MTD to treat-and-stop. The optimal infusion period (

x = T_{1}

) is then independent of the horizon time T as required, Section 3. Thus, for

T > T_{1}

the optimal drug regimens are independent of T and the pay-off is constant. Therefore, optimal solutions are limited by too short a time horizon when

T < T_{0}

, and for

T_{0} < T < T_{1}

the pay-off (and expected lifespan) can be increased by increasing the time horizon.

Optimal treatment is explored in Figure 5 against drug toxicity

γ

and the patient lifespan

L_{n}

if they are not treated; trends are identical for the cases of a short and long lifespan if the tumour is eliminated,

L_{e} = 2, 20

years. The drug infusion interval decreases as toxicity

γ

increases and as the lifespan gain

L_{e} - L_{n}

decreases, with a critical boundary (in

L_{n}, γ

space) between untreatable (no treatment is optimal) and treatable patients. The drug infusion period jumps from zero to over 40 days in this case as therapy switches from no treatment to treat-and-stop therapy, Figure 5A,D, with a corresponding jump in the

TEP

from 0 to near 1 (in this case over 0.95), Figure 5C,F. Since the drug infusion period is below the time horizon,

T = 60

days, patients in the untreatable category are untreatable for all time horizons. The gain in (discounted) lifespan is approximately linear throughout the treatment range in both

γ

and

L_{n}

, Figure 5B,E.

The discounted expected lifespan (DLP) is appropriate for modelling tolerable nausea; severe toxicity costs of therapy are better modelled as SAEs, see Section 5. Under the quality of life interpretation of the DLP we have

γ

of order 1. We illustrate optimal solutions for

γ = 1

. We consider a patient with a short life expectancy under no treatment,

L_{n} = 1

month, examining therapy for various initial tumour sizes

N_{0}

and patient age (assuming an average life expectancy of 85 years, patient age is

85 - L_{e}

years), Figure 6. For an efficacious drug, the gain in life expectancy is substantial, with life expectancy approximately equal to the life expectancy of a healthy individual (achieving over

99.4 %

of the life expectancy of a healthy individual), regardless of initial tumour size and

L_{e}

. The time horizon is

T = 60

days whilst the drug infusion period is 28–49 days, increasing with both the initial tumour size and

L_{e}

. Thus, all optimal solutions are treat-and-stop and independent of the time horizon. The drug infusion period is only weakly dependent on

L_{e}

after a rapid rise, whilst tumour size is the main determinant. All patients are treatable for these parameters (

1 year \leq L_{e} \leq 60

years,

L_{n} = 1

month,

10^{4} \leq N_{0} \leq 10^{7}

,

γ = 1

).

5. Severe Adverse Effects Model: Incorporating Treatment Related Mortality Events

The DLP of Section 4 is only appropriate when toxicity is tolerable. Treatment toxicity can, however, be debilitating, leading to treatment cessation or even death (treatment-related mortality, or TRM) [32]. Adverse effects are graded 1–5 under the Common Terminology Criteria for Adverse Events (CTCAE), a classification managed by the Cancer Therapy Evaluation Program [33]. Grades 1 and 2 are mild and grades 3–5 are serious adverse effects: grade 3 events may require hospitalisation, grade 4 is life threatening and in need of urgent medical care, and grade 5 is death through an adverse event. Thus, the negative effects of treatment can be incorporated as an SAE, causing cessation of treatment and/or death.

Here we consider the event of patient death through TRM, alongside other events such as tumour elimination. TRM is modelled as a Poisson process with rate

μ (u, t)

, assumed a function of time and the current drug concentration. The expected patient lifetime is then given by

J [N, u; T] = (1 - p_{d} [N, u; T]) \sum_{k = 1}^{n} p_{k} [N, u; T] L_{k} + \int_{0}^{T} t μ (u (t), t) \exp (- \int_{0}^{t} μ (u (s), s) d s) d t

(39)

comprising two terms corresponding to the events of surviving therapy, with probability

1 - p_{d} [N, u; T]

, and TRM, with probability

p_{d} [N, u; T]

, respectively. The last term is the contribution of TRM to the expected lifetime, i.e., the time of death weighted by its probability density. Here

p_{d} [N, u; t] = 1 - exp (- \int_{0}^{t} μ (u (s), s) d s)

is the probability of death caused by treatment over time t. Generally we can have multiple possible futures

k = 1, \dots, n

with expected lifetimes

L_{k}

. Only if the patient survives treatment are these events relevant. There is, thus, a dependence hierarchy; survival depends on

u (t)

only (and is independent of the other events), and the other events (can) depend on the state history

(N (t), u (t))

.

5.1. Logistic Tumour Growth with TRM Treated with a Cell Cycle Targeting Drug

Here we explore optimal solutions for the logistic tumour evolution model treated with cell cycle targeting drugs, i.e., tumour dynamics (36) supplemented with the BP birth rate a as in Section 4 (i.e., independent of the drug). We assume that the patient death rate due to the drug is proportional to the amount of drug administered and is given by

μ = α u

, with

α = \frac{1}{τ}

, and

τ

the expected period a patient can survive under continuous MTD drug administration, hereafter referred to as the drug tolerance interval. We consider three outcomes at the time horizon: the patient does not survive treatment (TRM), cure (tumour clearance and survival to T), and failure to eliminate the tumour, so the patient will eventually relapse. As before, we assume

L_{n} > T

. The payoff with these three outcomes is:

J [N, u; T] = e^{- α \int_{0}^{T} u (t) d t} (L_{e} p_{e} [N, u; T] + L_{n} (1 - p_{e} [N, u; T])) + \int_{0}^{T} α t e^{- α \int_{0}^{t} u (s) d s} u (t) d t,

(40)

where

p_{e} [N, u; T]

is the treatment dependent elimination probability, TEP, (23). The parameter values for the tumour dynamics (ODE model and BP model) are as in Section 4. The parameters used for the SAP are outlined in Table 2, whereas the tumour evolution parameters are outlined in Table 1.
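For a treat-and-stop regimen (u = 1 on [0, x], u = 0 afterwards) the payoff (40) reduces to a function of x alone, since the TRM integral evaluates to (1 - (1 + αx) e^{-αx})/α. The following is a minimal Python sketch of that reduction, not the direct method used in the source. It assumes the elimination probability takes the normalised form appearing in (24), measures all times in days, and uses illustrative values N_0 = 10^6 and T = 60 days; the function names are hypothetical.

```python
import math

def tep(x, N0, g, d, K):
    """Elimination probability after x days at the MTD, assuming the form used in (24)."""
    def G(y):
        return ((1.0 / N0 - 1.0 / K) * (1.0 - 2.0 * d * math.exp(-g * (1.0 - 2.0 * d) * y))
                / (1.0 - 2.0 * d) + 1.0 / K)
    p0 = math.exp(-1.0 / G(0.0))
    return (math.exp(-1.0 / G(x)) - p0) / (1.0 - p0)

def trm_payoff(x, tau, L_e, L_n, **tumour):
    """Payoff (40) for u = 1 on [0, x] and u = 0 afterwards, with mu = u / tau."""
    alpha = 1.0 / tau
    survive = math.exp(-alpha * x)                             # probability of surviving therapy
    p_e = tep(x, **tumour)
    died_term = (1.0 - (1.0 + alpha * x) * survive) / alpha    # integral of alpha t e^{-alpha t} on [0, x]
    return survive * (L_e * p_e + L_n * (1.0 - p_e)) + died_term

def best_duration(tau, T=60.0, step=0.01, **kw):
    """Grid search for the drug administration period x in [0, T]."""
    grid = [i * step for i in range(round(T / step) + 1)]
    return max(grid, key=lambda x: trm_payoff(x, tau, **kw))

kw = dict(L_e=40 * 365.0, L_n=60.0, N0=1.0e6, g=0.502, d=1.0, K=1.297e9)  # illustrative
x_tolerant = best_duration(tau=180.0, **kw)    # long drug tolerance interval
x_fragile = best_duration(tau=3.0, **kw)       # short drug tolerance interval
print(x_tolerant, x_fragile)
```

Under these assumptions a patient with a short tolerance interval is best left untreated, while a patient with a long tolerance interval is treated for several weeks, which mirrors the treatable and untreatable classes discussed below.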

Applying PMP, the optimal solutions are bang-bang and, as with the DLP, there can only be three types of solutions: no treatment, the MTD for the full horizon, and treat-and-stop solutions (at the MTD); see Appendix D for the proof. We numerically determined optimal solutions using a direct method for bang-bang solutions, see Appendix E; specifically, the control is piece-wise constant, taking only the values 0 or 1 on a partition of

[0, T]

. A sequential quadratic programming (SQP) method (MATLAB fmincon solver) was utilised to determine the optimal switching times.

Similar to Section 4, for small time horizons optimal solutions are no treatment,

T < T_{1}

, continuous MTD for intermediate horizons,

T_{1} < T < T_{2}

, and treat-and-stop solutions for large horizons,

T > T_{2}

. For large enough time horizons, T, the optimal solutions are independent of T and all optimal treat-and-stop solutions have an identical drug regimen, Figure 7.

We illustrate the TRM model with

L_{e} = 40

years and

L_{n} = 2

months, Figure 8. The patient’s life expectancy, the optimal drug administration period, the patient’s survival probability and the cure probability increase with

τ

, Figure 8. For

τ < 180

days, the probability that a patient survives therapy is close to

0.78

, Figure 8B with a TEP post-treatment in excess of

0.72

, Figure 9. Patients can be divided into a treatable and an untreatable category based on the value of the drug tolerance interval,

τ

; patients are treatable only if they can survive continuous MTD drug administration for 7 days or longer, Figure 8C; these patients are treated with the MTD for a minimum of 38 days, Figure 8C. Thus, there exists a critical value of the drug tolerance interval,

τ = τ_{crit}

, where the optimal drug administration period jumps from 0 to a non-zero value. Despite the abrupt change in the optimal drug administration period, the patient’s life expectancy under optimal treatment remains continuous at the critical threshold

τ = τ_{crit}

, Figure 8A,C. A comparison of no treatment to the MTD is shown in Figure 10 for

τ < τ_{crit}

and

τ > τ_{crit}

, the switch of solution causing a jump in the drug infusion period whilst retaining continuity in the pay-off. As a function of the drug infusion period, the payoff under MTD drug administration has a local maximum that becomes the global maximum at

τ = τ_{crit}

, Figure 10, having a payoff

J [N, u = 0; T] = L_{n}

.

Patients in the treatable category with low drug tolerance intervals

τ

receive treatment for a duration exceeding their expected survival time under continuous MTD administration, Figure 9. The optimal drug administration period increases with

τ

, but eventually becomes shorter than

τ

. Thus, at low values of

τ

, optimal drug strategies focus on achieving tumour elimination despite high TRM risks. In contrast, at high values of

τ

, tumour control is more readily achieved, and drug toxicity becomes the primary factor limiting the infusion period.

The optimal drug administration period, x, increases with both the initial tumour size (

N_{0}

) and the drug tolerance interval (

τ

), Figure 11A; the strong dependence of x on

N_{0}

is not surprising as the tumour needs to be reduced to the order of a few cells before elimination is feasible. The TEP is almost independent of

N_{0}

for fixed

τ

and is strictly increasing with

τ

, taking values larger than

0.8

, Figure 11C; this is achieved by the total amount of drug increasing with the initial tumour size. The cure probability increases with

τ

with a shallow decrease with the initial tumour size, Figure 11D; this is because of the decreasing probability of the patient survival under a longer drug administration period. The cure probability and the relative life expectancy under optimal treatment have similar dependence on

N_{0}, τ

, taking similar values Figure 11B,D. Dependence of the optimal solutions on

L_{e}

, or age, can also be explored, Appendix F. Dependence on age is weak.

Patients can be subdivided into a treatable and an untreatable class, based on the drug tolerance interval and the size of the tumour prior to therapy, Figure 12. Treatable patients, who are characterised by

(τ, N_{0})

values close to the boundary between the two classes, are treated with long periods of MTD drug administration (25–50 days), Figure 12A, although the cure probability is only of the order of

10^{- 2}

, Figure 12B. Optimal drug regimens favour the objective of eliminating the tumour over the objective of retaining low levels of drug toxicity. Because

L_{e}

is equal to 40 years and

L_{n}

is 2 months, the potential gain in life expectancy under drug administration can be much larger than the life expectancy under no treatment, even when the cure probability is small.

5.2. Intermediate SAE and Iterative Therapy Under LEP Optimisation

SAEs can lead to cessation of treatment for a period of time or a shift to another treatment, for instance the maximum drug dose could be reduced, or a less intensive drug used. Treatment with cessation events, thus, occurs in phases, phase j having

j - 1

previous SAE events. The objective is formulated as follows. We define the expected patient lifespan for each phase of treatment, i.e.,

J_{s}^{(j)} [N, \vec{u}; T]

is the expected lifespan after

j - 1

SAEs, with the last occurring at time

s \in [0, T]

; phase j, thus, starts at time s with tumour size

N (s)

determined by the treatment before s. Here the drug concentration is a vector,

\vec{u} (t)

, with the patient being treated with drug concentration

u_{j} (t)

(component j) in the

j

th treatment phase, which could in fact be the same drug but with a reduced MTD.

Consider the case with a potential SAE event with dose dependent rate

α u_{1} (t)

, that causes the current therapy to cease and a switch to an alternative therapy, drug concentration

u_{2} (t)

. We assume this alternative treatment is not subject to the possibility of an SAE. Similar to the case of TRM, (40), we have the expected lifespan,

\begin{matrix} J [N, \vec{u}; T] & = & e^{- α \int_{0}^{T} u_{1} (s) d s} (L_{e} p_{e} [N, \vec{u}; T] + L_{n} (1 - p_{e} [N, \vec{u}; T])) \\ & & + \int_{0}^{T} α u_{1} (t) e^{- α \int_{0}^{t} u_{1} (s) d s} J_{t}^{(2)} [N, \vec{u}; T] d t, \end{matrix}

(41)

where

p_{e} [N, \vec{u}; T]

is the treatment dependent elimination probability, TEP, (23), and

J_{t}^{(2)} [N, \vec{u}; T]

is the expected lifespan conditioned on switching to treatment 2 at time t. This is an adaptive therapy, since an SAE is an observation event that changes the therapy if it occurs.

To compute the expected lifetime (41), we, therefore, need to solve for the optimal treatment

u_{2} (s), s > t

, for any SAE time t, initial tumour size

N (t)

, by maximising the conditional LEP,

J_{t}^{(2)} [N, \vec{u}; T] = t + (L_{e} - t) p_{e S A E} [N, \vec{u}; t, T] + (L_{n} - t) (1 - p_{e S A E} [N, \vec{u}; t, T])

where

L_{e}

is the expected lifespan on tumour elimination,

L_{n}

is the expected lifespan if the tumour is not eliminated, and

p_{e S A E} [N, \vec{u}; t, T]

is the probability of tumour (lineage) elimination conditioned on an SAE at t. In practice, the SAE may affect these lifetimes. We, thus, have a complex iterative optimisation problem. To accommodate SAE grade, rates for each grade of SAE would need to be defined, whilst allowing for multiple phases of treatment (multiple SAEs) would further extend the iterative depth.

6. Conclusions

We have proposed a new optimal control criterion for determining optimal chemotherapy scheduling and doses within the context of key events affecting outcome, such as tumour elimination (cure, if patient survives treatment) and patient death through TRM. We propose that maximising the patient’s life expectancy by averaging over all possible outcome events is a realistic interpretation of the objective of therapy, i.e., is a quantifiable measure of the best patient outcome. We developed the proposal within a deterministic modelling framework based on the rate of event occurrences, such as tumour elimination. The life expectancy pay-off is very general; here, we illustrated its use on chemotherapy with three possible events (cure, failure to clear tumour cells, and TRM) whilst it can be generalised to include surgery, transplant, combination therapy, adaptive therapy, and multiple stage therapy. Tumour growth dynamics (both with and without therapy), and the event rates need to be parametrised, specifically the life expectancy and SAE rates in terms of therapy parameters. Thus, for chemotherapy we need to incorporate drug mechanism and drug efficacy into the tumour dynamics, and the rate of TRM. Parametrised tumour models have been developed, [11,34,35,36], whilst TRM rates, [32,37,38] and survival data (overall survival and progression free survival) may allow these additional parameters to be estimated. Incorporating other events will need their rates and dependence on tumour characteristics, such as size and replication rate and therapy to be determined. If parametrisation includes patient demographics then this is a natural framework for personalised therapy optimisation.

The optimisation criterion we propose addresses various challenges in the field of applied optimal control for cancer chemotherapeutics, and offers a number of advantages over previous formulations, Table 3. To our knowledge, there has been no previous attempt in the literature to formulate an optimal control problem that considers tumour characteristics after therapy; typically, the performance criterion for evaluating a therapy’s anti-tumour efficacy is based on the tumour’s state at the end of treatment. Our lifetime expectancy pay-off addresses this issue by modelling stochastic events during therapy, including tumour elimination, events that determine patient prognosis. Therefore, our formulation inherently incorporates considerations beyond the treatment horizon, and in fact quantifies the probability of each outcome (set of events). Being based on the expected lifespan, the lifetime payoff has two additional attractive properties. Firstly, the optimal solutions are independent of the horizon time T once drug application ceases. In effect, provided a sufficiently large horizon time is used, all solutions are then independent of T. To our knowledge no other optimal cancer therapy (finite time horizon) control problem has this property. Secondly, our objective has no subjective free parameters, since every parameter has a direct interpretation. This contrasts with traditional approaches to cancer therapy optimisation that frame it as a multiple objective optimisation problem. For instance, the two competing objectives of controlling the tumour size and limiting the impact of toxicity to the patient are often combined as a weighted sum [1,9]. A drawback of this method is that the weights are arbitrary, and it is hard to give a biological justification for a given choice of the weights.
Balancing multiple competing objectives can also be analysed using Pareto optimality [39,40] that delays making subjective choices; a solution is Pareto optimal if no objective can be improved without compromising another. Solving the Pareto optimisation problems involves identifying all Pareto-optimal solutions, which defines the Pareto front. A Pareto optimal solution is then subjectively selected, either based on the decision maker’s preferences or utilising additional factors that could include patient input. However, as with traditional objective weighting methods, solutions determined by Pareto optimisation depend on the optimisation parameters, [41], and, therefore, depend on subjective choices. In the LEP formulation, all competing therapy objectives are inherently weighted by the event-probabilities (e.g., tumour elimination) that have direct biological interpretations; there is, thus, a single objective: the patient’s expected lifetime averaged over all therapy outcomes to be maximised. The LEP’s advantages are summarised in Table 3.

Two life expectancy pay-off models were presented and analysed, where DLP can be considered a tolerable toxicity model and SAP a model of severe toxicity. Both models incorporate the event of tumour elimination, i.e., patient cure, and are illustrated on a one-compartment tumour model with a cell cycle targeting drug. In DLP, chemotherapy can significantly reduce the patient’s quality of life, and, thus, we proposed a discounted pay-off to model that poor quality of life during therapy, [26]. In SAP, drug-induced death is modelled as a Poisson process with rates proportional to the amount of drug administered at any time point of the treatment. For both optimisation problems we proved that (1) optimal solutions are bang-bang, and (2) every solution that satisfies PMP has at most one switch and can only switch from continuous MTD to no drug at some time point x. Since PMP is a necessary condition for optimality, finding the optimal solution reduces to determining the optimal switching time x, where the control transitions from the MTD to no drug. This reduces the original optimal control problem to a one-dimensional NLP problem.

In both of these models, patients are either treatable or untreatable. In our analysis, treatability was dependent on tumour characteristics, such as size, and patient characteristics, such as their expected lifetime without treatment (

L_{n}

) and tolerability to drug toxicity, Figure 5 and Figure 12. More generally, we would expect that treatability would be dependent on tumour type and stage. A personalised LEP approach would, thus, allow this information to be incorporated and patient specific therapy designed. Provided the time horizon is large enough treatability is independent of the time horizon and treatment switches from no treatment to treat-and-stop.

Optimal solutions of both payoffs prioritise tumour elimination over low total drug toxicity. At the untreatable boundary the optimal drug administration period jumps from 0 days to a period large enough to achieve a large TEP (for the parameters used in this study, treatable patients have a

TEP > 0.8

); this is because the potential gain in life expectancy, when the tumour is eliminated, is much greater than the life expectancy when treatment does not eliminate the tumour. In particular, in SAP treatable patients close to the untreatable boundary can have a cure probability smaller than

0.02

, with drug administration exceeding 20 days (Figure 12). For the parameters used here, treatable patients under optimal treatment have an expected lifetime that spans 10–83% of the life expectancy of a healthy individual, Figure 11B. This high variability depends primarily on the initial tumour size and the drug tolerance interval

τ

, since patients that are more susceptible to the drug are more likely to die during therapy. In the DLP, the life expectancy of a treatable patient is nearly identical to that of a healthy individual—exceeding 99.4% of a healthy individual, Figure 6B.

In general, SAEs significantly disrupt the course of treatment, potentially leading to an interruption of treatment [42], a permanent cessation of treatment with a shift to palliative care [43], or TRM [44]. We presented a SAP model based on TRM, but it can be modified to account for SAEs that cause treatment to be ceased or modified. Since treatment is modified based on an observation (SAE event), this comprises an adaptive therapy. This SAE interruption of therapy then gives an iterative optimisation problem involving conditional subproblems with an SAE at time

t \in [0, T]

, see Section 5.2.

We illustrated the lifetime expectancy pay-off for chemotherapy optimisation of simple tumour growth models with only two events (cure, TRM). This has the advantage of avoiding the need to describe and justify detailed cancer models, thus allowing the pay-off formulation to take centre stage. Further, the models are sufficiently simple to allow formal analysis; in particular, we proved that optimal solutions are treat-and-stop. Other types of solutions, such as delayed treatment [1,20,45] or administering the highest drug dose at the end of the treatment period [46,47], have been found to be optimal in a number of theoretical studies, although such solutions are clinically ill advised. So it is reassuring that our pay-off gives clinically suitable solutions; how robust this is for more realistic tumour growth models and additional events is unclear and needs to be examined.

The LEP framework is very flexible, and can be generalised in multiple ways, including incorporating other treatments, such as radiotherapy and surgery, more complex cancer growth models [8,48], including multi-compartment models, additional events such as mutation and metastasis, and more realistic drug mechanism models with more realistic application schedules (discrete injections) and pharmacokinetics/dynamics. To incorporate additional events, the event rates

μ_{k} (N, u)

need to be parametrised (with dependence on the tumour size and drug). However, because we classify outcomes by tumour elimination, we need to also allow for the time of elimination during therapy, since if eliminated the event rate changes to

μ_{k} (0, u)

. Thus, we need to have the joint event of tumour elimination and the event; this will be pursued elsewhere. These generalisations will likely pose optimisation problems where the PMP equations (the TPBVP) cannot be solved analytically, and, thus, bang-bang solutions are not assured. Numerical methods are then needed [49,50] that either produce solutions to PMP (so called indirect methods), or direct methods that approximate the control function using a linear combination of basis functions (e.g., Lagrange polynomials or piece-wise constant functions), giving a finite-dimensional non-linear programming (NLP) problem. There are efficient NLP-solver algorithms in the literature, many of which are open source [51,52].

The LEP framework has the potential for application in real-world cancer therapy, for instance, in personalised therapy, supporting clinical decision making by identifying how different patient groups might benefit most effectively from therapy, and determining the most effective treatment in fragile patients, thereby reducing overtreatment. AML, for example, is typically treated with intensive therapy, but older patients may be treated with less intense, less effective therapy [53,54,55,56]. Having a clinical diagnostic tool to ascertain best therapy based on patient health and other factors would potentially transform geriatric AML treatment. However, although the LEP has a key advantage over standard cancer treatment cost functions in that it is objective and the parameters have direct interpretations, there remains the difficulty of determining those parameters from real world data. Three types of data are required. Most accessible is life expectancy data, such as from life tables that typically allow for multiple covariates, including race, socio-economic status [57], and comorbidities [58]. Survival (Kaplan–Meier) curves can potentially be used to structure expected lifespan by cancer subtype, grade, and stage, although data to determine average lifespan in the absence of treatment is likely rare. Data on causes of deaths across different cancers is also available [59]. Secondly, the rates of the possible events are needed, with dependency on tumour size and drug concentration; tumour size is known to be a prognostic factor in many cancers, playing a key role in cancer stage, although it is likely also a prognostic factor on its own in many cancers (hepatocellular carcinoma [60], breast cancer [61,62], and NSCLC [63,64]). Finally, tumour growth and death rates are needed both in the presence and absence of the drug(s).
Growth dynamics and drug efficacy are extremely complex for solid tumours, given the genetic and environmental heterogeneity of the tumour [65], whilst the growth dynamics of leukaemias are better characterised [10,11,66,67,68]. Developing predictive models for key events in cancer is a growing field; thus, as these improve, the LEP framework will become far more tractable.

The primary objective of this work was to develop a novel deterministic payoff functional that quantifies a patient’s expected lifetime under different therapy outcomes, incorporating the negative impact of excessive drug toxicity on life expectancy. As an initial step toward studying this objective, simple one-compartment tumour growth models were considered. Future extensions may incorporate detailed tumour models, drugs, and detailed pharmacokinetics/pharmacodynamics. In the case of blood cancers, multi-compartment models are usually employed to describe different levels of hierarchies in the blood maturation process in both cancer and healthy cell lineages. In the case of solid tumours spatial heterogeneity should be addressed, since cells that are in a hypoxic environment tend to be more resistant to the drug compared to non-hypoxic cells. Spatial heterogeneity is usually addressed by PDE models since spatio-temporal characteristics of the tumour should be considered. Our formulation should apply to all these extensions.

Table 3. Therapy optimisation formulations.

Context	Formulation	Optimisation Approach	Parameters	Post-Horizon Outcomes	Dependence on Horizon
Optimise drug therapy over dose, timings, and combinations.	Multiple competing objectives, defined by objective functions $J_{i}, i = 1 \dots n$ .	Minimise weighted sum $\sum_{i = 1}^{n} w_{i} J_{i}$ , arbitrary weights $w_{i}$ [1,5,9].	$w_{i}$ and $J_{i}$ subjectively chosen.	Not normally considered [5].	Yes.
		Pareto optimisation [40].	Subjective choice of $J_{i}$ , and subjective selection of best solution on Pareto front.
Optimise drug therapy over dose, timings, and combinations.	Maximise patient lifespan.	Life expectancy pay-off (LEP).	Event rates and lifespan conditioned on events need to be determined from data.	Classified by events, for instance including cure, relapse, death.	Invariant provided therapy ended.
Optimise over choice of defined treatments.	Maximise patient lifespan	Dynamic treatment regime models [24,69].	Treatment outcome statistics required from data.	Various, including cure, relapse, death.	Not relevant

Author Contributions

Conceptualisation, N.J.B.; Formal analysis, N.J.B. and B.D.E.T.; Methodology, N.J.B. and B.D.E.T.; Software, B.D.E.T.; Supervision, N.J.B. and A.B.; Validation, B.D.E.T.; Visualisation, N.J.B. and B.D.E.T.; Writing—original draft, N.J.B. and B.D.E.T.; Writing—review and editing, N.J.B., B.D.E.T., and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

Byron D. E. Tzamarias received funding by the Engineering and Physical Sciences Research Council (grant number 2441135).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

Numerics and graphics were generated using Matlab R2023a.

Conflicts of Interest

The authors have no conflicts of interest to declare.

Appendix A. BP and Average Tumour Size Dependent on Die-Out

The BP is described by the path probability

P (Z (t) | Z (0) = N_{0})

. We are interested in calculating expectations conditioning on elimination, or not at a time

t = T

, for instance,

E [g [Z] (T)] = E [g [Z] (T) | Z (T) = 0] P (Z (T) = 0) + E [g [Z] (T) | Z (T) > 0] P (Z (T) > 0)

Then, if dependence of

g [Z] (t)

on

Z (t)

is negligible for small

Z (t)

, we can approximate

E [g [Z] (T) | Z (T) = 0]

by using the mean trajectory

E [Z (t) | Z (T) = 0, Z (0) = N_{0}]

. Define

F_{N_{0}, d o T} (s, t)

as the PGF for the process that dies out by T, with

Z (0) = N_{0}

. Then, we have

\begin{matrix} F_{N_{0}, d o T} (s, t) & = & \frac{1}{P (Z (T) = 0 | Z (0) = N_{0})} \sum_{k = 0}^{\infty} P (Z (t) = k, Z (T) = 0 | Z (0) = N_{0}) s^{k} \\ & = & \frac{1}{P (Z (T) = 0 | Z (0) = N_{0})} \sum_{k = 0}^{\infty} P (Z (t) = k | Z (0) = N_{0}) P (Z (T) = 0 | Z (t) = k) s^{k}, \end{matrix}

(A1)

where we used the Markov property. Each of the

Z (t)

BPs are independent and, thus, we obtain,

F_{N_{0}, d o T} (s, t) = \frac{1}{P (Z (T) = 0 | Z (0) = N_{0})} \sum_{k = 0}^{\infty} P (Z (t) = k | Z (0) = N_{0}) F_{1} {(0, T - t)}^{k} s^{k} = \frac{F_{N_{0}} (s F_{1} (0, T - t), t)}{F_{N_{0}} (0, T)}

where

F_{N} (s, t)

is the PGF for a BP

Z (t)

with

Z (0) = N

,

F_{N} (s, t) = {(1 - {(\frac{Λ (t)}{1 - s} + \int_{0}^{t} a (t^{'}) Λ (t^{'}) d t^{'})}^{- 1})}^{N}, Λ (t) = exp \int_{0}^{t} (b (t^{'}) - a (t^{'})) d t^{'} .

We observe that

F_{N_{0}, d o T} (s, 0) = s^{N_{0}}

and

F_{N_{0}, d o T} (s, T) = 1

as required. Similarly, we can compute the PGF for the process that does not die out by T, using the probability

P (Z (T) > 0 | Z (t) = k) = 1 - P (Z (T) = 0 | Z (t) = k)

in the above. The desired average

N (t)

is then given by

N (t) = \frac{\partial}{\partial s} F_{N_{0}, d o T} (s, t) |_{s = 1} = \frac{F_{1} (0, T - t)}{F_{N_{0}} (0, T)} \frac{\partial}{\partial s} F_{N_{0}} (s, t) |_{s = F_{1} (0, T - t)}
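The conditioned PGF and its mean can be checked numerically. The following is a minimal sketch, assuming constant rates a and b (so that Λ(t) = exp((b − a)t) and the integral has a closed form); the function names and the parameter values are illustrative and not taken from the paper. The single-cell PGF is rewritten as 1 − (1 − s)/(Λ + I(1 − s)), which is algebraically the same expression but stays finite at s = 1.

```python
import math

def _lam_and_integral(t, a, b):
    """Lambda(t) and int_0^t a*Lambda(t') dt' for constant rates a, b (assumption)."""
    lam = math.exp((b - a) * t)
    integral = a * t if a == b else a * (lam - 1.0) / (b - a)
    return lam, integral

def pgf_one(s, t, a, b):
    """F_1(s, t): PGF of the BP started from a single cell."""
    lam, integral = _lam_and_integral(t, a, b)
    return 1.0 - (1.0 - s) / (lam + integral * (1.0 - s))

def dpgf_one_ds(s, t, a, b):
    """Partial derivative of F_1(s, t) with respect to s."""
    lam, integral = _lam_and_integral(t, a, b)
    return lam / (lam + integral * (1.0 - s)) ** 2

def conditioned_pgf(s, t, T, n0, a, b):
    """F_{N0,doT}(s, t) = F_{N0}(s F_1(0, T - t), t) / F_{N0}(0, T)."""
    c = pgf_one(0.0, T - t, a, b)
    return pgf_one(s * c, t, a, b) ** n0 / pgf_one(0.0, T, a, b) ** n0

def conditioned_mean(t, T, n0, a, b):
    """Mean tumour size at time t, conditioned on die-out by time T."""
    c = pgf_one(0.0, T - t, a, b)
    d_full = n0 * pgf_one(c, t, a, b) ** (n0 - 1) * dpgf_one_ds(c, t, a, b)
    return c / pgf_one(0.0, T, a, b) ** n0 * d_full

# Illustrative values only.
A, B, N0, T_END = 0.3, 0.5, 5, 10.0
trajectory = [conditioned_mean(t, T_END, N0, A, B) for t in (0.0, 2.5, 5.0, 7.5, 10.0)]
```

The conditioned mean starts at N_0 and reaches zero at t = T, which matches the two boundary properties of the conditioned PGF stated above.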

Appendix B. Dependence of the TCP on the Minimum of N(t)

We have the TCP expression (8)

TCP (t) = {(1 - {(N (0) G (t))}^{- 1})}^{N (0)} with G (t) = \frac{1}{N (t)} + \int_{0}^{t} \frac{a (t^{'})}{N (t^{'})} d t^{'} .

The integrand is dominated where

\frac{a (t^{'})}{N (t^{'})}

has a maximum, equivalent to a maximum in

log (\frac{a (t^{'})}{N (t^{'})})

. Let this maximum occur at

t^{'} = t^{*}

. Taylor expanding about

t^{*}

we have

log (\frac{a (t^{'})}{N (t^{'})}) = log (\frac{a (t^{*})}{N (t^{*})}) - \frac{τ}{2} {(t^{'} - t^{*})}^{2} + \dots

where

τ > 0

is the (modulus of the) second derivative. Then, we have the approximation

G (t) \approx \frac{1}{N (t)} + \frac{a (t^{*})}{N (t^{*})} \int_{0}^{t} e^{- \frac{τ}{2} {(t^{'} - t^{*})}^{2}} d t^{'}

For the case where the maximum

t^{*} \in (0, t)

is well within the interval (relative to the standard deviation

\sqrt{\frac{1}{τ}}

), then to a good approximation,

G (t) \approx \frac{1}{N (t)} + \sqrt{\frac{2 π}{τ}} \frac{a (t^{*})}{N (t^{*})}

Since

N (t^{'})

typically varies by orders of magnitude, the TCP is, thus, dominated where the tumour size is near its minimum (strictly where

\frac{a (t^{'})}{N (t^{'})}

is close to its maximum), and, thus, will also dominate the first term

\frac{1}{N (t)}

. In the case where a is a constant, these criteria are equivalent.
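The quality of the Gaussian (Laplace) step can be illustrated by comparing the truncated integral with its full-line value. This is a small sketch under the stated quadratic model for the logarithm of the integrand; the numerical values of τ, t* and t are hypothetical and chosen only to contrast an interior maximum with one near the boundary.

```python
import math

def gaussian_integral_exact(t, t_star, tau):
    """int_0^t exp(-(tau/2) (t' - t_star)^2) dt', written with the error function."""
    root = math.sqrt(tau / 2.0)
    prefactor = math.sqrt(math.pi / (2.0 * tau))
    return prefactor * (math.erf(root * (t - t_star)) + math.erf(root * t_star))

def gaussian_integral_laplace(tau):
    """Full-line value sqrt(2 pi / tau) used in the approximation of G(t)."""
    return math.sqrt(2.0 * math.pi / tau)

tau = 4.0  # standard deviation sqrt(1/tau) = 0.5
interior = gaussian_integral_exact(10.0, 5.0, tau) / gaussian_integral_laplace(tau)
boundary = gaussian_integral_exact(10.0, 0.2, tau) / gaussian_integral_laplace(tau)
```

With the maximum many standard deviations inside the interval the ratio is essentially one; when t* lies within a standard deviation of an end point the approximation overestimates the integral, which is why the text requires t* to be well within (0, t).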

Appendix C. The Payoff as a Function of the Optimal Drug Administration x

The three cases of optimal solutions derived in Section 4 have an optimal control of the form:

u (t) = 1, 0 \leq t \leq x

,

u (t) = 0, x \leq t \leq T

for

0 \leq x \leq T

. No treatment solutions have

x = 0

, MTD solutions have

x = T

and treat-and-stop solutions can only have values of x in the interval

0 < x < T

. By expressing the terminal state and the running cost in terms of the drug administration period x, it is possible to express J as a function of x and assess the global maximum.

The terminal tumour size is given by:

N (T) = N_{0} \frac{exp (g (- 2 d x + T))}{1 + \frac{N_{0}}{K} (exp (g (- 2 d x + T)) - 1)}

, with

N_{0} = N (0) .

Derivation: For constant u, the tumour evolution equation, Equation (36), has the analytical solution:

N (t) = N (t_{0}) \frac{exp (g (1 - 2 d u) (t - t_{0}))}{1 + \frac{N (t_{0})}{K} (exp (g (1 - 2 d u) (t - t_{0})) - 1)} .

(A2)

Define

u (t)

as:

u (t) = 1

, when

t \in [0, x)

and

u (t) = 0

, when

t \in [x, T)

. For initial condition

N (0) = N_{0}

, we get:

N (x) = N_{0} \frac{exp (g (1 - 2 d) x)}{1 + \frac{N_{0}}{K} (exp (g (1 - 2 d) x) - 1)}

and

N (T) = N (x) \frac{exp (g (T - x))}{1 + \frac{N (x)}{K} (exp (g (T - x)) - 1)}

. Combining the two Equations we get:

N (T) = N_{0} \frac{exp (g (- 2 d x + T))}{1 + \frac{N_{0}}{K} (exp (g (- 2 d x + T)) - 1)}
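As a sanity check on this composition, the two-phase logistic solution can be compared with the single closed-form expression in which the exponent is g(T − 2dx). The sketch below uses hypothetical parameter values; the helper names are not from the paper.

```python
import math

def logistic_step(n_start, rate, duration, K):
    """Logistic solution (A2) over one interval with constant net rate."""
    e = math.exp(rate * duration)
    return n_start * e / (1.0 + n_start / K * (e - 1.0))

def terminal_size_two_phase(n0, g, d, x, T, K):
    """Drug on (u = 1) for [0, x), then drug off (u = 0) for [x, T)."""
    n_x = logistic_step(n0, g * (1.0 - 2.0 * d), x, K)
    return logistic_step(n_x, g, T - x, K)

def terminal_size_closed_form(n0, g, d, x, T, K):
    """Combined expression for N(T) with exponent g (T - 2 d x)."""
    e = math.exp(g * (T - 2.0 * d * x))
    return n0 * e / (1.0 + n0 / K * (e - 1.0))

# Illustrative values only.
two_phase = terminal_size_two_phase(1e4, 0.5, 1.0, 20.0, 60.0, 1e9)
closed = terminal_size_closed_form(1e4, 0.5, 1.0, 20.0, 60.0, 1e9)
```

The agreement reflects the fact that 1/N − 1/K decays exponentially at the current net rate, so consecutive phases simply add their exponents.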

The terminal value of W is given by:

W (T) = \frac{a_{0}}{λ} (\frac{1}{N_{0}} - \frac{1}{K}) (e^{- g (1 - 2 d) x} - e^{- g (T - 2 d x)} + \frac{1 - e^{- g (1 - 2 d) x}}{1 - 2 d}) .

Derivation: For constant control, the W-state equation, Equation (37), has the analytical solution:

W (t) = \frac{a_{0}}{λ} \frac{1}{(1 - 2 d u)} (\frac{1}{K} - \frac{1}{N_{0}}) (e^{- g (1 - 2 d u) t} - 1) + W_{0} .

Define

u (t)

as:

u (t) = 1

, when

t \in [0, x)

and

u (t) = 0

, when

t \in [x, T)

. For initial conditions

N (0) = N_{0}

, and

W (0) = 0

we get:

W (x) = \frac{a_{0}}{λ} \frac{1}{(1 - 2 d)} (\frac{1}{K} - \frac{1}{N_{0}}) (e^{- g (1 - 2 d) x} - 1)

and

W (T) = (\frac{a_{0}}{λ} (\frac{1}{K} - \frac{1}{N (x)}) (e^{- g (T - x)} - 1)) + W (x) .

By combining the two solutions with Equation (A2),

W (T)

can be expressed as:

W (T) = \frac{a_{0}}{λ} (\frac{1}{N_{0}} - \frac{1}{K}) (e^{- g (1 - 2 d) x} - e^{- g (T - 2 d x)} + \frac{1 - e^{- g (1 - 2 d) x}}{1 - 2 d}) .

The tumour lineage elimination probability is given by

p (x; T) = exp (- G {(x; T)}^{- 1})

, with

G = \frac{a_{0}}{λ} \frac{1}{N (x; T)} + W (x; T)

(Equation (11)) where

N (x; T) = N (T)

and

W (x; T) = W (T)

when

u (t) = 1

for

t < x

and

u (t) = 0

for

t > x

.

Having expressed the terminal states of W and N as functions of x the probability of eliminating the tumour can be written as:

p (x) = \frac{exp (- G {(x)}^{- 1}) - exp (- G {(0)}^{- 1})}{1 - exp (- G {(0)}^{- 1})},

with

G (x) = \frac{a_{0}}{λ (1 - 2 d)} (\frac{1}{N_{0}} - \frac{1}{K}) (1 - 2 d χ^{- g (1 - 2 d)}) + \frac{a_{0}}{λ K},

where

χ = exp (x)

. The objective function is, thus, a function of the drug administration period x and is given by:

J (x; T) = (L_{e} - L_{n}) p (x) + L_{n} - γ x .
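Because J depends on the control only through x, the optimum can be located by a one-dimensional search. The following is a minimal sketch using a plain grid search; all parameter values (including the units of L_e, L_n and γ) are hypothetical, the ratio a_0/λ is treated as a single parameter, and d = 1/2 is excluded because it makes the prefactor singular.

```python
import math

# Hypothetical parameter values, chosen only so that the sketch runs.
PARAMS = dict(a0_over_lam=1.0, g=0.5, d=1.0, n0=1e4, K=1e9)
L_E, L_N, GAMMA = 30.0, 1.0, 0.01  # assumed: years, years, years lost per day of drug

def G(x, a0_over_lam, g, d, n0, K):
    """G(x) from Appendix C; the expression does not involve T."""
    factor = math.exp(-g * (1.0 - 2.0 * d) * x)
    return (a0_over_lam / (1.0 - 2.0 * d) * (1.0 / n0 - 1.0 / K)
            * (1.0 - 2.0 * d * factor) + a0_over_lam / K)

def elimination_probability(x):
    """Normalised p(x), equal to zero when no drug is given."""
    p_raw = math.exp(-1.0 / G(x, **PARAMS))
    p_zero = math.exp(-1.0 / G(0.0, **PARAMS))
    return (p_raw - p_zero) / (1.0 - p_zero)

def payoff(x):
    """J(x) = (L_e - L_n) p(x) + L_n - gamma x."""
    return (L_E - L_N) * elimination_probability(x) + L_N - GAMMA * x

def best_switch_time(T, step=0.1):
    """Grid search over 0 <= x <= T (a stand-in for a proper 1-D optimiser)."""
    n = int(round(T / step))
    return max((i * step for i in range(n + 1)), key=payoff)

x_short = best_switch_time(60.0)
x_long = best_switch_time(120.0)
```

For these illustrative values the maximiser lies strictly inside (0, T) and is unchanged when the horizon is doubled, in line with the horizon independence discussed in the main text.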

Appendix D. Optimal Solution Types for the Severe Adverse Effect Payoff with Logistic Growth

Here we prove that the SAE payoff (Equation (40)) has bang-bang optimal solutions, for the logistic tumour evolution model (Equation (36)), with

b_{0} \geq 0

. We show that optimal solutions have at most a single switching point and that delayed treatments cannot be optimal. The objective function can be expressed as:

J [N, u; T] = e^{- α U (T)} (L_{e} p_{e} [N, u] + L_{n} (1 - p_{e} [N, u])) + \int_{0}^{T} α t e^{- α U (t)} u (t) d t

(A3)

with state variables N, W, U given by:

\begin{matrix} \frac{d N}{d t} = ρ (u) (1 - \frac{N}{K}) N, N (0) = N_{0} \\ \frac{d W}{d t} = \frac{a_{0} g}{λ} (\frac{1}{N} - \frac{1}{K}), W (0) = 0 \\ \frac{d U}{d t} = u (t), U (0) = 0, \end{matrix}

where

ρ (u) = g (1 - 2 d u (t))

. The Hamiltonian is given by:

H = ρ (1 - \frac{N}{K}) N p_{N} + \frac{a_{0} g}{λ} (\frac{1}{N} - \frac{1}{K}) p_{W} + u p_{U} + α t e^{- α U} u (t),

p_{N}

,

p_{W}

,

p_{U}

are the costates of N, W, and U, respectively. The costate dynamics are given by:

\begin{matrix} \frac{d p_{N}}{d t} = - ρ (u) (1 - 2 \frac{N}{K}) p_{N} + \frac{a_{0} g}{λ} \frac{1}{N^{2}} p_{W} \end{matrix}

(A4)

\begin{matrix} \frac{d p_{W}}{d t} = 0 \end{matrix}

(A5)

\begin{matrix} \frac{d p_{U}}{d t} = α^{2} t e^{- α U} u (t), \end{matrix}

(A6)

and it is pointed out that

p_{W}

is constant. Using the transversality condition,

p_{W} (T) = \frac{L_{e} - L_{n}}{1 - p_{0}} \frac{e^{- {(\frac{a_{0}}{λ N (T)} + W (T))}^{- 1}}}{{(\frac{a_{0}}{λ N (T)} + W (T))}^{2}} e^{- α U (T)},

(A7)

we can show that

p_{W}

is positive.

We assume that there exists an optimal solution with a singular arc in an interval

I \subseteq [0, T]

. The switching function,

Φ (t) = \frac{\partial H}{\partial u}

and its time derivatives must be zero for all

t \in I

. The generalised Legendre Clebsch (LC) conditions [70] state that any singular solution that is a maximiser, satisfies the inequality

{(- 1)}^{k} \frac{\partial}{\partial u} (\frac{d^{2 k}}{d t^{2 k}} Φ) \leq 0,

with

p = 2 k

the smallest integer with the property

\frac{\partial}{\partial u} (\frac{d^{p}}{d t^{p}} Φ) \neq 0

. On the other hand, singular solutions that are minimisers satisfy

{(- 1)}^{k} \frac{\partial}{\partial u} (\frac{d^{2 k}}{d t^{2 k}} Φ) \geq 0

. By successively differentiating the switching function until we obtain a time derivative that depends on u, we get:

\begin{matrix} Φ (t) = - 2 d g (1 - \frac{N}{K}) N p_{N} + p_{U} + α t e^{- α U} \end{matrix}

(A8)

\begin{matrix} \frac{d Φ}{d t} = - \frac{2 d a_{0} g^{2}}{λ} p_{W} (\frac{1}{N} - \frac{1}{K}) + α e^{- α U} \end{matrix}

(A9)

\begin{matrix} \frac{d^{2}}{d t^{2}} Φ = \frac{2 d a_{0} g^{2}}{λ} p_{W} ρ (u) (\frac{1}{N} - \frac{1}{K}) - α^{2} e^{- α U} u (t) . \end{matrix}

(A10)

Notice that

k = 1

and that

{(- 1)}^{k} \frac{\partial}{\partial u} (\frac{d^{2 k}}{d t^{2 k}} Φ) = \frac{4 d^{2} a_{0} g^{3}}{λ} p_{W} (\frac{1}{N} - \frac{1}{K}) + α^{2} e^{- α U} > 0,

since

p_{W}

is positive. This violates the LC condition for a maximiser, so a solution that contains a singular arc in I cannot be optimal. Consequently, the payoff J can only have bang-bang optimal solutions.

In the following we show that the optimal control cannot switch from 0 to 1, thus, optimal solutions have at most one switching point. To formulate the proof, we consider the following remarks:

Remark A1.

The switching function can be expressed as:

Φ (t) = Φ_{1} (t) + Φ_{2} (t)

with

Φ_{1} (t) = - 2 d g (1 - \frac{N (t)}{K}) N (t) p_{N} (t)

and

Φ_{2} = p_{U} + α t e^{- α U}

Remark A2.

Φ_{1}

is strictly decreasing independently of the control u.

Φ_{2}

is strictly increasing if

u (t) = 1

and remains constant iff

u (t) = 0

Proof.

The derivative of

Φ_{1}

is given by:

\frac{d Φ_{1}}{d t} = - \frac{2 d a_{0} g^{2}}{λ} p_{W} (\frac{1}{N (t)} - \frac{1}{K}) .

Since

p_{W}

is constant and positive (Equations (A5) and (A7)), we conclude that

Φ_{1}

is strictly decreasing.

Optimal solutions are bang-bang; therefore, the control can either be 0 or 1. For

u (t) = 1

the derivative of

Φ_{2}

is given by:

\frac{d Φ_{2}}{d t} = α e^{- α U} > 0,

therefore,

Φ_{2}

is strictly increasing when

u = 1

. In the case where

u = 0

,

Φ_{2}

is constant, since both

U (t)

and

p_{U} (t)

are constant for

u (t) = 0

. □

Theorem A1.

The optimal control cannot switch from

u = 0

to

u = 1

.

Proof.

We define the time points

t_{0}, t_{1}, t_{2}

as

0 \leq t_{0} < t_{1} < t_{2} \leq T

. We assume that the optimal control switches from 0 to 1 at

t = t_{1}

; then

u (t) = 0

for

t \in I_{1} = (t_{0}, t_{1})

and

u (t) = 1

for

t \in I_{2} = [t_{1}, t_{2})

. Let

s \in I_{1}

.

From Remark A2,

Φ_{1}

is strictly decreasing, therefore,

Φ_{1} (s) > Φ_{1} (t_{1})

(A11)

Since

u (t) = 0

for

t \in I_{1}

,

Φ_{2}

is constant in

I_{1}

(Remark A2), therefore,

Φ_{2} (s) = Φ_{2} (t_{1}) .

(A12)

By combining (A11) and (A12) and taking into account that

Φ (s) < 0

(since

u (s) = 0

) we get:

Φ_{1} (t_{1}) + Φ_{2} (t_{1}) < Φ_{1} (s) + Φ_{2} (t_{1}) = Φ_{1} (s) + Φ_{2} (s) < 0 .

(A13)

Inequality (A13) leads to a contradiction, since

t_{1}

is a switching point and, thus,

Φ (t_{1}) = Φ_{1} (t_{1}) + Φ_{2} (t_{1}) = 0

.

Consequently, the optimal control cannot switch from

u = 0

to

u = 1

. □

It is emphasised that there can only be three possible types of optimal solutions:

treat-and-stop, where drug is administered until some time point x and no drug is administered post x,
continuous MTD for the entire time horizon,
no treatment.

Any other types of optimal solution would require the control to switch from

u = 0

to

u = 1

, which is not possible.

Appendix E. NLP Problem for the Maximisation of the Severe Adverse Effect Payoff

In Appendix D, we demonstrated that the optimal solutions for the severe adverse effect payoff, with tumour evolution model described in Equation (36), are bang-bang and can only switch from

u = 1

to

u = 0

. Here, using the bang-bang structure of the optimal control, we reformulate the optimisation problem to focus on piecewise constant controls that switch between 1 and 0 (and vice versa) at p discrete time points,

t_{1}, \dots, t_{p}

, the switching times. This method can be applied to any optimal control problem with bang-bang solutions. The switching times are in non-decreasing order and consecutive switching times are allowed to coincide. The control is given by

u (t) = \sum_{j = 1}^{p} σ_{j} χ_{[t_{j - 1}, t_{j})} (t),

with

σ_{j} = \frac{1 - {(- 1)}^{j}}{2}

,

χ_{[t_{j - 1}, t_{j})} (t) = 1

if

t \in [t_{j - 1}, t_{j})

and

χ_{[t_{j - 1}, t_{j})} (t) = 0

otherwise. Any bang-bang control with at most p switches can be expressed by a set of switching times

{(t_{1}, \dots, t_{p})}^{tr}

. Controls with

m < p

switches can be constructed when there exist

p - m

couples of consecutive switching times that coincide (i.e.,

t_{i_{q}} = t_{i_{q} + 1}

for

i_{q} \in {1, \dots, p - 1}

with

q = 1, \dots, p - m

), since the control is fixed at the time-intervals

[t_{i_{q} - 1}, t_{i_{q + 1} + 1})

,

q \in {1, \dots, m}

.
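The parametrisation of a bang-bang control by its switching times can be sketched as follows. This is an illustrative implementation only: the function names are hypothetical, t_0 = 0 is assumed, and the control is returned as 0 at t = t_p because the last interval is half-open.

```python
def sigma(j):
    """sigma_j = (1 - (-1)^j) / 2: drug on for odd j, off for even j."""
    return (1 - (-1) ** j) // 2

def control(t, switch_times):
    """u(t) for switching times t_1 <= ... <= t_p, with t_0 = 0."""
    previous = 0.0
    for j, t_j in enumerate(switch_times, start=1):
        if previous <= t < t_j:   # empty when two switching times coincide
            return sigma(j)
        previous = t_j
    return 0

treat_and_stop = [control(t, [20.0, 60.0]) for t in (0.0, 10.0, 20.0, 40.0)]
no_treatment = [control(t, [0.0, 60.0]) for t in (0.0, 10.0, 40.0)]
full_mtd = [control(t, [60.0, 60.0]) for t in (0.0, 10.0, 40.0)]
# p = 4 with two coinciding switching times reproduces the treat-and-stop control.
degenerate = [control(t, [0.0, 0.0, 20.0, 60.0]) for t in (0.0, 10.0, 20.0, 40.0)]
```

Coinciding switching times give empty intervals, so a vector of length p also represents every control with fewer than p switches.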

To accurately evaluate optimal switching times, we employ the time scaling method used in [71]. Let

θ_{i}

,

i \in {1, \dots, p}

be positive numbers that satisfy

\sum_{i = 1}^{p} \frac{θ_{i}}{p} = T

. Then each

t \in [t_{q - 1}, t_{q})

,

q = 1, \dots, p

can be expressed as

t (s) = \sum_{l = 1}^{q - 1} \frac{θ_{l}}{p} + θ_{q} (s - \frac{q - 1}{p})

with

s \in [\frac{q - 1}{p}, \frac{q}{p})

. Notice that the switching times can be written as

t_{q} = \sum_{i = 1}^{q} \frac{θ_{i}}{p}

and that

t_{p} = T

.
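The time-scaling map can be written out directly. The sketch below assumes s is restricted to [0, 1) and uses hypothetical function names; the example vector θ = (40, 80) with p = 2 corresponds to T = 60 and a single switch at t = 20.

```python
def switching_times(theta):
    """t_q = sum_{i <= q} theta_i / p for q = 1, ..., p."""
    p = len(theta)
    times, total = [], 0.0
    for th in theta:
        total += th / p
        times.append(total)
    return times

def physical_time(s, theta):
    """Map the scaled time s in [0, 1) to t(s) on the original time axis."""
    p = len(theta)
    q = min(int(s * p) + 1, p)            # s lies in [(q - 1)/p, q/p)
    offset = sum(theta[:q - 1]) / p       # t_{q-1}
    return offset + theta[q - 1] * (s - (q - 1) / p)

theta = [40.0, 80.0]
times = switching_times(theta)
samples = [physical_time(s, theta) for s in (0.0, 0.25, 0.5, 0.75)]
```

Each scaled interval has fixed length 1/p, so the switching times enter only through the stretch factors θ_q and the grid in s never has to move during optimisation.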

We express the state variables and the payoff as functions of

\vec{θ} = (θ_{1}, \dots, θ_{p})

. Let

I_{q} = [t_{q - 1}, t_{q})

. Then the tumour size for every

t \in I_{q}

is given by:

N (t) = N (t_{q - 1}) \frac{e^{ρ_{q} (t - t_{q - 1})}}{1 + \frac{N (t_{q - 1})}{K} (e^{ρ_{q} (t - t_{q - 1})} - 1)}

with

ρ_{q} = g (1 - 2 d σ_{q})

and

t_{0} = 0

. The tumour size at each switching time

t_{q}

is:

N_{q} = N_{q - 1} \frac{e^{ρ_{q} \frac{θ_{q}}{p}}}{1 + \frac{N_{q - 1}}{K} (e^{ρ_{q} \frac{θ_{q}}{p}} - 1)},

q = 1, 2, \dots, p

with

N_{0} = N (0)

. Similarly the values of W at any switching time

t_{q}

is given by:

W_{q} = (\frac{a_{0}}{λ} g \frac{1}{ρ_{q}} (\frac{1}{K} - \frac{1}{N_{q - 1}}) (e^{- \frac{ρ_{q} θ_{q}}{p}} - 1)) + W_{q - 1} .

The TEP can be expressed as

p_{e} (\vec{θ}) = \frac{p (\vec{θ}) - p_{0}}{1 - p_{0}}

with

p (\vec{θ}) = e^{- {(\frac{a_{0}}{λ N_{p} (\vec{θ})} + W_{p} (\vec{θ}))}^{- 1}}

and

p_{0} = p ({\vec{θ}}_{0})

,

{\vec{θ}}_{0} = {(0, p T, 0, \dots, 0)}^{tr}

the lineage elimination probability in the absence of drugs.

For piecewise constant controls, the payoff can be expressed as:

J (\vec{θ}) = e^{- α \sum_{q = 1}^{p} σ_{q} (t_{q} - t_{q - 1})} (L_{e} p_{e} (\vec{θ}) + L_{n} (1 - p_{e} (\vec{θ}))) + \sum_{q = 1}^{p} r_{q} (\vec{θ})

with

r_{q} (\vec{θ}) = σ_{q} \int_{t_{q - 1}}^{t_{q}} α t e^{- α σ_{q} (t - t_{q - 1})} d t .

For

σ_{q} = 0

r_{q} (\vec{θ}) = 0

and for

σ_{q} = 1

\begin{matrix} r_{q} (\vec{θ}) & = & \int_{t_{q - 1}}^{t_{q}} α t e^{- α (t - t_{q - 1})} d t \\ & = & t_{q - 1} - t_{q} e^{- α \frac{θ_{q}}{p}} + \int_{t_{q - 1}}^{t_{q}} e^{- α (t - t_{q - 1})} d t \\ & = & - (t_{q} + \frac{1}{α}) e^{- α \frac{θ_{q}}{p}} + t_{q - 1} + \frac{1}{α}, \end{matrix}

giving

r_{q} = σ_{q} (- (t_{q} + \frac{1}{α}) e^{- α \frac{θ_{q}}{p}} + t_{q - 1} + \frac{1}{α}), q = 1, \dots, p .
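The integration by parts can be checked against direct quadrature for a drug-on interval (σ_q = 1), using θ_q/p = t_q − t_{q−1}. This is a small sketch with a midpoint rule; the interval and the value of α are hypothetical.

```python
import math

def r_closed_form(t_prev, t_next, alpha):
    """-(t_q + 1/alpha) exp(-alpha (t_q - t_{q-1})) + t_{q-1} + 1/alpha."""
    return (-(t_next + 1.0 / alpha) * math.exp(-alpha * (t_next - t_prev))
            + t_prev + 1.0 / alpha)

def r_midpoint(t_prev, t_next, alpha, n=20000):
    """Midpoint-rule value of int alpha * t * exp(-alpha (t - t_prev)) dt."""
    h = (t_next - t_prev) / n
    total = 0.0
    for i in range(n):
        t = t_prev + (i + 0.5) * h
        total += alpha * t * math.exp(-alpha * (t - t_prev))
    return total * h

alpha = 1.0 / 200.0
closed = r_closed_form(0.0, 25.0, alpha)
numeric = r_midpoint(0.0, 25.0, alpha)
```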

The payoff can be expressed as:

J (\vec{θ}) = e^{- α \sum_{q = 1}^{p} σ_{q} \frac{θ_{q}}{p}} (L_{e} p_{e} (\vec{θ}) + L_{n} (1 - p_{e} (\vec{θ}))) + \sum_{q = 1}^{p} σ_{q} (- (t_{q} + \frac{1}{α}) e^{- α \frac{θ_{q}}{p}} + t_{q - 1} + \frac{1}{α})

where

t_{q} = \sum_{i = 1}^{q} \frac{θ_{i}}{p}

and

t_{0} = 0

.

For a given

p > 0

we maximise

J (\vec{θ})

, with

\vec{θ} \in R_{+}^{p}

, subject to the constraint

\sum \frac{θ_{i}}{p} = T

, thus, transforming the optimal control problem into a NLP. Because optimal solutions are bang-bang with at most a single switch (switching from

u = 1

to

u = 0

), for the computations in Section 5.1 we used

p = 2

.
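For p = 2 the NLP has a single free variable, the drug administration period x, with θ = (2x, 2(T − x)). The sketch below evaluates the payoff through the recursions for N_q and W_q, each phase lasting θ_q/p, and locates the maximum by grid search. It is an illustration under stated assumptions rather than the computation behind Section 5.1: all parameter values are hypothetical, lifetimes are assumed to be in days (the same unit as t), a_0/λ is treated as one parameter, and a grid search stands in for an NLP solver.

```python
import math

# Hypothetical parameter values (time in days).
A0_OVER_LAM, G_RATE, D, N0, K = 1.0, 0.5, 1.0, 1e4, 1e9
L_E, L_N, ALPHA, T = 30 * 365.0, 365.0, 1.0 / 200.0, 60.0

def sigma(q):
    return (1 - (-1) ** q) // 2

def terminal_states(theta):
    """Apply the N_q and W_q recursions; each phase lasts theta_q / p."""
    p = len(theta)
    n, w = N0, 0.0
    for q, th in enumerate(theta, start=1):
        rho = G_RATE * (1.0 - 2.0 * D * sigma(q))
        dt = th / p
        # W_q uses N_{q-1}, so update w before n.
        w += A0_OVER_LAM * G_RATE / rho * (1.0 / K - 1.0 / n) * (math.exp(-rho * dt) - 1.0)
        e = math.exp(rho * dt)
        n = n * e / (1.0 + n / K * (e - 1.0))
    return n, w

def raw_elimination(theta):
    n, w = terminal_states(theta)
    return math.exp(-1.0 / (A0_OVER_LAM / n + w))

def sap_payoff(x):
    """J for p = 2: drug on during [0, x), off during [x, T)."""
    p0 = raw_elimination([0.0, 2.0 * T])
    p_e = (raw_elimination([2.0 * x, 2.0 * (T - x)]) - p0) / (1.0 - p0)
    r1 = -(x + 1.0 / ALPHA) * math.exp(-ALPHA * x) + 1.0 / ALPHA  # t_0 = 0
    return math.exp(-ALPHA * x) * (L_E * p_e + L_N * (1.0 - p_e)) + r1

grid = [0.1 * i for i in range(int(round(T / 0.1)) + 1)]
x_best = max(grid, key=sap_payoff)
```

With x = 0 the payoff reduces to L_n, so comparing sap_payoff(x_best) with L_n indicates whether this hypothetical patient falls in the treatable class.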

Appendix F. Influence of the Patient’s Age on Optimal Drug Regimens

The patient’s age is represented by

L_{e}

, which denotes the expected lifetime if the tumour is eradicated and the patient survives therapy with probability 1. Figure A1 illustrates the dependence of the optimal drug administration period, x, on the drug tolerance interval

τ

and

L_{e}

, for initial tumour sizes

N_{0} = 10^{4}

and

N_{0} = 10^{8}

. The value of x does not depend significantly on

L_{e}

and is primarily influenced by

N_{0}

and

τ

. However, the values of

τ

that separates treatable patients from untreatable patients decreases slightly as

L_{e}

increases. This occurs because the potential gain in expected lifetime, in the case of a cure, becomes larger as

L_{e}

increases (and

L_{n}

remains constant). Consequently, at higher values of

L_{e}

, the drug can be administered at lower survival probabilities compared to smaller values of

L_{e}

.

Figure A1. Optimal drug administration interval as a function of the drug tolerance interval $τ$ and the expected lifetime of a cured individual. (A) Initial tumour size

N_{0} = 10^{4}

, (B) initial tumour size

N_{0} = 10^{8}

. The white region relates to patients in the untreatable category. Parameter values are as in Figure 8.

Figure 1. Schematic showing key events during treatment and the associated patient futures and outcomes. A patient with a treatable tumour starts treatment at detection over a treatment horizon (yellow), resulting in the outcomes shown. If relapse occurs, treatment may resume. Severe adverse effects (SAE) can range from cessation of treatment to death (treatment-related mortality, TRM).

Figure 2. Optimal trajectories in $(N, s)$ phase space. Sketch of phase plane showing the switching curve

Φ = 0

(black) where u switches from

u = 1

below, to

u = 0

above. Optimal trajectories start on

N = N_{0}

and terminate on

s = 0

. The s-nullcline is shown in region

u = 1

and the flow (blue). The critical level curve

Γ

, with

H = 0

, is shown (magenta) which passes through

(N_{Φ}, 0)

. Trajectories are shown in cyan: no treatment optimal trajectory (on

s = 0

), the MTD below

Γ

, and a trajectory that fails to terminate in finite time (between

Γ

and

s = 0

). Sketch based on logistic growth model with a cytotoxic drug.

Figure 3. Optimal solutions for different values of the time horizon T. No treatment, continuous MTD, and treat-and-stop optimal solutions are shown in blue, green and purple, respectively. (A) The payoff as a function of T; the black line represents continuous MTD for

t \in [0, T]

. (B) Optimal drug infusion period as a function of the time horizon; the red dashed line shows a drug infusion period of T. (C) TEP as a function of the time horizon; the red dashed line represents the TEP when administering the MTD for

t \in [0, T]

.

T_{0}

is the shortest time horizon where drug administration becomes optimal, whereas

T_{1}

is the longest time horizon where the optimal solution prescribes the MTD for all

t \in [0, T]

. The expected lifetimes of a healthy and an untreated individual are

L_{e} = 40

and

L_{n} = 3

years, respectively. Drug toxicity is

γ = 30

and the drug has maximum efficacy (

d = 1

). The per-capita growth rate of tumour cells is

g = 0.502

, the carrying capacity is

K = 1297 \cdot 10^{6}

. The initial tumour size is

N_{0} = 10^{8}

cells. In the absence of drugs, the birth rate of the branching process is

a_{0} = g

and the death rate is 0.

Figure 4. The payoff for the MTD regimen with varying drug infusion period. (A)

T = 33.5 < T_{0}

, (B)

T_{0} < T = 46.2 < T_{1}

, (C)

T_{1} < T = 68

and (D) Composite of examples presented in (A) (blue), in (B) (green), and in (C) (purple). The horizontal blue, green, and purple lines indicate the payoff of no treatment, continuous MTD, and treat-and-stop solutions, respectively. Vertical lines show times

T_{0}

(the shortest time horizon where drug administration is optimal) and

T_{1}

(longest time horizon where the optimal solution prescribes the MTD for all

t \in [0, T]

). Blue, green, and purple dots show the optimal drug administration periods for

T = 33.5

days,

T = 46.2

days, and

T = 68

days, respectively. Parameter values are as in Figure 3.

Figure 5. Optimal infusion period and treatment efficacy as a function of drug toxicity $γ$ and expected patient lifespan under no treatment $L_{n}$. Two cases are shown,

L_{e} = 2

years, top row, for instance, a patient in their 80s, and

L_{e} = 20

years, bottom row, for instance, a patient in their 60s. (A,D) The optimal drug infusion period. The white region in the top right of the panels corresponds to where no treatment is optimal. (B,E) The (discounted) gain in life expectancy after optimal treatment. (C,F) The

TEP

after optimal treatment. The time horizon is

T = 60

days, exceeding the drug infusion period in these simulations. Parameters as in Figure 3.

Figure 6. Optimal infusion period x and payoff as a function of cancer-free individual lifespan $L_{e}$ and the initial tumour size $N_{0}$. (A) The optimal drug infusion period x, and (B) the relative (discounted) life expectancy under optimal treatment (pay-off relative to the life expectancy of a cancer-free individual). Parameters are as in Figure 3, except

γ = 1, L_{n} = 1 / 12

years. The time horizon is

T = 60

days, exceeding the drug infusion period.

Figure 7. Optimal solutions for different values of the time horizon T for the TRM logistic model. No treatment, continuous MTD, and treat-and-stop optimal solutions are shown in blue, green, and purple, respectively. (A) The payoff as a function of T; the black line represents continuous MTD for

t \in [0, T]

. (B) Optimal drug infusion period as a function of the time horizon; the red dashed line shows a drug infusion period of T. (C) TEP as a function of the time horizon; the red dashed line represents the TEP when administering the MTD for

t \in [0, T]

.

T_{0}

is the shortest time horizon where drug administration becomes optimal, whereas

T_{1}

is the longest time horizon where the optimal solution prescribes the MTD for all

t \in [0, T]

. The expected lifetimes of a healthy and an untreated individual are

L_{e} = 40

years and

L_{n} = 2

months, respectively. The patient drug tolerance interval is

τ = 30

days and the drug has maximum efficacy (

d = 1

). The per-capita growth rate of tumour cells is

g = 0.502

, the carrying capacity is

K = 1297 \cdot 10^{6}

. The initial tumour size is

N_{0} = 10^{8}

cells. In the absence of drugs, the birth rate of the branching process is

a_{0} = g

and the death rate is 0.

Figure 8. Cure and survival for TRM logistic model. (A) Payoff as a function of patient drug tolerance interval

τ

, (B) cure probability (the product of the TEP and the probability of surviving therapy) as a function of

τ

, (C) MTD optimal drug administration interval, x, as a function of

τ

. The grey dashed line is the minimum drug administration period found for optimal solutions. (D) Probability of surviving the therapy as a function of

τ

. The initial tumour size is

N_{0} = 10^{8}

cells,

L_{e} = 40

years and

L_{n} = 2

months. Parameter values are:

g = 0.502

per day,

d = 1

,

K = 1297 \cdot 10^{6}

cells. The time horizon is

T = 360

days.

Figure 9. TEP of optimal drug administration regimens as a function of the drug tolerance interval $τ$. The TEP under optimal therapy is distinguished by a drug administration period

x > τ

, red, and

x \leq τ

. The solid black line shows the TEP for the MTD administered continuously for

t \in [0, τ]

, i.e., drug administration stops after

τ

days (so this protocol is not generically optimal). The vertical dashed line shows

τ_{c r i t}

. The initial tumour size is

N_{0} = 10^{8}

cells,

L_{e} = 40

years and

L_{n} = 2

months. Parameter values are as in Figure 8.

Figure 10. Payoff as a function of the drug infusion period x. Cases (A)

τ = 1.88

days, (B)

τ = 6

days, (C)

τ = 7.1

days and (D)

τ = 7.2

days. Continuous black lines show the MTD drug regimen for administration period x, red dots correspond to optimal solutions, the red continuous line indicates the maximum payoff, and the green dashed line indicates the value of the payoff when no drug is administered. The initial tumour size is

N_{0} = 10^{8}

cells,

L_{e} = 40

years and

L_{n} = 2

months (60 days) and the parameter values are as in Figure 8.

Figure 11. Optimal solutions for the TRM logistic model against initial tumour size and drug tolerance interval $τ$. (A) The optimal drug administration period (x). (B) The relative life expectancy after optimal treatment (

L / L_{e}

), relative to the life expectancy of a healthy individual. (C) The TEP. (D) The cure probability (tumour elimination and survival). The lifespan of healthy patients is

L_{e} = 40

years, the lifespan of an untreated patient is

L_{n} = 2

months and parameter values are as in Figure 8.

Figure 12. Treatment duration and cure probability (TRM logistic model) against drug tolerability interval $τ$ and initial tumour size $N_{0}$. (A) The optimal drug administration period (x). (B) The cure probability. The white areas are regions where patients are untreatable (where no drug is administered). The lifespan of healthy individuals is

L_{e} = 40

years, the lifespan of an untreated patient is

L_{n} = 2

months and parameter values are as Figure 8. The time horizon T is 60 days.

Table 1. Parameters used in the tumour evolution models (ODE and BP) and in the DLP formulation.

Parameter	Description
$a_{0}$	birth rate of cancer cells in the absence of drugs (BP model)
$b_{0}$	death rate of cancer cells in the absence of drugs (BP model)
$λ_{0}$	growth rate in the absence of drugs (BP model)
g	tumour growth rate (ODE model)
d	inhibitory effect of the drug on cell division (ODE model)
K	carrying capacity (ODE model)
$L_{e}$	expected lifetime of a healthy individual
$L_{n}$	expected lifetime of untreated individual
$γ$	influence of drug toxicity on the patient’s expected lifetime

Table 2. Parameters used in the SAP formulation.

Parameter	Description
$L_{e}$	expected lifetime of a healthy individual
$L_{n}$	expected lifetime of untreated individual
$α$	proportionality constant linking the drug quantity u to the TRM rate $μ$ ($μ = α u$)
$τ$	drug tolerance interval: the expected period a patient can survive under continuous MTD drug administration, i.e., $τ = \frac{1}{α}$
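
The relation between $α$, $μ$ and $τ$ in Table 2 can be checked with a minimal sketch. It assumes that the TRM rate $μ = α u$ acts as a constant hazard under a constant dose, so that survival without a TRM event is exponential, and that the dose is normalised so the MTD corresponds to u = 1 (consistent with $τ = \frac{1}{α}$). The function names are illustrative and do not come from the article.

```python
import math


def trm_rate(alpha: float, u: float) -> float:
    """TRM rate mu = alpha * u, as defined in Table 2."""
    return alpha * u


def survival_probability(t: float, tau: float, u: float = 1.0) -> float:
    """Probability of no TRM event by time t under a constant dose u.

    Assumption: constant hazard mu = alpha * u with alpha = 1 / tau,
    and the MTD normalised to u = 1.
    """
    alpha = 1.0 / tau
    return math.exp(-trm_rate(alpha, u) * t)


def mean_time_to_trm(tau: float, u: float = 1.0, steps: int = 100_000) -> float:
    """Mean time to a TRM event, computed as the integral of the survival curve.

    Uses a midpoint rule on [0, 50 * tau / u]; the tail beyond that is negligible.
    """
    t_max = 50.0 * tau / u
    h = t_max / steps
    return h * sum(survival_probability((i + 0.5) * h, tau, u) for i in range(steps))


if __name__ == "__main__":
    # Tolerance intervals (days) taken from the Figure 10 panels.
    for tau in (1.88, 6.0, 7.1, 7.2):
        p = survival_probability(5.0, tau)
        print(f"tau = {tau} days: P(no TRM during a 5-day MTD infusion) = {p:.3f}, "
              f"mean time to TRM = {mean_time_to_trm(tau):.3f} days")
```

Under these assumptions the numerically integrated mean time to a TRM event recovers $τ$, which is the sense in which $τ$ is the expected survival period under continuous MTD administration.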

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
