Studying Bone Remodelling and Tumour Growth for Therapy Predictive Control

: Bone remodelling consists of cycles of bone resorption and formation executed mainly by osteoclasts and osteoblasts. Healthy bone remodelling is disrupted by diseases such as Multiple Myeloma and bone metastatic diseases. In this paper, a simple mathematical model with differential equations, which takes into account the evolution of osteoclasts, osteoblasts, bone mass and bone metastasis growth, is improved with a pharmacokinetic and pharmacodynamic (PK/PD) scheme of the drugs denosumab, bisphosphonates, proteasome inhibitors and paclitaxel. The major novelty is the inclusion of drug resistance phenomena, which resulted in two variations of the model, corresponding to different paradigms of the origin and development of the tumourous cell resistance condition. These models are then used as basis for an optimization of the drug dose applied, paving the way for personalized medicine. A Nonlinear Model Predictive Control scheme is used, which takes advantage of the convenient properties of a suggested adaptive and democratic variant of Particle Swarm Optimization. Drug prescriptions obtained in this way provide useful insights into dose administration strategies. They also show how results may change depending on which of the two very different paradigms of drug resistance is used to model the behaviour of the tumour.


Introduction
Bone remodelling is a dynamic process that remains active throughout the entire life cycle. This mechanism depends on a complex control system which depends on innumerous hormones, cells, cytokines, among other endless variables. Cancer can be viewed as a loss of tissue homeostasis and it defines the diseases in which abnormal cells divide without control and can invade nearby tissues. Tumour presence provokes severe alterations in the bone remodelling regulation. Multiple myeloma (MM) is the most common cancer to involve bone.
The ongoing constant battle against cancer has been strengthened with groundbreaking discoveries over the years regarding experimental findings and mathematical modelling advances, fundamental for a better understanding of the relationship between experimental and theoretical approaches. Mathematical modelling efforts are crucial to identify the treatment schedules that maximally extend patient survival, perform qualitative and quantitative conclusions regarding certain physiological and biochemical counter-intuitive mechanisms or controlling drug-resistant sub populations within the tumour [1,2]. This paper develops models found in the literature of bone remodelling in the presence of tumours and cancer treatments, in the form of non-linear systems of differential equations, so as to include pharmacokinetic and pharmacodynamic (PK/PD) effects and the resistance to treatment that tumours develop; such models are then used in simulations to find optimal treatment schedules and strategies, using Nonlinear Model Predictive Control.
The structure of the paper is as follows: Section 2 shortly reviews the mechanisms of bone remodelling and tumour growth; Section 3 presents the improved mathematical models; Section 4 presents the mathematical tools to optimise treatment schedules and Section 5 addresses their implementation; Section 6 shows and discusses simulation results, and Section 7 sums up conclusions.

Background-Bone Remodelling and Tumours
Bone remodelling is a coordinated, spatially heterogeneous and adaptive process. The tissue is continuous in a process of removal of old and damaged tissue by osteoclasts (OC) and subsequent reconstruction of the resorptive cavities with new material by osteoblasts (OB). This mechanism maintains the skeleton size, its structure and the mineral homoeostasis [3]. It serves to repair microdefects in the bone matrix and readjust the bone strength to meet new mechanical needs. In a healthy adult bone, the amount of bone that is absorbed is the same as the one formed afterwards, so that the bone mass remains approximately constant. The resorption and formation processes are strongly coupled through anatomic structures termed basic multicellular units (BMU).
OB, the bone forming cells, result from a differentiation pathway under the control of a defined series of transcription factors. It starts with mesenchymal stem cells (MSC), which differentiate into osteoprogenitors, which in turn give rise to preosteoblasts and finally transform into mature osteoblasts [4,5]. OC are multinucleated cells formed by the fusion of mononuclear progenitors of the monocyte/macrophage haematopoietic lineage [6]. Disturbances in osteoclasts and osteoblasts activity and coupling give rise to diseases such as osteoporosis or osteopetrosis.

Bone Remodelling Regulation
Osteoblastogenesis and osteoclastogenesis are tightly linked and regulated by several autocrine and paracrine signalling factors: proteins and hormones secreted by hemopoietic bone marrow cells or bone cells. Bone remodelling regulation is both systemic and local [7].
Osteoclast precursors express Receptor Activator of Nuclear Factor kB (RANK) and Macrophage Colony-stimulating Factor Receptor (c-fms). Osteoblasts produce and release Macrophage colony-stimulating factor (M-CSF) and express the receptor for activation of nuclear factor kappa B (NF-kB) ligand (RANKL), which are key regulators in osteoclasts differentiation and growth. As the protein RANKL and M-CFS bind to the preosteoclastic cells' receptors, RANK and c-fms respectively, signaling pathways are triggered, which promote the survival and differentiation into mature osteoclasts, leading to an increase of the resorption of the bone. Cells of the osteoblast lineage also segregate osteoprotegerin (OPG). This protein acts like a decoy receptor and binds with RANKL, keeping the latter to connect and activate its receptor in the surface of osteoclasts. OPG ihnibits their final differentiation and induces osteoclast apoptosis [6]. The RANKL/OPG ratio determines the degree of osteoclast differentiation, function and apoptosis.
Other factors responsible for bone remodelling regulation are within the following five groups [8]: Systemic hormones, Local cytokines and signals, vitamins and minerals, genetic factors and mechanical loading. Among innumerable agents, the parathyroid hormone (PTH), Insulin and Transforming Growth Factors (IGF and TGF), interleukins (IL) and Tumour Necrosis Factors (TNF) are highlighted for their relevance.

The Process of Bone Remodelling
Activation Phase: The cycle starts with the identification of a triggering signal, which can be a loss in mechanical loading, a disturbance in calcium homeostasis or a change in hormones/citokines concentrations. Osteocytes are cells that are trapped inside the bone matrix. These produce TGF-β, which inhibits osteoclastogenesis. Once osteocyte local apoptosis happen due to mechanical loading, the factor's local concentration decreases, allowing resorption to increase and the remodelling cycle to begin [3]. Another concept suggests that osteoblastic cells receive the osteocytes signalling and activate the BMU [9]. When there is an hormone disturbance, such as a PTH increase, the cycle is also induced, since this hormone binds directly to OB and promotes OC differentiation and activation [4].
Reversal Phase: PTH interaction with the OB leads the latter to segregate monocyte chemoattractant protein-1 (MCP-1), which recruits the OC to the site. In response to the endocrine and mechanical activation signalling, osteoblasts also express matrix metalloproteinases (MMP). The bone surface is degraded by these proteins in order to facilitate osteoclast adhesion to the tissue. Osteoclasts attach to the bone, and after releasing acids, it absorbs th mineralized matrix. The remaining bone is degraded and removed by the enzyme cathepsin K. In this phase, osteoclast apoptosis occurs [10].
Formation Phase: The degradation of the bone matrix unleashes factors that attract the MSC. Termination Phase: At this point, OC suffer apoptosis. Besides apoptosis, OB may also differentiate into bone lining cells or into osteocytes [5].

Tumour and Its Influence on Bone Remodelling
The origin and propagation of cancer appears to be caused by heritable changes in the genetic material of healthy cells-mutations. Cancer is commonly divided into categories according to its origin in the body (primary site). Cells from this primary site may spread to other parts of the host body through the bloodstream or lymphatic system, a process called metastasization. A bone metastasis is a part of the bone containing cancer cells and are the result of complex interactions between tumour cells, bone cells and their microenvironment [11]. They are responsible for deregulating the normal functioning of bone remodelling and are commonly classified in two extreme phenotypes according to the distortion of the coupling: osteoblastic, when bone formation is enhanced and osteolytic, when resorption is promoted [12]. This work is solely focused on the latter.
Tumour cells release several factors such as parathyroid hormone-related protein (PTHrP) and interleukins IL-6, IL-8 and IL-11 which promote osteolysis by stimulating osteoclast activity. A vicious cycle is established: factors that are trapped in the bone matrix and expressed during resorption, such as TGF-β, vascular endothelial growth factor (VEGF) and IGFs stimulate tumour cells' survival and proliferation and subsequently PTHrP production. [11,[13][14][15].

Pharmacodynamics and Pharmacokinetics
The proposed bone remodelling and tumour growth model is based on Ayati's [22] work. From that formulation, the therapy effects of four drugs were added: Denosumab (T 1 ), bisphosphonates or BP (T 2 ), Paclitaxel (T 3 ) and proteasome inhibitors or PI (T 4 ). The Pharmacokinectics, or PK (this designates the study of the time evolution of drug absorption, distribution, metabolism and excretion [23]), of each drug j is modelled as the two-compartmental model (1).
The variable C g (t) is the concentration of the drug in a peripheral compartment, C p (t) is the effective concentration in plasma, k a is the absorption rate and k e , the elimination rate [24].
The Pharmacodynamics, or PD (this refers to the relationship between drug concentration at the site of action and the resulting effect [25]), may be modelled with a Hill function (Equation (2)), whose output d j (t) is the drug effect in the system and lies within the interval [0, 1]. The variable C 50 (t) is the concentration that achieves 50% of the drug's maximum effect.

Finding the Bone Mass
The model equations relative to the OC number C(t), OB number B(t), bone mass z(t) in percentage of its steady-state value, and tumour burden T(t) in percentage of bone mass at time t [days] are represented by (3). The tumour growth is described with a gompertzian curve. Behind this equation lies the idea that the per capita growth of the population decreases exponentially with time. Its sigmoidal shape is qualitatively conceivable; the growth rate derivative of small sized tumours should be increasing, since they easily adapt to the environment obstacles. As the tumour increases in size, it the proliferation becomes more difficult, considering that the host physiology is more degraded, and the resources start to lack.
The parameters α i and β i are activities of cell production and removal, respectively. The index 1 refers to the OC population, and the index 2 to the OB. The autocrine and paracrine effects between OC and OB are not treated separately, their contributions are summed and expressed as the parameters g ij , the net effectiveness of osteoclast or osteoblast-derived autocrine or paracrine factors. These can be positive (stimulatory) or negative (inhibitory).
The bone mass variation is attributed to the cell proliferation above the respective non-trivial steady-state levels,C (OC) andB (OB). The cells under this value are considered to be unable to resorpt or form bone, but they still participate in autocrine and paracrine signalling. The rates of bone formation and resorption are proportional to the number of osteoclasts and osteoblasts that exceed steady-state values, and k i represents the normalized activity of bone resorption (i = 1) and formation (i = 2). The parameter L T is an arbitrary value for the maximum size of T(t) and γ T is the respective growth constant. The values r translate the effect of the metastasis size in the autocrine and paracrine factors. The term h(t), given by (3j), is introduced in this work and represents the influence of an excessive osteoclastic activity in tumour proliferation. The parameter K T measures the effect of this influence.  K d j is the maximum effect of the drug T j . The system (3) is a general representation that comprises the four drugs. In reality, these drugs are grouped in three combined therapies: T 1 + T 3 , T 2 + T 3 and T 4 + T 3 . This means that there are only two inputs: the anti-cancer drug effect d 3 (t) and one of the three bone therapies effects, d 1 (t), d 2 (t) or d 4 (t). The PK/PD response is illustrated in Figure 2. The parameters of this simulation can be found in Table 1, including the periodicity of administration τ. The PK/PD initial conditions for a single administration are obtained with C 0 for the remaining drugs. The parameters D 0 , V d and F correspond to the initial dose, volume distribution and bioavailability, respectively.
The PK/PD models and respective parameters regarding T 1 , T 2 and T 3 were suggested by Coelho et al. [4]. The parameters regarding the PI model were estimated from a non-compartmental analysis in plasma of patients with advanced solid tumours, specifically the half life t 1/2 and estimated C 0 p [26]. The remaining parameters of the model are fixed as:

Inclusion of Drug Resistance
Model (3) diverges into two variations that approach the resistance to paclitaxel in two different manners, the models M DR1 and M DR2 , each of them corresponding to a possible model of drug resistance found in the literature [24].
Model M DR1 considers that the resistance accumulation is caused by C p levels of the drug below a certain threshold C th p [27]. The C 50 3 is affected according to where parameter K r translates the capacity of the tumour cells to develop resistance and C base 50 is a constant which represents the initial value of C 50 3 .
Model M DR2 is based on the Random Mutation Model (RMM) [28], a Darwinian theory that proposes the existence of two proliferative tumour cell populations: S(t) is composed completely sensitive cells, and R(t) by completely resistant ones. The combination between the RMM and the proposed model (3) results in the following tumour growth description: where λ 1 and λ 2 are the mutation and back-mutation rates between the S and R [29].

Nonlinear Model Predictive Control
Model predictive control (MPC) refers to a class of control methods that make use of an an explicit process model to predict the future response of a system and obtain the control sequence over a certain horizon that minimizes a cost function. MPC performance is therefore highly dependent on the model performance. This also called receding horizon control does not allow the current time slot to be optimized, while keeping future time slots in account, which represents a major advantage; see Figure 3. The input is optimized for a finite prediction horizon N p and subsequently the first entries of the optimized input sequence (control horizon N c ) are fed back. It represents an advantage over classical control since it has the ability to explicitly account for systems constraints, the constrained nonlinear optimization problem is easy to formulate, multivariable processes can be handled in a straightforward manner, and reference tracking can be improved if the references are known in advance. Besides, it easily handles nonlinear and time-varying plant dynamics, since the controller is a function of the system and can be modified in real time [30,31]. The proposed implementation of NMPC in this work will count on metaheuristics, specifically Particle Swarm Optimization, to solve the nonlinear problem at each step. Combining metaheuristics with MPC brings flexibility to design any type of cost function.

Proposed PSO Algorithm
Particle Swarm Optimization (PSO) is a collective, anarchic, nature-inspired population-based search algorithm [32,33]. It is inspired in the social behaviour of a bird flock. PSO algorithms are a common choice to solve the optimization problem involved in model predictive control schemes [34,35].
The swarm is composed by S particles wandering in a D-dimensional space. The position coordinates of each particle i are equivalent to a candidate solution. The particles' position and velocity are updated taking into account advantageous positions of the surrounding partners. This position adjustment depends on the difference between the particles' current position x i and two others: p i (the best position visited by particle i, and p g (the best position visited by any particle of the swarm). At each iteration k, the particles' position and velocity vector v i are updated according to The acceleration/confidence parameters c 1 , c 2 and c 3 are the cognitive, social and democratic coefficient, respectively. The particle's inertia is measured by the inertia weight w and rand is a random number in the interval [0,1]. The last term is not present in the traditional PSO. This democratic approach brings to the velocity update the opinion of all eligible particles of the swarm [36]. The vector d e contains this swarm contribution and is obtained with where d ij is the jth entry of vector d e for each particle i, c 3 is the confidence coefficient which controls the weight of the the democratic quantities, and cost(x) is the cost function chosen for the particular problem evaluated at x. This vector represents the democratic effect of the other particles in the movement of particle i. The weight of the kth particle is represented by Q ik , which depends on the eligibility parameter E ik . The best and worst particles of the swarm at each iteration are denoted x best and x worst , respectively. The adaptive profile of the proposed algorithm, AD-PSO is translated in the inertia weight evolution with the iterations [37]. The inertia weight value regarding the jth variable of the particle i is updated with the Equations (9) and (10).
Value w 0 is a constant that defines the initial inertia weight. Parameter is a non-critical small and positive value that ensures a proper variation of the inertia weight. Value Λ is obtained with (10), where is a non-critical small and positive value that ensures a proper variation of the inertia weight.
The values δ i measure the success of particle i in the following manner: The TVAC algorithm (Time Varying Acceleration Coefficients) [38] dictates the dynamic behaviour of both c 1 and c 2 according to Equations (12).
The values c i 1 and c f 1 are the initial and final values of the coefficient c 1 while c i2 and c f 2 represent the initial and final values of c 2 . These linearly increasing and decreasing behaviours are defined until the maximum number of iterations n iter .
The minimum number of n iter is set to n iter = 100, however the algorithm stops when a state of convergence is achieved. The stopping criterion uses a counter θ which records the number of consecutive iterations with no improvement, after k > n iter , according to (13). The algorithm stops when θ reaches a maximum value θ max .
The dynamic parameters are maintained constant at c 1 (k) = 1.5 and c 2 (k) = 2.5 for k > n iter . The PSO parameters were fixed for this problem as: S = 50, w 0 = 0.9, = 0.005, The algorithm is depicted in Figure 4.

Implementation
In this section, the NMPC-PSO scheme described in Section 4 is implemented with the objective of optimizing the prescription doses of the proposed therapies, when the drugs are administered with the fixed periodicity τ.
Only the therapies T 1 + T 3 and T 4 + T 3 are considered for this optimization. The BP therapy, although it results in a qualitatively viable therapy model, is not suitable for optimization. The rise of OC apoptosis due to BP decreases, although very slightly, the lower bound of the OC time response. The tumour T(t) causes the opposite reaction: an increase of the mean value of C(t), as well as its lower bound. When the tumour is proliferating, or has a substantial size, this lower bound increase cancels out the decrease caused by BP. When the tumour starts to be extinguished, the anti-resorptive effect pushes the OC number to fall below zero. Although the negative values of OC are smaller orders than C, it is enough to severely interfere with the dynamics, due to the appearance of complex numbers.
The standard regimen is defined as the prescription schedule which administrates constant values of C 0 p and C 0 g . The decision variables are C 0 p , for the initial concentration of paclitaxel and PI and C 0 g for the initial concentration for denosumab. These values for the initial concentration are fixed at C 0 g1 = 23.62, C 0 p4 = 0.0367 and C 0 p3 = 1.0983 (denosumab, PI and paclitaxel, respectively). The lower bound of the concentration is equal to zero for all drugs and the upper bound is considered to be three times the standard regimen doses, and therefore C 0 p1 max = 70.86, C 0 p4 max = 0.11 and C 0 p3 max = 3.29. The maximum velocity of the particles, v max is calculated offline as v max = 0.2(x max − x min ) and the minimum as v min = −v max . The time of the diagnosis and therefore the beginning of the optimization is considered to be t start = 800 days. The optimization is single-objective and the goal is to minimize the tumour size, the drug dosage and approximate the bone mass to a healthy level as much as possible. The objective function is defined as where C p 3 and C p br are the plasma concentration evolutions of paclitaxel and one of the two bone remodelling pharmaceuticals, C p 1 or C p 4 . The quantities L 1 , L 2 , L 3 and L 4 are maximum values of z, T, C p br and C p3 , respectively and can be found in Table 2, as well as the weights w 1 , w 2 , w 3 and w 4 . The functional operator D −1 is a Riemann integral, and D α is a fractional derivative, defined here (according to Grünwald and Letnikoff) as [39] c D α t f (t) = lim where ( a b ), the combinations of a things, b at a time can be obtained with [40] Γ is the gamma function, an extension of the factorial function. A fractional derivative was used here because we are interested in attributing a weight to tumour level that increases with time; therefore, the functional order is set to n = −0.8, instead of order n = −1, which corresponds to a classical integrator and weights all past moments equally [40].

Results
Figures 5 and 6 plot the system behaviour of models M DR1 and M DR2 , respectively, when each of the therapies are administrated. At first glance, both models under the therapies T 1 + T 3 an T 4 + T 3 appear to have a similar qualitative behaviour. Comparing the bone resorption therapies (PI or denosumab), there are clear differences specially when it comes to the OC and OB oscillatory behaviour. The different type of action of both drugs justifies these discrepancies. One should not make comparison assumptions of these pharmaceuticals solely based on these simulations: the fact that z(t) stabilization is more effective under a certain model may be due to the chosen system parameter values.
Regarding the different drug resistance models, the tumour growth behaviour is different depending on whether the phenomenon is modelled under the random mutation or the varying C 50 model. Both models predict a decrease of T immediately after the therapy starts, more or less symmetric to the precedent increase for a few weeks. The tumour achieves then a low value that never reaches 0. In model M DR2 , the drug resistance effects arise when the resistant population uncontrollable proliferation continues with the same strength as the initial cancer, after an apparent remission. As soon as the gradient turns positive, it is certain that the metastasis will grow abruptly to high values until death. On the other hand, model M DR1 allows a smoother accumulation of resistance. Even if the tumour does not reach values as low as with the last model, the cancer is maintained at a more constant value after the point when drug resistance is evident.

Model M DR1 -Therapy T 1 + T 3
While the control horizon N c was maintained constant, the prediction horizon N p was varied between 10, 20, 30 and 40 weeks. The N c is fixed to 4 weeks. The global best position p g is initialized with the dose values of standard regimen, a strategy that from now on will be termed SIC (standard initial condition). Figure 7 is a comparison of the system reaction to the best obtained prescription when N p = 10, N p = 20, N p = 30, N p = 40 and when the prescription is standard. The cost decreases with the increase of N p , when handling the model M DR1 . Nevertheless, all of the four prescriptions are more successful than the standard regimen. Figure 8 contains the resultant prescription of denosumab and paclitaxel with the model M DR1 , when N p = 10 and N p = 40. The C 0 p and C 0 g mean values tend to coincide with a value higher than the respective standard dose, yet lower than the maximum values defined for this problem. Note that when N p = 10, MPC obtains several null entries (22 out of 45 administrations), suggesting a higher τ for the denosumab. The increase in N p tends to produce a drug concentration distribution less variant.

Model M DR2 -Therapy T 1 + T 3
The two populations model, M DR2 , was subject to the same sensitivity analysis. The system dynamics when N p is fixed to different values is compared in Figure 9. The almost indistinguishable curves translate the insensitivity of the model to N p . This model allows a decrease of the tumour size to almost null values; however, when the resistance effects arise, the regrowth is extremely aggressive. The impossibility of tumour annihilation is associated with a resistant and proliferative population. Therefore, the resistance can never be defeated with a unique anti-cancer drug.  The denosumab and paclitaxel prescription results are shown in Figure 10. From these plots it is verified that the paclitaxel is administrated even after the sensitive population is supposedly extinguished. One might suspect that this administration would be interrupted at some point, since it has no effect on the resistant population. In fact, the doses diminish over time, but they never reach 0. This is due to the nature of the Gompertz equation: the sensitive population never actually reaches full extinction, just residual values. Besides, the S population keeps acquiring part of the resistance cells that suffer back-mutation. Therefore, there is a necessity of a paclitaxel continuity to keep the S population from regrowth. The outcome from the optimization when N p = 10 suggests a higher τ 1 of denosumab, as it happened with M DR1 .
Although the algorithm accounts for one single cost, the proportions of each term of the objective function throughout the 45 months of treatment is displayed in Figure 11 (Model M DR1 ) and Figure 12 (Model M DR2 ). The left plots correspond to a patient who received the maximum dose allowed for this problem, while the right plots correspond to prescription resultant from optimization. As expected, the administration of the maximum allowable doses retrieve slightly better values for the tumour and bone mass associated cost; however, the inherent high administration cost does not allow this prescription to be a good option. Nevertheless, the optimization prescription outputs extremely close tumour and bone mass costs (in fact, almost indistinguishable) to those obtained with the most aggressive therapy. This represents a major advantage, since the patient is safeguarded from a therapy with higher drug exposure, but still obtaining very similar results.

Therapy T 4 + T 3
When handling the therapy T 4 + T 3 , the PSO deals with problems with higher dimensions, because T 4 has a more frequent administration than T 1 . To avoid excessive computational effort, the optimization of this therapy was performed only once for each model, with N p = 40 weeks. Figure 13 presents the best obtained prescriptions of PI and paclitaxel, for both models. As expected, the C 0 p evolutions follow a similar pattern to that of the last therapy. When handling model M DR1 , a mobile mean value of both drugs is maintained almost constant, as well as the standard deviation.
When facing a two populations model, a decreasing tendency is evident for both dose value and standard deviation. In both therapies, C 0 p 3 converge to similar values. The system dynamics when the optimized regimen is applied is shown in Figure 14 for both models. The amplitude of oscillation of OC and OB is significantly higher when PI is used instead of denosumab, as the respective period. The cost proportions are once again represented, in Figures 15 and 16, for the models M DR1 and M DR2 , respectively. As before, the left plots refer to a patient who took maximum doses of both drugs and the right plots to a patient whose regimen was optimized. Figure 15 shows a similar increasing evolution to Figure 11, although the costs are considerably higher and there is a bigger discrepancy between the proportion of the weight related to the bone mass to the rest of the terms of J. Figures 14 and 16 reinforce the impossibility of avoiding a U-shaped tumour curve when dealing with M DR2 . The oscillating appearance of the curves of Figure 16 is due to the extremely high period and amplitudes that PIs provoke in the bone remodelling process.

Sensitivity to the Initial Global Best Position
The dependency of the optimization regarding the initial global best position is here analysed. The strategy SIC is replaced by the strategy LIC (low initial condition), which initializes p g with concentration values that are ten times smaller than the standard regimen's. This section compares both strategies.   The prescription doses with LIC ( Figure 17) appear not to have changed significantly, when compared with SIC ( Figure 8). The only significant difference between these results is the paclitaxel administration when N p = 10. The considerably lower doses resulted in poorer performance treating the tumour burden when compared to a standard initial condition, as one can verify in Figure 18a.
When handling M DR2 , the resultant prescriptions differences are much more pronounced ( Figure 19). All of the dose values decrease with time, and this variation is significantly more accentuated with a low initial global best position, of both denosumab and paclitaxel. In fact, approximately in the last two thirds of the treatment period, almost all of the doses are far below the standard values. The input sequences obtained with SIC and LIC, although very disparate, produced quite similar results on the system behaviour ( Figure 18b). The PSO fitness values evaluated when LIC is used are about 10% lower than those obtained with SIC when N p = 40 but are almost equal when N p = 10. The tumour decrease with the LIC prescription when N p = 10 is not so fast, however the regrowth due to resistance is slightly postponed. Although this result translated in a higher cost associated to the tumour, the interpretation would be different if one attributed more importance to the late regrowth. In fact, this particular result supports the paradigm that says that the resistant population is delayed if the sensitive is not immediately eradicated.
One concludes that the optimization resultant input sequence with the model M DR2 is very sensitive to the initial global best position, contrarily to M DR1 . It seems more favourable to start the global best to lower values, since the significant lower prescription doses are still enough to produce the same results as the SIC strategy. This analysis was not performed with the therapy T 4 + T 3 due to the excessive computational effort, but it is fair to generalize this conclusion.

Conclusions
The proposed model contains two major novelties: (1) the positive influence of the osteoclasts activity in the tumour proliferation, in order to recreate a vicious cycle between the two mechanisms, and (2) two drug resistance mechanisms regarding the immunity of tumour cells to paclitaxel, according to two different paradigms on the topic. The PK/PD of four drugs (Denosumab, BP, paclitaxel and PI) were included to simulate more accurately the system reaction to the therapy. Several factors regarding the OC and OB coupling and survival are yet to be considered and included in a more meticulous description of the system.
An NMPC with a PSO optimizer was the method to handle the problem, due to the severe nonlinearities of the system. Combining metaheuristics with MPC provided flexibility to design any type of cost function, ability to straightforwardly take the constraints into account and capability of solving the nonlinear problem. The developed adaptive algorithm (AD-PSO) is a novelty, and resulted from the combination of several existent variations of the PSO.
The drug dose optimization was performed on both resistance models and both proposed therapies. The healthy status was not achieved with any of the cases, due to the existence of drug resistance to the anti-cancer therapy. It is assumed that the M DR1 would require an impractically high and frequent dose to extinguish the tumour completely. The random mutation model was destined for therapy failure, due to the incontrovertible existence of a proliferative population, immune to paclitaxel. Once the sensitive cells number reduced to an apparent remission state, the administration of the anti-cancer therapy is decreased, while the resistant proliferates uncontrollably. The administration of a unique drug is insufficient to defeat the cancer burden.
When handling a two-populations model, it is desirable to initialize the PSO with low doses. The resultant input sequence tends to decrease to significantly lower values, although enough to keep the sensitive population from regrowth. When the paclitaxel dosage is too low, the performance of cancer treatment appears to be poorer based on cost. However, the tumour curve may be interpreted as favourable, if one attributes more importance to the fact that the regrowth is postponed. It was also verified that the best obtained doses for both models safeguarded the patients from a high exposure to drugs, while outputting extremely close results to a regimen of intense maximal administration.
It is important to remark that the models are rough approximations of the reality. Several principles and variables are not taken into account, relatively to the type of patient and to the mechanisms involved. The difficulty of harvesting data for a wide range of conditions, specially regarding the origin of cancer (pre-diagnosis) and progression of untreated cancer (post-diagnosis), represents a barrier for the mathematical formulation and validation. Due to this lack of experimental data, the model was constructed and tuned based solely on the comparison between the qualitative behaviour and theoretical principles in the literature.

Acknowledgments:
The authors would like to thank the collaboration of Hospital Santa Maria and IMM, and in particular Doctor Irina Alho for her help with the details of the biological processes addressed.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: