Optimal Fractionation Scheduling for Radiotherapy Treatments with Reinforcement Learning, Tumor Growth Modeling and Outcome Modeling
Abstract
1. Introduction
2. Related Works
2.1. Tumor Growth Model
- G: A grid of cells, shaping the model’s spatial structure.
- E: The various phases or cycles each cell might undergo, such as growth, division, or death.
- U: The neighboring cells, representing the local environment for interaction.
- f: A rule determining each cell’s behavior and state transition based on its current state and the states of its neighbors.
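As a minimal sketch of how the four components G, E, U, and f fit together, the Python snippet below steps a toy cellular automaton. The three states, the Moore neighbourhood, and the transition rule are illustrative assumptions, not the rules of the tumor growth model used in this work.

```python
from typing import List

# E: a hypothetical state set (the real model tracks richer cell phases).
EMPTY, HEALTHY, CANCER = 0, 1, 2

Grid = List[List[int]]  # G: the grid of cells


def neighbours(grid: Grid, r: int, c: int) -> List[int]:
    """U: states of the (up to 8) Moore neighbours of cell (r, c)."""
    rows, cols = len(grid), len(grid[0])
    return [
        grid[r + dr][c + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def rule(state: int, neigh: List[int]) -> int:
    """f: toy rule, an empty site next to a cancer cell becomes cancerous."""
    if state == EMPTY and CANCER in neigh:
        return CANCER
    return state


def step(grid: Grid) -> Grid:
    """Apply f synchronously to every cell of G and return the new grid."""
    return [
        [rule(grid[r][c], neighbours(grid, r, c)) for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


# A 3 x 3 grid with a single cancer cell in the centre.
grid = [[EMPTY] * 3 for _ in range(3)]
grid[1][1] = CANCER
after = step(grid)
```

The update is synchronous: every cell reads the old grid, so the result does not depend on the order in which cells are visited.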
2.2. Lyman–Kutcher–Burman Model
3. Materials
3.1. In Silico Tumor Growth Model
- The types of cells within this pixel. Healthy and cancer cells can co-exist on the same pixel. In such cases, the pixel is displayed in red to denote the presence of cancer cells.
- The cell density on the grid.
- The glucose concentration.
- The oxygen concentration.
- Gap 1 (G1): Cell growth and preparation for DNA replication (11 h).
- Synthesis (S): DNA replication (8 h).
- Gap 2 (G2): Cell growth and preparation for mitosis (4 h).
- Mitosis (M): Formation of two daughter cells (1 h).
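The four phase durations listed above add up to a 24 h cycle. The sketch below maps the time elapsed since a cell's last division to its current phase; the function name is hypothetical, and the snippet assumes a cell that cycles continuously, with no quiescence or death.

```python
# Phase durations in hours, in cycle order, as listed above.
PHASES = [("G1", 11), ("S", 8), ("G2", 4), ("M", 1)]
CYCLE_LENGTH = sum(hours for _, hours in PHASES)  # 11 + 8 + 4 + 1 = 24 h


def phase_at(age_hours: int) -> str:
    """Return the phase of a cell `age_hours` after its last division."""
    t = age_hours % CYCLE_LENGTH
    for name, hours in PHASES:
        if t < hours:
            return name
        t -= hours
    raise AssertionError("unreachable: the durations cover the whole cycle")
```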
- An oxygen consumption efficiency, which reflects the cell’s oxygen usage during DNA replication. For both healthy and cancer cells, oxygen consumption is modeled as a normally distributed random variable, [mg/cell/hour]. These values are based on the measurements reported by O’Neil [21].
- A glucose absorption efficiency, representing the cell’s glucose uptake rate, which is vital for DNA replication. This parameter is sampled from
  - [mg/cell/hour] for healthy cells.
  - [mg/cell/hour] for cancer cells.
- Treatment is successful if all cancer cells die due to lack of nutrients or radiation. Even one remaining cancer cell can lead to a new tumor.
- Treatment fails if radiation therapy results in a growing tumor. If the number of healthy cells falls below 10, the simulation ends, indicating failure.
- Treatment duration is limited to 1200 h (50 days). Exceeding this time limit results in the termination of the simulation.
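The stopping rules above can be sketched as a single check run after every simulated hour. The function name, the returned labels, and the order in which the rules are tested are assumptions made for illustration; the criterion about a growing tumor is not modeled here.

```python
from typing import Optional

MAX_HOURS = 1200          # time limit stated above (50 days)
MIN_HEALTHY_CELLS = 10    # below this count the simulation ends in failure


def episode_outcome(cancer_cells: int, healthy_cells: int, hours: int) -> Optional[str]:
    """Return a terminal label, or None if the simulation should continue."""
    if cancer_cells == 0:
        return "success"   # every cancer cell has died
    if healthy_cells < MIN_HEALTHY_CELLS:
        return "failure"   # too much healthy tissue was lost
    if hours > MAX_HOURS:
        return "timeout"   # time limit exceeded
    return None
```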
3.2. LQ Model Parameters
3.3. Reinforcement Learning
- States $s \in S$, with $S$ representing the state space.
- Actions $a \in A$, with $A$ as the action space.
- Rewards $r \in [0, R_{\max}]$, where $R_{\max}$ is normalized to 1.
- A transition function $T(s, a, s')$ denoting the probability of reaching state $s'$ from state $s$ after action $a$.
- A discount factor $\gamma$, ranging between [0,1).
- $Q(s_t, a_t)$ represents the current estimate of the Q-value for the current state–action pair.
- $r_t$ is the immediate reward after taking action $a_t$ in state $s_t$.
- $Q(s_{t+1}, a_{t+1})$ is the estimated Q-value for the next state–action pair.
- $\alpha$ is the learning rate, which dictates the step size in updating the Q-value.
- $\gamma$ is the discount factor, which discounts future rewards.
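The quantities listed above combine into a temporal-difference update of the Q-value. The sketch below is a tabular version for illustration only: it assumes the next action is chosen greedily (the Q-learning variant), and the function name, the dictionary-based table, and the default values of `alpha` and `gamma` are hypothetical rather than taken from this work.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Iterable[Hashable],
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.99,  # discount factor (assumed value)
) -> float:
    """Move Q(state, action) towards reward + gamma * max_a Q(next_state, a)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: one update on an otherwise empty table.
q: QTable = defaultdict(float)
q[("s1", "wait")] = 1.0
new_value = q_update(q, "s0", "irradiate", 0.5, "s1", ["wait", "irradiate"],
                     alpha=0.5, gamma=0.9)
```

An on-policy variant would replace the maximum by the Q-value of the action actually taken in the next state.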
3.4. NTCP Empirical Model
- The process starts by obtaining the DVH data for the OAR. The DVH represents the distribution of dose within the tissue volume. It takes the form of a histogram representing the volume fraction (ordinate) irradiated by a given dose (abscissa).
- The generalized equivalent uniform dose (gEUD) is calculated from the DVH data. The gEUD is a single dose value that accounts for the variable dose distribution across the volume of a tissue or organ. In other words, it is the dose that, if given uniformly, would lead to the same radiobiological effect.
- Once the gEUD is calculated, it is plugged into the LKB model to obtain the NTCP.
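The three steps above can be sketched in a few lines of Python. The snippet assumes a differential DVH (the volume fraction receiving each dose bin) and uses the standard LKB form, in which gEUD is the volume-weighted power mean of the dose with exponent 1/n and NTCP is the standard normal CDF evaluated at (gEUD − TD50)/(m · TD50). The function names and the example numbers are illustrative and are not taken from this work.

```python
import math
from typing import Sequence


def geud(doses_gy: Sequence[float], volume_fractions: Sequence[float], n: float) -> float:
    """Generalized equivalent uniform dose from a differential DVH."""
    a = 1.0 / n                    # volume-effect exponent
    total = sum(volume_fractions)  # normalize in case fractions do not sum to 1
    return sum((v / total) * d ** a for d, v in zip(doses_gy, volume_fractions)) ** n


def lkb_ntcp(geud_gy: float, td50_gy: float, m: float) -> float:
    """LKB NTCP: standard normal CDF of t = (gEUD - TD50) / (m * TD50)."""
    t = (geud_gy - td50_gy) / (m * td50_gy)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical three-bin DVH and illustrative LKB parameters.
doses = [10.0, 40.0, 70.0]     # Gy
volumes = [0.5, 0.3, 0.2]      # fraction of the organ volume in each bin
ntcp = lkb_ntcp(geud(doses, volumes, n=0.15), td50_gy=80.0, m=0.15)
```

With a small n the gEUD is dominated by the hottest bins (serial-like organ), whereas n = 1 reduces it to the mean dose (parallel-like organ).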
3.5. Dose–Volume Histogram
4. Methods
4.1. Integrating Reinforcement Learning with the Tumor Growth Model
4.2. Radiotherapy Treatment Assessment and Key Performance Indicators
- Success Rate (SR) [%]: SR quantifies the treatment’s overall success, calculated as the percentage of simulations that resulted in complete tumor eradication relative to the total number of simulations conducted.
- NTCP [%]: NTCP represents the estimated likelihood of radiation-induced complications in healthy tissue, expressed as a percentage. This metric helps to predict potential adverse effects in specific organs.
- Dose [Gy]: This refers to the mean total radiation dose administered across all simulations, providing an average measure of the radiation intensity delivered in Grays (Gy).
- Fractions [-]: The mean number of fractions (treatment sessions) per simulation, offering insight into the fractionation strategy used during treatment.
- Duration [h]: The mean treatment duration across all simulations, expressed in hours, providing an assessment of the overall length of the treatment regimen.
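As a rough illustration of how these indicators aggregate over a batch of simulations, the sketch below computes SR and the mean dose, number of fractions, and duration. The record fields and function name are hypothetical; NTCP is left out because it requires the dose distribution in the organ at risk rather than per-simulation totals.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class SimulationResult:
    tumor_eradicated: bool   # True if no cancer cell survived
    total_dose_gy: float     # sum of all delivered fractions
    n_fractions: int         # number of treatment sessions
    duration_h: float        # treatment length in hours


def summarize(results: List[SimulationResult]) -> Dict[str, float]:
    """Aggregate per-simulation records into the indicators listed above."""
    return {
        "SR_percent": 100.0 * sum(r.tumor_eradicated for r in results) / len(results),
        "dose_gy": mean(r.total_dose_gy for r in results),
        "fractions": mean(r.n_fractions for r in results),
        "duration_h": mean(r.duration_h for r in results),
    }


# Usage with two made-up simulation records.
kpis = summarize([
    SimulationResult(True, 30.0, 10, 240.0),
    SimulationResult(False, 20.0, 6, 144.0),
])
```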
4.3. Robustness Against Cellular Model Errors
- Nutrient consumption rate of cancer cells: We examine how changes in the nutrient consumption rates of cancer cells affect treatment outcomes, assessing whether these variations lead to significant deviations in the results.
- LQ model parameters: We investigate the impact of variations in the $\alpha$ and $\beta$ parameters on the model’s performance. These parameters, which are crucial for determining the effectiveness of the radiation dose, are selected based on the ranges derived from the literature review. These ranges correspond to
  - Rectum: and .
  - Head and neck: and .
  - Lung: and .
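To see why the LQ parameters matter, recall that the LQ model gives the fraction of cells surviving a single dose d as exp(−αd − βd²). The sketch below evaluates this for a few parameter pairs; the function name and the (α, β) values are hypothetical and chosen only to show the trend, not the ranges used in this study.

```python
import math


def surviving_fraction(dose_gy: float, alpha: float, beta: float) -> float:
    """LQ model: fraction of cells surviving a single dose (alpha in 1/Gy, beta in 1/Gy^2)."""
    return math.exp(-(alpha * dose_gy + beta * dose_gy ** 2))


# Illustrative sweep: survival after one 2 Gy fraction for three (alpha, beta) pairs.
sweep = {
    (alpha, beta): surviving_fraction(2.0, alpha, beta)
    for alpha, beta in [(0.1, 0.01), (0.3, 0.03), (0.5, 0.05)]
}
```

Lower α and β leave more cells alive after each fraction, so a schedule tuned for one pair can under-treat a more radioresistant tumor.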
5. Results
5.1. RL-Based Treatments for Different Cancer Locations
5.2. Generalization of Trained Agents or Robustness of Trained Agents
5.3. Retraining
6. Discussion
6.1. General Agent Behavior
6.2. Overall Results
6.3. Nutrient Consumption Variations
6.4. Limitations
6.5. Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Aitken, K.; Mukherjee, S. When less is more: The rising tide of hypofractionation. Clin. Oncol. 2022, 34, 277–279.
2. Kim, N.; Kim, Y.B. Journey to hypofractionation in radiotherapy for breast cancer: Critical reviews for recent updates. J. Radiat. Oncol. 2022, 40, 216–224.
3. Shen, J.; Yang, D.; Chen, M.; Jiang, L.; Dong, X.; Li, D.; Yu, R.; Yu, H.; Shi, A. Hypofractionated volumetric-modulated arc radiotherapy for patients with non-small-cell lung cancer not suitable for surgery or conventional chemoradiotherapy or SBRT. Front. Oncol. 2021, 16, 644852.
4. Moreau, G.; François-Lavet, V.; Desbordes, P.; Macq, B. Reinforcement learning for radiotherapy dose fractioning automation. Biomedicines 2021, 19, 214.
5. Unkelbach, J.; Papp, D. The emergence of nonuniform spatiotemporal fractionation schemes within the standard BED model. Med. Phys. 2015, 42, 2234–2241.
6. Rigaud, B.; Simon, A.; Gobeli, M.; Leseur, J.; Duverge, L.; Williaume, D.; Castelli, J.; Lafond, C.; Acosta, O.; Haigron, P.; et al. Statistical shape model to generate a planning library for cervical adaptive radiotherapy. IEEE Trans. Med. Imaging 2019, 38, 406–416.
7. Belfatto, A.; Riboldi, M.; Ciardo, D.; Cattani, F.; Cecconi, A.; Lazzari, R.; Jereczek-Fossa, B.A.; Orecchia, R.; Baroni, G. Modeling the interplay between tumor volume regression and oxygenation in uterine cervical cancer during radiotherapy treatment. IEEE J. Biomed. Health Inform. 2016, 20, 596–605.
8. Kunz, L.V.; Bosque, J.J.; Nikmaneshi, M.; Chamseddine, I.; Munn, L.L.; Schuemann, J.; Paganetti, H.; Bertolet, A. AMBER: A Modular Model for Tumor Growth, Vasculature and Radiation Response. Bull. Math. Biol. 2024, 86, 139.
9. Kolokotroni, E.; Abler, D.; Ghosh, A.; Tzamali, E.; Grogan, J.; Georgiadi, E.; Büchler, P.; Radhakrishnan, R.; Byrne, H.; Sakkalis, V.; et al. A Multidisciplinary Hyper-Modeling Scheme in Personalized In Silico Oncology: Coupling Cell Kinetics with Metabolism, Signaling Networks, and Biomechanics as Plug-In Component Models of a Cancer Digital Twin. J. Pers. Med. 2024, 14, 475.
10. Zheng, D.; Preuss, K.; Milano, M.T.; He, X.; Gou, L.; Shi, Y.; Marples, B.; Wan, R.; Yu, H.; Du, H.; et al. Mathematical modeling in radiotherapy for cancer: A comprehensive narrative review. Radiat. Oncol. 2025, 20, 49.
11. Liu, R.; Swat, M.H.; Glazier, J.A.; Lei, Y.; Zhou, S.; Higley, K.A. Developing an Agent-Based Mathematical Model for Simulating Post-Irradiation Cellular Response: A Crucial Component of a Digital Twin Framework for Personalized Radiation Treatment. arXiv 2025, arXiv:2501.11875.
12. Shen, C.; Nguyen, D.; Chen, L.; Gonzalez, Y.; McBeth, R.; Qin, N.; Jiang, S.B.; Jia, X. Operating a treatment planning system using a deep-reinforcement learning-based virtual treatment planner for prostate cancer intensity-modulated radiation therapy treatment planning. Med. Phys. 2020, 47, 2329–2336.
13. Wang, H.; Bai, X.; Wang, Y.; Lu, Y.; Wang, B. An integrated solution of deep reinforcement learning for automatic IMRT treatment planning in non-small-cell lung cancer. Front. Oncol. 2023, 13, 1124458.
14. Saba, E.; Lim, G.J. A reinforcement learning approach for finding optimal policy of adaptive radiation therapy considering uncertain tumor biological response. Artif. Intell. Med. 2021, 121, 102193.
15. Lyman, J.T. Complication probability as assessed from dose-volume histograms. Radiat. Res. 1985, 8, 13–19.
16. Wang, Z.; Maini, P.K. Editorial Special Section on Multiscale Cancer Modeling. IEEE Trans. Biomed. Eng. 2017, 64, 501–503.
17. Belfatto, A.; Riboldi, M.; Ciardo, D.; Cecconi, A.; Lazzari, R.; Jereczek-Fossa, B.A.; Orecchia, R.; Baroni, G.; Cerveri, P. Adaptive mathematical model of tumor response to radiotherapy based on CBCT data. IEEE J. Biomed. Health Inform. 2016, 20, 802–809.
18. Le, M.; Delingette, H.; Kalpathy-Cramer, J.; Gerstner, E.R.; Batchelor, T.; Unkelbach, J.; Ayache, N. Personalized radiotherapy planning based on a computational tumor growth model. IEEE Trans. Med. Imaging 2017, 36, 815–825.
19. Magni, P.; Germani, M.; De Nicolao, G.; Bianchini, G.; Simeoni, M.; Poggesi, I.; Rocchetti, M. A Minimal Model of Tumor Growth Inhibition. IEEE Trans. Biomed. Eng. 2008, 55, 2683–2690.
20. Lipkova, J.; Angelikopoulos, P.; Wu, S.; Alberts, E.; Wiestler, B.; Diehl, C.; Preibisch, C.; Pyka, T.; Combs, S.E.; Hadjidoukas, P.; et al. Personalized radiotherapy design for glioblastoma: Integrating mathematical tumor models, multimodal scans, and bayesian inference. IEEE Trans. Med. Imaging 2019, 38, 1875–1884.
21. O’Neil, N. An Agent-Based Model of Tumor Growth and Response to Radiotherapy. Master’s Thesis, Virginia Commonwealth University, Richmond, VA, USA, 2012.
22. Jalalimanesh, A.; Haghighi, H.S.; Ahmadi, A.; Soltani, M. Simulation-based optimization of radiotherapy: Agent-based modeling and reinforcement learning. Math. Comput. Simul. 2017, 133, 235–248.
23. Emami, B.; Lyman, J.; Brown, A.; Coia, L.; Goitein, M.; Munzenrider, J.E.; Shank, B.; Solin, L.J.; Wesson, M. Tolerance of normal tissue to therapeutic irradiation. Int. J. Radiat. Oncol. Biol. Phys. 1991, 21, 109–122.
24. Kehwar, T.S. Analytical approach to estimate normal tissue complication probability using best fit of normal tissue tolerance doses into the NTCP equation of the linear quadratic model. J. Cancer Res. Ther. 2005, 1, 168–179.
25. Dennstädt, F.; Medová, M.; Putora, P.M.; Glatzer, M. Parameters of the Lyman Model for calculation of normal-tissue complication probability: A systematic literature review. Int. J. Radiat. Oncol. Biol. Phys. 2023, 115, 696–706.
26. van Leeuwen, C.M.; Oei, A.L.; Crezee, J.; Bel, A.; Franken, N.A.P.; Stalpers, L.J.A.; Kok, H.P. The alfa and beta of tumours: A review of parameters of the linear-quadratic model, derived from clinical radiotherapy studies. Radiat. Oncol. 2018, 13, 96.
27. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2018.
28. Dausort, M.; Delinte, N.; Dessain, Q.; Vanden Bulcke, C.; Macq, B. A multi-compartment fingerprinting model for non-invasive tumor cell characterization via diffusion MRI. In Proceedings of the 2023 ISMRM & ISMRT Annual Meeting & Exhibition, Toronto, ON, Canada, 3–8 June 2023.
29. Huang, H.; Huang, F.; Liang, X.; Fu, Y.; Cheng, Z.; Huang, Y.; Chen, Z.; Duan, Y.; Chen, Y. Afatinib Reverses EMT via Inhibiting CD44-Stat3 Axis to Promote Radiosensitivity in Nasopharyngeal Carcinoma. Pharmaceuticals 2023, 16, 37.
30. Chen, Y.; Deng, Y.; Li, Y.; Qin, Y.; Zhou, Z.; Yang, H.; Sun, Y. Oxygen-Independent Radiodynamic Therapy: Radiation-Boosted Chemodynamics for Reprogramming the Tumor Immune Environment and Enhancing Antitumor Immune Response. ACS Appl. Mater. Interfaces 2024, 16, 21546–21556.

Parameters | Theoretical Values | Units |
---|---|---|
Starting number of healthy cells | 1000 | - |
Starting number of cancer cells | 1 | - |
Starting number of nutrient sources | 100 | - |
Starting glucose level | | mg |
Starting oxygen level | | mL |
Average glucose absorption (healthy) | | mg/cell/h |
Average glucose absorption (cancer) | | mg/cell/h |
Average oxygen consumption (healthy) | | mL/cell/h |
Average oxygen consumption (cancer) | | mL/cell/h |
Critical oxygen level | | mL/cell |
Critical glucose level | | mg/cell |
Quiescent oxygen level | | mL/cell |
Quiescent glucose level | | mg/cell |

Cancer Location | $\alpha$ tumor (Gy$^{-1}$) | $\beta$ tumor (Gy$^{-2}$) | $\alpha$ healthy (Gy$^{-1}$) | $\beta$ healthy (Gy$^{-2}$) |
---|---|---|---|---|
Rectum | 0.315 | 0.0662 | 0.0484 | 0.0124 |
Head and neck | 0.330 | 0.029 | 0.0341 | 0.0114 |
Lung | 0.325 | 0.0325 | 0.0637 | 0.0168 |

Cancer Location | Endpoint | $TD_{50}$ (Gy) | m | n |
---|---|---|---|---|
Rectum | Rectal bleeding | 80.10 | 0.150 | 0.15 |
Head and neck | Xerostomia | 40.28 | 0.408 | 0.01 |
Lung | Symptomatic pneumonitis | 29.88 | 0.400 | 0.15 |

Cancer Location | Method | SR (%) | NTCP (%) | NTCP CI (%) | Dose (Gy) | Fractions (-) | Duration (h) |
---|---|---|---|---|---|---|---|
Rectum | Baseline | 100.0 | 10.74 | [7.55,14.1] | 50.99 | 28.33 | 679.92 |
Rectum | RL-based | 100.0 | 0.006 | [0.00,0.02] | 25.56 | 7.94 | 190.56 |
Head and neck | Baseline | 99.0 | 80.42 | [78.3,82.7] | 59.42 | 29.71 | 713.04 |
Head and neck | RL-based | 100.0 | 31.29 | [27.1,31.2] | 34.04 | 10.14 | 243.36 |
Lung | Baseline | 100.0 | 99.44 | [99.2,99.6] | 57.02 | 28.51 | 684.24 |
Lung | RL-based | 100.0 | 61.90 | [58.3,65.0] | 35.01 | 11.17 | 268.08 |

Cancer Location | $\alpha$ (Gy$^{-1}$) | $\beta$ (Gy$^{-2}$) | SR (%) | NTCP (%) | Dose (Gy) | Fractions (-) | Duration (h) |
---|---|---|---|---|---|---|---|
Rectum | 0.065 | 0.005 | 0.0 | 32.02 | 63.06 | 42.46 | 1019.04 |
Rectum | 0.265 | 0.054 | 100.0 | 0.003 | 32.12 | 9.84 | 236.16 |
Rectum | 0.465 | 0.103 | 100.0 | | 16.19 | 4.86 | 116.64 |
Head and neck | 0.25 | 0.025 | 100.0 | 62.63 | 48.95 | 14.71 | 353.04 |
Head and neck | 0.28 | 0.028 | 100.0 | 43.93 | 39.78 | 11.87 | 284.88 |
Head and neck | 0.33 | 0.033 | 100.0 | 30.96 | 33.73 | 10.06 | 241.44 |
Lung | 0.25 | 0.025 | 100.0 | 97.65 | 50.54 | 15.77 | 378.48 |
Lung | 0.28 | 0.028 | 100.0 | 91.04 | 42.76 | 13.37 | 320.88 |
Lung | 0.30 | 0.030 | 100.0 | 82.12 | 38.03 | 11.91 | 285.84 |

Cancer Location | $\alpha$ (Gy$^{-1}$) | $\beta$ (Gy$^{-2}$) | SR (%) | NTCP (%) | Dose (Gy) | Fractions (-) | Duration (h) |
---|---|---|---|---|---|---|---|
Rectum | 0.065 | 0.005 | 0.0 | 58.07 | 74.54 | 44.53 | 1068.7 |
Rectum | 0.265 | 0.054 | 100.0 | 0.047 | 29.96 | 8.78 | 210.72 |
Rectum | 0.465 | 0.103 | 100.0 | | 16.45 | 5.56 | 133.44 |
Head and neck | 0.25 | 0.025 | 100.0 | 66.56 | 50.81 | 15.93 | 382.32 |
Head and neck | 0.28 | 0.028 | 100.0 | 46.71 | 41.14 | 12.47 | 299.28 |
Head and neck | 0.33 | 0.033 | 100.0 | 27.91 | 31.93 | 9.64 | 231.36 |
Lung | 0.25 | 0.025 | 100.0 | 96.05 | 49.24 | 15.18 | 364.32 |
Lung | 0.28 | 0.028 | 100.0 | 86.83 | 40.85 | 12.75 | 306.00 |
Lung | 0.30 | 0.030 | 100.0 | 80.32 | 36.83 | 11.61 | 278.64 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ghislain, M.; Martin, F.; Dausort, M.; Dasnoy-Sumell, D.; Barragan Montero, A.M.; Macq, B. Optimal Fractionation Scheduling for Radiotherapy Treatments with Reinforcement Learning, Tumor Growth Modeling and Outcome Modeling. Biomedicines 2025, 13, 1367. https://doi.org/10.3390/biomedicines13061367