Analysing Tumour Growth Delay Data from Animal Irradiation Experiments with Deviations from the Prescribed Dose

The development of new radiotherapy technologies is a long-term process that requires proof of the general concept at a stage when clinical requirements with respect to beam quality and controlled dose delivery may not yet be fulfilled. For example, radiobiological experiments with laser-accelerated electrons are challenged by fluctuating beam intensities. Based on tumour-growth data and dose values obtained in an in vivo trial comparing the biological efficacy of laser-driven and conventional clinical Linac electrons, different statistical approaches for analysis were compared. In addition to the classical averaging per dose point, which excludes animals with high dose deviations, multivariable linear regression, Cox regression and a Monte-Carlo-based approach were tested as alternatives that include all animals in the statistical analysis. The four methods were compared on experimental and simulated data. All applied statistical approaches revealed a comparable radiobiological efficacy of laser-driven and conventional Linac electrons, confirming the experimental conclusion. In the simulation study, significant differences in dose response were detected by all methods except the conventional method, which showed the lowest power. Hence, the alternative statistical approaches may allow the total number of animals required in future pre-clinical trials to be reduced.


Introduction
The development of new radiotherapy (RT) beam delivery techniques, e.g., laser-driven particle acceleration [1], microbeam RT [2] or ultra-high dose rate irradiation (FLASH) [3], is a long-term process in which the general concept should be proven early on, even though clinical requirements, such as stable dose delivery, are not yet fulfilled. In particular, with regard to the later application in RT, the respective concepts should be tested in a translational manner [4] to validate their tumour-killing ability and their effects on the surrounding normal tissue. Starting with physical optimization and in vitro experiments, a successful concept is validated in in vivo trials before it is considered for clinical application. Regarding the requirements of stability and reproducibility of beam parameters,

Results
The results of the different statistical methods for analysing experimental and simulated tumour-growth data are summarized in Tables 1 and 2, respectively. For the experimental data, none of the statistical methods detected a significant difference between the beam qualities (Table 1), as reported previously [12]. The tumour-growth time to reach the sevenfold volume (t_V7) in the 3 Gy group was somewhat longer after irradiation with laser-driven electrons (mean 16.46 days) than after irradiation with the Linac (mean 13.88 days). However, this difference was not statistically significant according to the conventional method (p = 0.14) or the Monte-Carlo-based method (p = 0.18). Figure 1a shows the experimental data and the corresponding linear regression lines, which lie close to each other. In addition to the experimental data, a simulated dataset was generated in which laser-driven irradiation led to a dose-response slope 50% larger than that of the Linac (Figure 1b). The results of the statistical tests are presented in Table 2. The conventional method did not detect the difference in the slope of the dose response between the groups (p = 0.15), whereas all other statistical methods identified it. The Monte-Carlo-based method identified a difference in the slope between 0 Gy and 3 Gy (p = 0.012). In linear regression, the differing dose response was reflected by the significant interaction term b_DoseGroup (p = 0.006) and by the overall R² test (p = 0.001). Cox regression gave similar results: the interaction term β_DoseGroup differed significantly from zero (p = 0.049), and the overall likelihood-ratio test was significant (p = 0.010).
The simulation was repeated 10000 times with different randomly chosen dose and t_V7 values. Overall, the power to detect the existing difference between the groups was only 42% for the conventional method, whereas the Monte-Carlo-based method reached 75%, linear regression 93% and Cox regression 87% (Table 2).

Figure 1. Experimental (a) and simulated (b) tumour growth data, i.e., time to achieve a sevenfold relative volume increase (t_V7), and the corresponding linear regressions for treatment with laser-driven (black squares) and Linac electrons (blue triangles). For the experimental data, the dose region usable for conventional analysis is marked in grey. Therefore, black squares outside the grey area mark mice that were not included in the conventional analysis.

Discussion
The starting point for the statistical analyses performed in this manuscript was the substantial exclusion rate of animals from the analysis of a treatment comparison study [12] due to deviations from the prescribed dose. Following the translational chain from bench to bedside, the radiobiological effectiveness of laser-driven electrons and conventional Linac electrons was compared to reveal potential pitfalls of the new acceleration regime. Although the campaign itself was performed successfully, 43% of all animals treated with laser-driven electrons had to be excluded from analysis due to deviations of more than 10% from the prescribed radiation dose. This exclusion lowers the statistical power to reveal a significant difference between the beam qualities.
Conventionally, growth data from subcutaneous xenograft tumours are obtained for dedicated, pre-defined treatment groups [13][14][15][16][17] with a sufficient number of animals. Since animals are allocated to the groups before treatment, deviations from the treatment schedule and censoring of animals during follow-up must be taken into account. Censoring is considered both in the planning of an animal trial and in the analysis, using approaches that can handle censored tumour growth data [13,18,19]. At conventional accelerators, such as X-ray tubes or clinical Linacs, deviations from the scheduled treatment are very rare. Hence, no standard approach has been available for data with substantial dose deviations, as they occur, e.g., at the experimental laser-driven accelerator considered here. In a similar experiment with pulsed proton beams, Zlobinskaya et al. [20] circumvented the grouping problem by analysing the growth time of each treated animal individually.
To improve the analysis of animal trials at experimental radiation sources, this manuscript compares different statistical approaches based on the data published by Oppelt et al. [12]. We considered the conventional method, which includes only animals with an applied dose close to the prescribed dose, and, as alternatives, a Monte-Carlo-based method, linear regression and Cox regression, which also include animals with applied doses that deviate strongly from the prescribed dose. As in the previous publication of Oppelt et al. [12], no significant differences between Linac and laser-accelerated electron treatment were found for the time to sevenfold tumour growth (t_V7), regardless of the method applied. For the other endpoints, namely t_V3, t_V5 and t_V10, a dose response similar to Figure 1a with no significant difference between the radiation sources was observed, and the variability between individual mice was similar to that of t_V7. The large variability of the tumour growth data (Figure 1a) complicates the comparison of the two treatment regimens and the detection of significant differences between the radiation modalities. Given this large variability, one may question the previously applied threshold of the conventional method, which excluded animals with more than 10% deviation from the prescribed dose. Raising this threshold would improve the power of the conventional method, but the precise assignment of mice to particular dose groups would be lost.
Compared to patient treatment with its large inter-patient variability, preclinical experiments aiming at the comparison of treatment regimens try to minimize variability as much as possible. For the experiment described in Oppelt et al. [12], a previously established protocol was applied that used mice of a strain with defined immune status and age, and radiation doses that yielded measurable differences between dose groups. However, despite standardized handling procedures, subtle differences, e.g., in the number of tumour cells inoculated [12], in the position of inoculation or in the stress status of the individual mouse, might result in variations in the tumour radiation response, as visible in Figure 1a. This biological variability is hard to predict and must be accounted for by a sufficient number of animals per group and by improved tumour models [16].
To reveal the advantages of the applied statistical methods, a dataset was simulated in which laser-driven electrons deliberately led to a steeper dose-response curve than Linac electrons. Overall, the conventional method, which excludes animals with doses too far from the prescribed dose, detected the differing dose response with a probability of only 42%, while the other methods reached a power of 75% or more. The highest power was reached by linear regression, which is expected because the simulation assumed a linear dose-response relationship. For a non-linear dose response, the power of linear regression would be expected to decrease, while the power of the nonparametric Monte-Carlo-based method should remain stable.
The considered statistical methods have different advantages and disadvantages. The conventional method applies a t-test at every dose level. In addition to the reduced power caused by the large drop-out, repeated testing creates a multiple-testing problem, and applying corresponding corrections would reduce the power further. On the other hand, the conventional method makes no specific assumption on the dose-response relationship, as is required for the regression approaches. The Monte-Carlo-based method developed here is similar to the conventional method in this respect: it does not assume a dose-response relationship and analyses the change in response between neighbouring dose groups, but it includes all data. However, in contrast to the other methods, it is not available in standard statistical software and first has to be implemented. While linear regression assumes a linear dose response, Cox regression models the hazard function of tumour recurrence, which is related to the tumour control probability, with a linear dose dependency in the exponent. It can also handle animals that do not reach the considered endpoint as censored observations, thereby further increasing the sample size. More advanced models, such as linear mixed models or time-dependent Cox regression, possibly with non-linear terms, may be considered as further alternatives [21].
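To illustrate how a multiplicity correction lowers the per-test threshold, the following assumes a Bonferroni correction (one common choice; the text does not specify which correction would be used) applied to m = 2 irradiated dose levels (3 Gy and 6 Gy) at the overall significance level α = 0.05 used in this manuscript:

```latex
\alpha_{\text{per test}} = \frac{\alpha}{m} = \frac{0.05}{2} = 0.025
```

Under this assumption, a single dose-level comparison would need p < 0.025 instead of p < 0.05 to be declared significant, which further lowers the power of each comparison.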

Experimental Data
In the experiment described by Oppelt et al. [12], tumour-bearing mice were irradiated with a prescribed dose of 0, 3 or 6 Gy. After treatment with either laser-driven or conventional Linac electrons, tumour growth was followed up to a predefined final size. The tumour growth time was defined as the time needed for the tumour to reach a given multiple of its volume at irradiation. In this manuscript, the time to reach a sevenfold volume, t_V7, was considered and analysed with different statistical approaches.
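A minimal sketch of how a time to a given volume multiple, such as t_V7, could be read from a measured growth curve. The function name `time_to_fold_volume` is hypothetical, and linear interpolation between the two measurements bracketing the threshold is an assumption; the text does not state how t_V7 was interpolated. An animal that never reaches the threshold is returned as `None`, i.e., as a candidate for censoring.

```python
from typing import Optional, Sequence


def time_to_fold_volume(days: Sequence[float], volumes: Sequence[float],
                        fold: float = 7.0) -> Optional[float]:
    """First time at which the volume reaches `fold` times the volume at
    irradiation (taken as volumes[0]); None if never reached (censored).

    Assumption: linear interpolation between the two bracketing measurements.
    """
    if len(days) != len(volumes) or len(days) == 0:
        raise ValueError("days and volumes must be non-empty and equally long")
    if fold <= 1.0 or volumes[0] <= 0:
        raise ValueError("fold must exceed 1 and the initial volume must be positive")
    threshold = fold * volumes[0]
    for i in range(1, len(days)):
        if volumes[i] >= threshold:
            v0, v1 = volumes[i - 1], volumes[i]
            frac = (threshold - v0) / (v1 - v0)   # v0 < threshold <= v1 here
            return days[i - 1] + frac * (days[i] - days[i - 1])
    return None
```

For example, with the made-up volumes `[100, 180, 350, 650, 900]` measured on days `[0, 4, 8, 12, 16]`, the sevenfold threshold of 700 is crossed between day 12 and day 16, and the sketch returns 12.8 days.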
Animal numbers are summarised in Table 3. A total of 47 mice were irradiated at the laser accelerator and 27 mice at the Linac with a prescribed dose of 3 Gy or 6 Gy. Additionally, 41 mice at the laser accelerator and 20 mice at the Linac served as unirradiated controls (0 Gy), which were handled in the same way. In Oppelt et al. [12], 20 of the 47 mice irradiated at the laser accelerator were excluded (43%). The detailed experimental data, comprising treatment doses and the corresponding tumour growth data for the two radiation qualities, are tabulated in the Supplement (Tables S1 and S2).
Table 3. Overview of the animals allocated and finally analysed for the electron irradiation experiments described in Oppelt et al. [12] and in this manuscript.

Simulated Data
In the experimental data, no difference between laser-driven and conventional Linac electrons was observed. Thus, to reveal the advantages of the applied statistical methods, we simulated dose-response data in which we deliberately included a different dose-response relationship for the two groups. For the laser-irradiated mice, 40 data points were generated in three steps. First, a dose D was drawn from a uniform distribution on the interval [1, 8] Gy. The tumour growth time was then drawn from a normal distribution with mean µ and standard deviation σ given by Equations (1) and (2). The parameters of Equations (1) and (2) were chosen such that the simulated data distribution resembled the experimental data. In addition, 20 control animals were simulated with a dose of 0 Gy.
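The laser-driven part of the simulation can be sketched as follows. Because Equations (1) and (2) are not reproduced here, the sketch assumes a linear mean µ(D) = T0 + k·D and a dose-independent σ; the names `simulate_laser_group`, `T0_DAYS`, `SLOPE_LASER` and `SIGMA_DAYS` and their values are hypothetical placeholders, not the parameters used in the study.

```python
import random
from typing import List, Tuple

# Hypothetical parameters for illustration only; the study's Equations (1)
# and (2) are not reproduced here.
T0_DAYS = 9.0        # assumed mean t_V7 at 0 Gy (days)
SLOPE_LASER = 2.4    # assumed dose-response slope (days per Gy)
SIGMA_DAYS = 3.0     # assumed dose-independent standard deviation (days)


def simulate_laser_group(rng: random.Random, n_irradiated: int = 40,
                         n_controls: int = 20) -> List[Tuple[float, float]]:
    """Return (dose in Gy, t_V7 in days) pairs for simulated laser-irradiated mice."""
    data = []
    for _ in range(n_irradiated):
        dose = rng.uniform(1.0, 8.0)                 # D ~ U[1, 8] Gy
        mu = T0_DAYS + SLOPE_LASER * dose            # assumed linear mean
        data.append((dose, rng.gauss(mu, SIGMA_DAYS)))
    for _ in range(n_controls):                      # unirradiated controls
        data.append((0.0, rng.gauss(T0_DAYS, SIGMA_DAYS)))
    return data
```

Passing an explicitly seeded `random.Random` keeps each realization reproducible, which is convenient when the procedure is repeated many times.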
The simulated data for the mice irradiated with the Linac were generated similarly. Instead of a uniform dose distribution, the fixed dose values of 0 Gy, 3 Gy and 6 Gy were used, with 20 data points per dose group. Furthermore, the slope of the dose response in Equation (1) was reduced such that the laser-driven slope was 50% larger than the Linac slope. Results of the statistical methods for one particular realization are presented in Table 2. The procedure was then repeated 10000 times with different random realizations. For every statistical method, the power to detect the difference in the dose-response relationship was calculated as the number of significant results divided by 10000. Methods with a higher power should be preferred.
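The following sketch combines both simulated groups and estimates power as the fraction of significant results. It is illustrative only: the model parameters are hypothetical, the Linac slope is set to the laser slope divided by 1.5 (so that the laser slope is 50% larger), and the significance test is a stand-in, namely a large-sample z-test on the difference of two separately fitted least-squares slopes, rather than any of the four methods compared in this manuscript. The `test` argument would be replaced by the method under evaluation.

```python
import math
import random
from typing import Callable, List, Tuple

Data = List[Tuple[float, float]]      # (dose in Gy, t_V7 in days)

# Hypothetical parameters for illustration only (not the study's values).
T0, SIGMA = 9.0, 3.0                  # assumed mean t_V7 at 0 Gy and constant SD (days)
SLOPE_LASER = 2.4                     # assumed laser dose-response slope (days/Gy)
SLOPE_LINAC = SLOPE_LASER / 1.5       # laser slope is 50% larger than Linac slope


def simulate(rng: random.Random) -> Tuple[Data, Data]:
    """One realization: 40 uniform-dose laser mice + 20 controls; 20 Linac mice per dose."""
    laser_doses = [rng.uniform(1.0, 8.0) for _ in range(40)] + [0.0] * 20
    laser = [(d, rng.gauss(T0 + SLOPE_LASER * d, SIGMA)) for d in laser_doses]
    linac = [(d, rng.gauss(T0 + SLOPE_LINAC * d, SIGMA))
             for d in (0.0, 3.0, 6.0) for _ in range(20)]
    return laser, linac


def slope_and_se(data: Data) -> Tuple[float, float]:
    """Least-squares slope and its standard error."""
    n = len(data)
    mx = sum(d for d, _ in data) / n
    my = sum(t for _, t in data) / n
    sxx = sum((d - mx) ** 2 for d, _ in data)
    slope = sum((d - mx) * (t - my) for d, t in data) / sxx
    intercept = my - slope * mx
    rss = sum((t - intercept - slope * d) ** 2 for d, t in data)
    return slope, math.sqrt(rss / (n - 2) / sxx)


def slope_difference_test(laser: Data, linac: Data) -> float:
    """Stand-in test: two-sided large-sample z-test for equal slopes."""
    b1, se1 = slope_and_se(laser)
    b2, se2 = slope_and_se(linac)
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|), Z ~ N(0, 1)


def estimate_power(test: Callable[[Data, Data], float], n_rep: int = 10000,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Power = number of significant results divided by the number of repetitions."""
    rng = random.Random(seed)
    hits = sum(test(*simulate(rng)) < alpha for _ in range(n_rep))
    return hits / n_rep
```

`estimate_power(slope_difference_test)` mirrors the 10000 repetitions described above; the resulting number depends entirely on the assumed parameters and the stand-in test and is not comparable to the values reported in Table 2.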

Methods for Analysing Tumour Growth Data
The statistical tests described in the following were performed using SPSS Statistics version 25 (IBM Corporation, Armonk, NY, USA), while Monte-Carlo sampling was performed in Python (Python Software Foundation, Python Language Reference, version 3.6). In this manuscript, p-values smaller than 0.05 were considered statistically significant.

Conventional Analysis
For the calculation of the mean tumour growth time per dose group, only animals with an applied dose close to the prescribed dose were included, yielding the 0 Gy, 3 Gy and 6 Gy dose groups. Animals whose applied dose deviated by more than 10% from the prescribed dose were excluded from analysis, i.e., the dose had to lie within [2.7; 3.3] Gy or [5.4; 6.6] Gy, respectively. After this selection, the mean t_V7 of the selected N mice and its standard deviation were calculated for every dose group. The data of the two beam qualities were then compared with a two-sided t-test with Welch correction.
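A minimal sketch of the conventional analysis for one dose group, assuming SciPy is available: animals are kept only if their applied dose lies within ±10% of the prescribed dose, and the remaining t_V7 values of the two beam qualities are compared with Welch's two-sided t-test (`scipy.stats.ttest_ind` with `equal_var=False`). The helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

from scipy import stats


def select_near_prescribed(doses: Sequence[float], tv7: Sequence[float],
                           prescribed: float, tol: float = 0.10) -> List[float]:
    """Keep t_V7 values of animals whose applied dose lies within +/- tol
    (relative) of the prescribed dose, e.g. [2.7, 3.3] Gy for 3 Gy."""
    lo, hi = (1.0 - tol) * prescribed, (1.0 + tol) * prescribed
    eps = 1e-9  # keep the interval bounds inclusive despite floating-point rounding
    return [t for d, t in zip(doses, tv7) if lo - eps <= d <= hi + eps]


def compare_dose_group(laser: Tuple[Sequence[float], Sequence[float]],
                       linac: Tuple[Sequence[float], Sequence[float]],
                       prescribed: float) -> Dict[str, float]:
    """Welch two-sided t-test between beam qualities for one dose group.

    laser and linac are (applied doses, t_V7 values) tuples."""
    a = select_near_prescribed(*laser, prescribed)
    b = select_near_prescribed(*linac, prescribed)
    res = stats.ttest_ind(a, b, equal_var=False)
    return {"n_laser": len(a), "n_linac": len(b), "p": float(res.pvalue)}
```

With made-up laser doses of `[2.6, 2.9, 3.1, 3.2, 3.5, 2.8]` Gy, the animals at 2.6 Gy and 3.5 Gy fall outside [2.7; 3.3] Gy and are dropped, which is the loss of animals that the alternative methods avoid.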

Monte-Carlo-Based Method
In contrast to the conventional analysis, the Monte-Carlo-based method includes all irradiated animals, but, unlike the regression methods described below, it does not require the explicit specification of a regression function. The Monte-Carlo-based method is visualized in Figure 2. Data points (D_i, t_V7,i) for each beam quality were divided into three groups according to the applied dose, named "0 Gy", "3 Gy" and "6 Gy". Every combination of two data points belonging to two different dose groups defines a linear function with a slope A and a constant n. We randomly selected as many pairs as there were data points in the dose group with the smallest sample size and calculated A and n for every pair. Subsequently, we calculated the differences of the mean slopes, ΔA = Ā_Linac − Ā_Laser, and of the mean constants, Δn = n̄_Linac − n̄_Laser, between the beam qualities. This bootstrap procedure was repeated 10000 times, both for pairs based on data points from the 0 Gy and 3 Gy groups and for pairs based on data from the 3 Gy and 6 Gy groups. A difference in slope or constant between the beam qualities was considered significant if the central 95% range of the corresponding distribution did not include 0. To increase robustness, pairs with too small a dose difference were excluded, using a minimum allowed difference of 0.5 Gy.
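A minimal sketch of the pair-slope bootstrap for one pair of neighbouring dose groups (e.g. 0 Gy and 3 Gy), in plain Python. Two details not fixed by the text are assumptions: pairs are drawn with replacement from all admissible cross-group pairs, and one common number of pairs k, the size of the smallest of the four groups involved, is used for both beam qualities. The function names are hypothetical.

```python
import random
import statistics
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (applied dose D in Gy, t_V7 in days)
Pair = Tuple[Point, Point]


def valid_pairs(group_a: Sequence[Point], group_b: Sequence[Point],
                min_dose_diff: float = 0.5) -> List[Pair]:
    """All cross-group pairs whose dose difference is at least min_dose_diff Gy."""
    return [(p, q) for p, q in product(group_a, group_b)
            if abs(q[0] - p[0]) >= min_dose_diff]


def mean_slope_and_constant(rng: random.Random, pairs: Sequence[Pair],
                            k: int) -> Tuple[float, float]:
    """Draw k pairs with replacement; return the mean slope A and mean constant n."""
    slopes, constants = [], []
    for (d1, t1), (d2, t2) in rng.choices(pairs, k=k):
        a = (t2 - t1) / (d2 - d1)        # line through both points: t = a*D + n
        slopes.append(a)
        constants.append(t1 - a * d1)
    return statistics.mean(slopes), statistics.mean(constants)


def mc_compare(laser_a: Sequence[Point], laser_b: Sequence[Point],
               linac_a: Sequence[Point], linac_b: Sequence[Point],
               n_boot: int = 10000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Bootstrap samples of dA = A_Linac - A_Laser and dn = n_Linac - n_Laser."""
    rng = random.Random(seed)
    # Assumption: one common k, the size of the smallest of the four groups.
    k = min(len(laser_a), len(laser_b), len(linac_a), len(linac_b))
    pairs_laser = valid_pairs(laser_a, laser_b)
    pairs_linac = valid_pairs(linac_a, linac_b)
    d_slope, d_const = [], []
    for _ in range(n_boot):
        a_las, n_las = mean_slope_and_constant(rng, pairs_laser, k)
        a_lin, n_lin = mean_slope_and_constant(rng, pairs_linac, k)
        d_slope.append(a_lin - a_las)
        d_const.append(n_lin - n_las)
    return d_slope, d_const


def central_95(values: Sequence[float]) -> Tuple[float, float]:
    """2.5th and 97.5th percentiles (first and last of 39 cut points)."""
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


def excludes_zero(values: Sequence[float]) -> bool:
    """True if the central 95% range lies entirely above or below 0."""
    lo, hi = central_95(values)
    return lo > 0 or hi < 0
```

A difference is flagged when `excludes_zero` returns `True` for the ΔA or Δn samples; the same call would be repeated for the 3 Gy and 6 Gy groups. Excluding pairs with a dose difference below 0.5 Gy also guarantees that the slope never divides by zero.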
Cancers 2019, 11, x FOR PEER REVIEW 8 of 11

The grouping of the experimental and the simulated Linac data was based on the prescribed dose, which was close to the applied dose (0 Gy, 3 Gy or 6 Gy). For the experimental laser-driven electron data, the dose was prescribed accordingly (0 Gy, 3 Gy or 6 Gy) but could differ from the applied dose, which was used for the calculation. Here, the irradiated groups were defined by the intervals [1.0; 4.5) Gy and [4.5; 8.0] Gy, containing 26 and 21 data points, respectively. The same definition was used for the simulated laser-driven irradiations.
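The interval-based grouping of the laser-driven data can be written as a small helper. The function names are hypothetical, and treating exactly 0 Gy as the control group is an assumption consistent with the unirradiated controls. Doses outside the stated intervals raise an error instead of being silently assigned.

```python
from typing import Dict, List, Sequence, Tuple


def laser_dose_group(dose: float) -> str:
    """Map an applied laser dose (Gy) to the analysis group."""
    if dose == 0.0:
        return "0 Gy"                 # unirradiated controls (assumption: exactly 0 Gy)
    if 1.0 <= dose < 4.5:
        return "3 Gy"                 # [1.0; 4.5) Gy, upper bound excluded
    if 4.5 <= dose <= 8.0:
        return "6 Gy"                 # [4.5; 8.0] Gy, both bounds included
    raise ValueError(f"dose {dose} Gy lies outside the defined intervals")


def group_points(points: Sequence[Tuple[float, float]]) -> Dict[str, List[Tuple[float, float]]]:
    """Split (dose, t_V7) points into the three dose groups."""
    groups: Dict[str, List[Tuple[float, float]]] = {"0 Gy": [], "3 Gy": [], "6 Gy": []}
    for dose, tv7 in points:
        groups[laser_dose_group(dose)].append((dose, tv7))
    return groups
```

The half-open interval makes the assignment of a dose of exactly 4.5 Gy unambiguous: it belongs to the "6 Gy" group.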

Linear Regression
This method is based on the assumption of a linear dependency of the tumour-growth time on the applied dose, which was observed, e.g., in an independent experiment with the same tumour model [16] using irradiation with 200 kV X-rays. High-energy electrons may exhibit the same dose-effect relationship, because they are also a radiation quality with low linear energy transfer.
Linear regression was performed using

$$ t_{V7}(D) = b_0 + b_{\mathrm{Dose}} \, D, $$

where t_V7(D) are the t_V7 values derived for individual tumours treated with dose D, and the parameters b_0 and b_Dose were fitted to the data. The fit was performed for each beam quality individually, including all irradiated animals, as shown in Figure 1. The quality of the fit can be measured by the coefficient of determination R².
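For a single beam quality, the fit and R² follow directly from the least-squares formulas. This plain-Python sketch is an illustrative re-implementation rather than the SPSS procedure used in the study, and the function name `fit_line` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(doses: Sequence[float], tv7: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of t_V7(D) = b0 + b_Dose * D; returns (b0, b_Dose, R^2)."""
    n = len(doses)
    mx = sum(doses) / n
    my = sum(tv7) / n
    sxx = sum((x - mx) ** 2 for x in doses)
    sxy = sum((x - mx) * (y - my) for x, y in zip(doses, tv7))
    b_dose = sxy / sxx                       # slope
    b0 = my - b_dose * mx                    # intercept
    ss_res = sum((y - (b0 + b_dose * x)) ** 2 for x, y in zip(doses, tv7))
    ss_tot = sum((y - my) ** 2 for y in tv7)
    return b0, b_dose, 1.0 - ss_res / ss_tot
```

For perfectly linear made-up data, e.g. doses `[0, 3, 6]` Gy with t_V7 values `[9, 15, 21]` days, the sketch recovers b_0 = 9, b_Dose = 2 and R² = 1.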
To compare the slopes and intercepts of the regression lines between the radiation qualities, the multivariable model

$$ t_{V7}(D, G) = b_0 + b_{\mathrm{Dose}} \, D + b_{\mathrm{Group}} \, G + b_{\mathrm{DoseGroup}} \, D \, G $$

was applied, where the group variable G represents the radiation quality and the b are the fit coefficients. Here, we set G = 0 for tumours treated with laser-accelerated electrons and G = 1 for those treated with electrons delivered by a clinical Linac. The two beam qualities were compared via the p-values corresponding to b_Group and b_DoseGroup. If b_Group differs significantly from 0, a global shift between the growth times of the two radiation qualities is observed, whereas a b_DoseGroup significantly different from 0 indicates differing slopes of the dose-effect curves. Alternatively, the change in R² between a model with b_Group = b_DoseGroup = 0 and the model including all fit parameters can be tested for a significant difference from 0 with an F-test.
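A sketch of the multivariable model with NumPy and SciPy, assuming both are available; since the study used SPSS, this is an illustrative re-implementation, not the original analysis. It returns the coefficients, their two-sided t-test p-values, and the F-test p-value for the R² change between the reduced model (b_Group = b_DoseGroup = 0) and the full model. The function name `interaction_model` is hypothetical.

```python
import numpy as np
from scipy import stats


def interaction_model(dose, tv7, group):
    """OLS fit of t_V7 = b0 + b_Dose*D + b_Group*G + b_DoseGroup*D*G.

    group: 0 = laser-accelerated electrons, 1 = clinical Linac.
    Returns (coefficients, two-sided t-test p-values, F-test p-value for the
    R^2 change relative to the model with b_Group = b_DoseGroup = 0).
    """
    d = np.asarray(dose, dtype=float)
    y = np.asarray(tv7, dtype=float)
    g = np.asarray(group, dtype=float)
    n = y.size
    X_full = np.column_stack([np.ones(n), d, g, d * g])
    X_red = X_full[:, :2]                    # intercept and dose only

    beta, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    resid = y - X_full @ beta
    df_res = n - X_full.shape[1]
    rss_full = float(resid @ resid)
    cov = rss_full / df_res * np.linalg.inv(X_full.T @ X_full)
    t_vals = beta / np.sqrt(np.diag(cov))
    p_vals = 2.0 * stats.t.sf(np.abs(t_vals), df_res)

    beta_red, *_ = np.linalg.lstsq(X_red, y, rcond=None)
    rss_red = float(np.sum((y - X_red @ beta_red) ** 2))
    q = X_full.shape[1] - X_red.shape[1]     # two added parameters
    # Both models share the same total sum of squares, so the R^2-change
    # F statistic equals the extra-sum-of-squares F statistic below.
    f_stat = ((rss_red - rss_full) / q) / (rss_full / df_res)
    p_f = float(stats.f.sf(f_stat, q, df_res))

    names = ["b0", "b_Dose", "b_Group", "b_DoseGroup"]
    return dict(zip(names, beta)), dict(zip(names, p_vals)), p_f
```

With G = 1 for the Linac, b_DoseGroup estimates the Linac slope minus the laser slope, so a steeper laser dose response shows up as a negative interaction coefficient.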

Cox Regression
Cox regression can handle censored data, i.e., it can include mice that did not reach the endpoint of a sevenfold volume increase. First, the hazard function

$$ h(t, D) = h_0(t) \, \exp(\beta_{\mathrm{Dose}} \, D) $$

was fitted for each beam quality individually. It consists of a time-dependent baseline hazard h_0(t), which is not estimated, and a time-independent factor containing the regression coefficient β_Dose. In a second step, multivariable Cox regression was performed with

$$ h(t, D, G) = h_0(t) \, \exp(\beta_{\mathrm{Dose}} \, D + \beta_{\mathrm{Group}} \, G + \beta_{\mathrm{DoseGroup}} \, D \, G), $$

where G is the same group variable as in the multivariable linear regression. A global difference between the groups is indicated if β_Group differs significantly from zero, while a significant β_DoseGroup describes a dose-group interaction. Alternatively, a likelihood-ratio test can be performed using the difference in twice the negative log-likelihood between the model with β_Group = β_DoseGroup = 0 and the model including all fit parameters.
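A compact sketch of the two Cox steps with NumPy and SciPy, assuming both are available. The negative log partial likelihood (Breslow handling of tied times, no intercept because it is absorbed in h_0(t)) is minimized numerically, and the likelihood-ratio statistic is compared with a χ² distribution with 2 degrees of freedom, one per added parameter. This is an illustrative re-implementation rather than the SPSS procedure used in the study; the function names are hypothetical, and censored animals enter with event = 0. With this convention, a longer growth delay at higher dose corresponds to a negative β_Dose.

```python
import numpy as np
from scipy import optimize, stats
from scipy.special import logsumexp


def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox log partial likelihood; tied times use the Breslow form."""
    eta = X @ beta                              # linear predictor, no intercept
    ll = 0.0
    for i in np.flatnonzero(event):             # censored animals add no term
        at_risk = time >= time[i]               # but remain in the risk sets
        ll += eta[i] - logsumexp(eta[at_risk])
    return -ll


def fit_cox(X, time, event):
    """Return (coefficients, maximized log partial likelihood)."""
    X = np.asarray(X, dtype=float)
    res = optimize.minimize(
        neg_log_partial_likelihood, np.zeros(X.shape[1]),
        args=(X, np.asarray(time, dtype=float), np.asarray(event, dtype=bool)),
        method="BFGS")
    return res.x, -res.fun


def cox_group_comparison(dose, time, event, group):
    """Fit h0(t) exp(bD*D + bG*G + bDG*D*G) and LR-test bG = bDG = 0."""
    d = np.asarray(dose, dtype=float)
    g = np.asarray(group, dtype=float)
    beta_full, ll_full = fit_cox(np.column_stack([d, g, d * g]), time, event)
    _, ll_reduced = fit_cox(d[:, None], time, event)
    lr = max(2.0 * (ll_full - ll_reduced), 0.0)   # difference in -2 log L
    p_lr = stats.chi2.sf(lr, df=2)
    names = ["beta_Dose", "beta_Group", "beta_DoseGroup"]
    return dict(zip(names, beta_full)), float(p_lr)
```

The first, per-beam-quality step corresponds to calling `fit_cox(dose[:, None], time, event)` on the data of one beam quality only.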

Conclusions
In this work, we applied different statistical approaches to identify differences in the time to sevenfold tumour volume increase after treatment of mice with laser-driven and conventional Linac electrons. Except for the conventional approach, these methods allow animals with applied doses considerably differing from the prescribed dose to be included. In the simulation study, the Monte-Carlo-based method, linear regression and Cox regression were more sensitive than the previously used conventional method, and they may be applied in future studies.
Funding: This research was funded by the German Government, grant numbers 03ZIK445, 03Z1N511 and 03Z1O511.