Substantial Sparing of Organs at Risk with Modern Proton Therapy in Lung Cancer, but Altered Breathing Patterns Can Jeopardize Target Coverage

Simple Summary
Treatment of locally advanced non-small cell lung cancer (LA-NSCLC) is a fine balance between toxicity and cure. Modern proton therapy might offer a gentler radiation treatment than state-of-the-art photon radiotherapy, but it is also more susceptible to the influence of breathing motion and anatomical changes. In this study, the influence of such uncertainties on treatment delivery was thoroughly investigated. Modern proton therapy did indeed show potential to reduce the risk of toxicity for the heart and lungs, and this potential was maintained under the influence of anatomical and delivery uncertainties. However, changes in breathing motion jeopardized the target dose distribution in a subset of patients. We therefore recommend imaging at the onset of or early in treatment to recognize these patients and adapt the treatment.
Abstract
Enhancing treatment of locally advanced non-small cell lung cancer (LA-NSCLC) by using pencil beam scanning proton therapy (PBS-PT) is attractive, but little is known about the effects of uncertainties occurring between planning (Plan) and the start of treatment (Start). In this prospective simulation study, we investigated the clinical potential of PBS-PT under the influence of such uncertainties. 4DCT imaging was carried out at Plan and Start for 15 patients who received state-of-the-art intensity-modulated radiotherapy (IMRT). Three PBS-PT plans were created per patient: 3D robust single-field uniform dose (SFUD), 3D robust intensity-modulated proton therapy (IMPT), and 4D robust IMPT (4DIMPT). These were exposed to setup and range uncertainties and breathing motion at Plan, and to changes in breathing motion and anatomy at Start. Target coverage and dose-volume parameters relevant for toxicity were compared. Organ at risk sparing at Plan was greatest with IMPT, followed by 4DIMPT, SFUD and IMRT, and persisted at Start. All plans met the preset criteria for target robustness at Plan. At Start, three patients had insufficient CTV coverage with PBS-PT. In conclusion, the clinical potential for heart and lung toxicity reduction with PBS-PT was substantial and persistent. Altered breathing patterns between Plan and Start jeopardized target coverage for all PBS-PT techniques.


Introduction
State-of-the-art treatment for inoperable locally advanced non-small cell lung cancer (LA-NSCLC) is concurrent chemotherapy and intensity-modulated photon radiotherapy (IMRT) to a dose of 60 Gy. Still, 5-year survival rates for stage III disease are only around 30%, and side effects from treatment are common and potentially fatal, limiting the possibility for dose escalation with IMRT [1,2].
Proton therapy (PT) has advantageous depth-dose characteristics with the potential to reduce side effects and facilitate dose escalation in LA-NSCLC patients [3][4][5][6][7]. Although phase II clinical trials have been promising [8][9][10], PT showed no advantage over IMRT in a randomized trial by Liao and colleagues [11,12]. These early clinical trials have mainly applied passive scattering PT, but state-of-the-art PT uses pencil beam scanning (PBS), allowing more conformal dose distributions with lower doses to critical organs [3,13].
PBS-PT is, however, not straightforward to deliver in the thoracic region because of its inherent sensitivity to uncertainties [14,15]. Much attention has been paid to how breathing motion of the primary tumor can interplay with PBS-PT spot delivery [16,17], and PBS-PT has therefore mainly been offered to patients with limited breathing motion [14]. However, recent studies with PBS-PT confirm that interplay uncertainties are canceled out by fractionation, as for IMRT, and that more attention should be focused on changes in breathing patterns and anatomy [15,18,19,20,21].
Various optimization techniques for PBS-PT exist, and it is believed that these are differently influenced by uncertainties [14,16]. With single-field uniform dose (SFUD), each field delivers a uniform dose to the entire target volume, while intensity-modulated proton therapy (IMPT) contains non-uniform dose distributions for the individual fields. 3D and 4D robust optimization can be applied to account for uncertainties due to patient setup and proton range as well as breathing motion, respectively [22][23][24].
In theory, IMPT has the potential to produce the most conformal treatment plans, while SFUD and 4D robust optimization are strategies to increase robustness [17]. There is, however, limited knowledge on how different robustly optimized PBS-PT techniques perform in practice for LA-NSCLC patients, as the planning CT is commonly used for both optimization and evaluation [16]. We see a need to investigate this in order to guide the use of PBS-PT, balancing organ at risk (OAR) sparing against robustness for both targets and OARs. Furthermore, little is known about the dosimetric advantages of PBS-PT compared to state-of-the-art IMRT at the start of treatment. Comparisons between protons and photons, e.g., for patient selection, are also usually carried out on the planning scan, even though robustness towards changes is known to matter [25]. A few studies have focused on the impact of anatomical changes occurring during the six weeks of treatment, which should be handled by means of adaptive radiotherapy (ART) [26][27][28].
The purpose of this prospective simulation study was to compare 3D robust SFUD, 3D robust IMPT and 4D robust IMPT in terms of target coverage and OAR sparing under the influence of setup and range uncertainties, breathing motion and interplay at planning, as well as changes in the breathing motion pattern and anatomy from the planning to the start of treatment. Further, using the clinical IMRT plan as a reference, our objective was to evaluate if the potential for OAR sparing expected at planning was persistent at the start of treatment.

Patient Material and Clinical IMRT Planning
Fifteen consecutive patients with stage III NSCLC receiving radiochemotherapy with curative intent at Haukeland University Hospital in Bergen, Norway, in 2019-2020 were prospectively included in an in silico simulation study. All patients gave informed consent, and the study was approved by the regional committee for medical and health research ethics (protocol code 2019/749).
Imaging was performed on a Big Bore CT scanner (Philips Healthcare, Best, The Netherlands), using a Posirest-2 support device (Civco Radiotherapy, Coralville, IA, USA) for fixation in the supine position with arms resting above the head, and the Philips bellows device for registration of the breathing curve.
4DCTs with 10 respiratory phases and deep inspiration breath hold (DIBH) CTs were acquired at planning (Plan) and at the start of treatment (fraction 2 or 3; Start). The average intensity projection (AIP) of the 4DCT was used for delineation and treatment planning. Gross tumor volumes (GTVs) for the primary tumor and lymph nodes were defined on the AIP, based on a diagnostic CT with intravenous contrast, an FDG-PET-CT, and biopsy of mediastinal lymph nodes. To define the internal GTVs (IGTVs), each 4DCT phase was blended with the AIP, and the structure was expanded to include the GTV positions on all phases. The exceptions were three patients treated in DIBH because the lung dose exceeded the constraints or large tumor motion blurred the 4DCT; in these cases, IGTV delineation included the GTV on three consecutive planning DIBH scans. A clinical target volume (CTV) was created using a 5 mm margin from the IGTV, without extending into uninvolved organs such as bone, heart, esophagus and major vessels. The GTVs and CTV were deformably mapped to all phases of the 4DCT and later used in 4DIMPT optimization. For clinical IMRT planning, a planning target volume (PTV) with a 5 mm margin from the CTV was used. All target delineation on the Plan and Start scans was performed by the same oncologist (I.M.S.). The lungs, heart, esophagus, spinal canal and, if relevant, the brachial plexus were delineated according to RTOG guidelines [29].
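The geometric part of this target definition (an IGTV enclosing the GTV on all breathing phases, then a 5 mm CTV margin that may not extend into uninvolved organs) can be sketched on binary voxel masks as below. This is a simplified illustration under stated assumptions, not the clinical workflow: in the study the IGTV was delineated manually on blended 4DCT phases and the margin was generated in the treatment planning system. The sketch assumes all masks share one regular (z, y, x) grid with spacing in mm and measures the margin between voxel centres; the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch only; function names are hypothetical.

def union_over_phases(gtv_phase_masks):
    """IGTV sketch: voxels covered by the GTV in at least one breathing phase."""
    stacked = np.asarray(gtv_phase_masks, dtype=bool)
    return np.logical_or.reduce(stacked, axis=0)

def expand_margin(mask, margin_mm, spacing_mm):
    """Dilate a binary (z, y, x) mask by margin_mm, measured between voxel centres."""
    mask = np.asarray(mask, dtype=bool)
    shape = mask.shape
    radii = [int(np.floor(margin_mm / s)) for s in spacing_mm]
    out = mask.copy()
    for dz in range(-radii[0], radii[0] + 1):
        for dy in range(-radii[1], radii[1] + 1):
            for dx in range(-radii[2], radii[2] + 1):
                offset = (dz, dy, dx)
                # Keep only offsets inside the spherical margin (in mm) and inside the grid.
                dist = np.sqrt(sum((d * s) ** 2 for d, s in zip(offset, spacing_mm)))
                if dist > margin_mm or any(abs(d) >= n for d, n in zip(offset, shape)):
                    continue
                # Shift the mask by `offset` without wrap-around and OR it into the result.
                src = tuple(slice(max(0, -d), n - max(0, d)) for d, n in zip(offset, shape))
                dst = tuple(slice(max(0, d), n - max(0, -d)) for d, n in zip(offset, shape))
                out[dst] |= mask[src]
    return out

def make_ctv(igtv, margin_mm, spacing_mm, uninvolved_organs):
    """CTV = IGTV plus a margin that is cropped away from uninvolved organs."""
    expanded = expand_margin(igtv, margin_mm, spacing_mm)
    for organ in uninvolved_organs:
        expanded &= ~np.asarray(organ, dtype=bool)
    # Only the added margin is cropped; the IGTV itself is always kept.
    return np.asarray(igtv, dtype=bool) | expanded
```

Cropping only the added margin, rather than the whole CTV, keeps the IGTV intact where it abuts an excluded organ; this is a design choice of the sketch.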
Clinical treatment planning was performed in Eclipse v. 15.6 (Varian Medical Systems, Palo Alto, CA, USA). All patients received IMRT with a prescribed dose of 60 or 66 Gy in 2 Gy fractions, depending on lung function, lung dose and proximity of the brachial plexus to the PTV. The beam configuration was adjusted to the anatomy of each patient, mainly using six beams and avoiding entry through the contralateral lung. For the PTV, D98% > 95% of the prescribed dose was required, and the maximum dose in the plan had to be <107%. Dose constraints for OARs are listed in Table A1. The Acuros External Beam algorithm was used for dose calculation, and the plans were normalized to the median dose in the PTV.
The motion amplitude of the primary tumor at Plan and Start was evaluated in Eclipse, using deformable mapping of the primary tumor GTV from the AIP to each breathing phase of the 4DCT and measuring the motion of the GTV center of mass in all directions.
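The amplitude measurement described above can be sketched as the peak-to-peak range of the GTV centre of mass over the breathing-phase masks, per direction. Assumptions: each phase is available as a binary mask on the same grid (in the study, the GTV was deformably mapped to each phase in Eclipse), axis 0 is the cranio-caudal direction, and the function names are illustrative only.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical.

def center_of_mass_mm(mask, spacing_mm):
    """Centre of mass of a binary (z, y, x) mask in mm (equal voxel weights)."""
    idx = np.argwhere(mask)  # (N, 3) array of voxel indices inside the GTV
    if idx.size == 0:
        raise ValueError("empty GTV mask")
    return idx.mean(axis=0) * np.asarray(spacing_mm, dtype=float)

def motion_amplitude_mm(phase_masks, spacing_mm):
    """Peak-to-peak centre-of-mass motion per direction over all phases."""
    coms = np.array([center_of_mass_mm(m, spacing_mm) for m in phase_masks])
    return coms.max(axis=0) - coms.min(axis=0)
```

The change from Plan to Start discussed in the Results then corresponds to the difference between the amplitudes computed on the two 4DCTs, e.g. `motion_amplitude_mm(start_masks, spacing) - motion_amplitude_mm(plan_masks, spacing)` (hypothetical variable names).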
The AIP of the 4DCT acquired at Start was rigidly matched to the AIP at Plan using six degrees of freedom and a volume of interest covering the PTV, as well as skeletal structures and the body contour in proximity to the PTV.

Proton Therapy Planning
Proton planning was performed in RayStation v. 8B (RaySearch Laboratories, Stockholm, Sweden). To ensure high plan quality, all plans were made by a planner experienced in both photon therapy and comparative proton planning (C.G.B.) and reviewed by an experienced medical physicist (M.U.). The 4DCT phases were deformably registered to their respective AIP, and the deformed target volumes (GTVs and CTVs) and OARs were mapped onto each phase. On the AIP scan, a density override representative of tumor tissue (1.06 g/cm³, ~40 HU) was applied to the IGTV for all plans. On the 4DCT phases, the original density values were used (i.e., no density override).
For each patient, three PBS-PT treatment plans were created on the Plan AIP using different optimization techniques: SFUD, 4DIMPT and IMPT. 3D robust optimization according to the minimax approach with setup uncertainty of 5 mm in each direction and 3.5% range uncertainty (21 scenarios) was used for SFUD and IMPT [22]. 4D robust optimization, applying the same settings for setup and range uncertainty on all 4DCT breathing phases (231 scenarios), was used for 4DIMPT [23]. In 3D and 4D robust optimization, the reference plan is evaluated in each uncertainty scenario, and in each iteration, the scenario with the currently worst objective value is improved. 3D and 4D robust optimization were applied for the CTV, and for the spinal canal if close to the CTV. Rescanning methods were not used.
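One way to read the scenario counts in this paragraph: a 5 mm setup uncertainty in each direction gives the nominal position plus six shifts (±x, ±y, ±z), and a 3.5% range uncertainty gives nominal, shorter and longer range, so 7 × 3 = 21 scenarios. The 231 scenarios for 4DIMPT equal 21 × 11; the text does not state which 11 images were used, so the sketch simply takes the number of images as a parameter. The minimax step is shown as evaluating an objective on every scenario and returning the worst one, which is the scenario the optimizer then works to improve. The scenario decomposition, the dose-influence matrices and the underdose objective are assumptions for illustration, not RayStation's implementation.

```python
import numpy as np

# Illustrative sketch; scenario set, objective and names are assumptions.

def setup_range_scenarios_3d(setup_mm=5.0, range_frac=0.035):
    """Assumed 3D scenario set: 7 setup positions x 3 range errors = 21."""
    shifts = [(0.0, 0.0, 0.0)]  # nominal position
    for axis in range(3):
        for sign in (-1.0, 1.0):
            shift = [0.0, 0.0, 0.0]
            shift[axis] = sign * setup_mm
            shifts.append(tuple(shift))
    range_errors = (0.0, -range_frac, range_frac)  # nominal, shorter, longer range
    return [(shift, r) for shift in shifts for r in range_errors]

def setup_range_scenarios_4d(n_images, **kwargs):
    """The same setup/range scenarios repeated on every image (e.g. breathing phases)."""
    return [(image, shift, r)
            for image in range(n_images)
            for shift, r in setup_range_scenarios_3d(**kwargs)]

def underdose_penalty(dose, prescription=1.0):
    """Hypothetical target objective: squared dose deficit below prescription."""
    deficit = np.clip(prescription - dose, 0.0, None)
    return float(np.sum(deficit ** 2))

def worst_scenario(spot_weights, scenario_influence, objective=underdose_penalty):
    """Minimax evaluation: objective value and index of the worst scenario.

    scenario_influence[s] is an assumed (voxels x spots) dose-influence matrix
    for scenario s, so the scenario dose is matrix @ spot_weights.
    """
    values = [objective(D @ spot_weights) for D in scenario_influence]
    worst = int(np.argmax(values))
    return values[worst], worst
```

In a full optimizer, each iteration would adjust the spot weights to reduce the value returned for the worst scenario; the sketch only performs the evaluation step.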
Each plan had two (10 patients) or three (5 patients) coplanar fields with gantry angles carefully selected with regard to the patient anatomy and the distance between the beam entry and the CTV (Figure A1). Beam angles were selected individually for each patient, and the same field setup was used in all three PT plans. Range shifters of 4 cm or 7.5 cm were used for all fields, and the air gaps were 5-12 cm from the body contour, depending on beam angles and risk of collision. The spot size (sigma) in air at the isocenter (without range shifter) ranged from 3.7 to 7.2 mm, depending on energy.
The same prescription as in the clinical plan was used, applying a relative biological effectiveness of 1.1 for protons. A generic IBA beam model was used for planning, and a Monte Carlo algorithm was used for dose calculation (using 0.5% statistical uncertainty).

Robustness Evaluation
An overview of the acquired image data and the robustness evaluation is shown in Figure 1. Robustness towards setup and range variations (Plan S/R) was evaluated on the Plan AIP using combined isocenter shifts of 2.9 mm in 3 directions simultaneously (corresponding to 5 mm isotropic shifts) and 3.5% range uncertainty (16 scenarios).
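The 2.9 mm isocenter shift follows from requiring the combined diagonal displacement to have a length of 5 mm: a shift of c in all three directions has length c√3, so c = 5/√3 ≈ 2.89 mm. The 16 scenarios are consistent with the eight diagonal shift combinations (±c, ±c, ±c), each combined with a ±3.5% range error; that pairing is our reading of the count rather than something the text spells out. The `worst_case` helper mirrors the worst-case reporting used for all robustness simulations. Function names are hypothetical.

```python
import itertools
import math

# Illustrative sketch; scenario pairing and names are assumptions.

def diagonal_setup_range_scenarios(isotropic_mm=5.0, range_frac=0.035):
    """Assumed evaluation set: 8 diagonal isocenter shifts x 2 range errors = 16."""
    # A shift of c mm along x, y and z at once has length c*sqrt(3), so
    # c = 5/sqrt(3) (about 2.89 mm) corresponds to a 5 mm isotropic shift.
    c = isotropic_mm / math.sqrt(3.0)
    shifts = [tuple(s * c for s in signs)
              for signs in itertools.product((-1.0, 1.0), repeat=3)]
    return [(shift, r) for shift in shifts for r in (-range_frac, range_frac)]

def worst_case(values, lower_is_worse=True):
    """Worst case over scenarios: minimum for coverage metrics (e.g. CTV D98%),
    maximum for hotspot and OAR metrics (e.g. D2%, Dmax)."""
    return min(values) if lower_is_worse else max(values)
```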
Robustness towards breathing motion at Plan (Plan CT0/50) was evaluated by recalculating all PT plans on the extreme breathing phases of the 4DCT: CT0 (maximum inspiration) and CT50 (maximum expiration). In addition, interplay evaluation was performed at Plan (Plan Interplay) using a script provided by RaySearch. Each of the 10 breathing phases of the 4DCT was in turn used as the starting phase for treatment delivery, and the spots were distributed on the CTs of the different phases based on delivery time and breathing cycle length. A constant breathing period of five seconds was used for all patients. The dose on each phase was calculated and mapped to the reference image (AIP), where the total dose was calculated. This resulted in 10 different interplay dose distributions, depending on the phase in which delivery started.
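A minimal sketch of the phase bookkeeping behind such an interplay simulation is shown below. It assumes a regular breathing cycle (5 s, as in the text) split into 10 equal phases and known spot delivery times. The delivery-time model, the deformable dose mapping to the AIP and all function names are hypothetical; the study used a script provided by RaySearch whose internals are not described here.

```python
import numpy as np

# Illustrative sketch; names, inputs and the equal-phase assumption are hypothetical.

def assign_spots_to_phases(spot_times_s, start_phase, period_s=5.0, n_phases=10):
    """Phase index (0..n_phases-1) in which each spot is delivered."""
    t = np.asarray(spot_times_s, dtype=float)
    # Each phase lasts period_s / n_phases seconds; delivery begins in start_phase.
    position = start_phase + t * n_phases / period_s
    return np.floor(position).astype(int) % n_phases

def interplay_phase_assignments(spot_times_s, period_s=5.0, n_phases=10):
    """One phase assignment per possible starting phase (10 interplay scenarios)."""
    return {start: assign_spots_to_phases(spot_times_s, start, period_s, n_phases)
            for start in range(n_phases)}

def accumulate_interplay_dose(phase_of_spot, spot_weights, mapped_influence):
    """Sum the dose delivered on each phase on the reference (AIP) grid.

    mapped_influence[p] is an assumed (voxels x spots) dose-influence matrix
    computed on phase p and already deformably mapped to the AIP.
    """
    total = np.zeros(mapped_influence[0].shape[0])
    for p, D in enumerate(mapped_influence):
        on_phase = phase_of_spot == p
        total += D[:, on_phase] @ spot_weights[on_phase]
    return total
```

Running `accumulate_interplay_dose` once per entry of `interplay_phase_assignments` gives one accumulated dose per starting phase, matching the 10 interplay dose distributions described above.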
For all robustness simulations, reported values represent the worst-case scenario for each parameter.
All PT plans were also recalculated on the AIP (Start), CT0 and CT50 (Start CT0/50) of the Start 4DCT to evaluate robustness towards changes in breathing motion and anatomy that can occur between planning and onset of treatment.

Dosimetric Evaluation
Dose distributions at Plan were compared using D98% and D2% for the CTV, as well as the homogeneity index $\mathrm{HI} = (D_{2\%} - D_{98\%})/D_{50\%}$ and the conformity index $\mathrm{CI} = (\mathrm{TV}_{\mathrm{RI}}/\mathrm{TV}) \times (\mathrm{TV}_{\mathrm{RI}}/V_{\mathrm{RI}})$, where TV is the target volume, $\mathrm{TV}_{\mathrm{RI}}$ is the target volume covered by the reference (95%) isodose and $V_{\mathrm{RI}}$ is the volume of the reference isodose [30]. For healthy tissue and OARs, the following parameters relevant for toxicity were evaluated: D2cc for the patient body; Dmean, V5Gy and V20Gy for the lungs; Dmean and V30Gy for the heart; Dmean for the esophagus; and Dmax for the spinal canal.
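These metrics can be computed from per-voxel dose samples as sketched below. The sketch assumes equal voxel volumes, so voxel counts stand in for volumes, and uses NumPy's default percentile interpolation; treatment planning systems compute DVHs with their own interpolation, so values may differ slightly. Function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; equal voxel volumes assumed, names are hypothetical.

def dose_at_volume(dose, percent):
    """D_x%: dose received by at least x% of the structure volume."""
    return float(np.percentile(dose, 100.0 - percent))

def homogeneity_index(target_dose):
    """HI = (D2% - D98%) / D50%."""
    d2 = dose_at_volume(target_dose, 2.0)
    d98 = dose_at_volume(target_dose, 98.0)
    d50 = dose_at_volume(target_dose, 50.0)
    return (d2 - d98) / d50

def conformity_index(dose, target_mask, prescribed_dose, reference_fraction=0.95):
    """CI = (TV_RI / TV) * (TV_RI / V_RI), with voxel counts as volumes."""
    inside_ri = dose >= reference_fraction * prescribed_dose  # reference isodose volume
    tv = np.count_nonzero(target_mask)
    tv_ri = np.count_nonzero(target_mask & inside_ri)
    v_ri = np.count_nonzero(inside_ri)
    return (tv_ri / tv) * (tv_ri / v_ri)

def volume_receiving(dose, threshold_gy):
    """V_xGy in percent of the structure volume."""
    return 100.0 * float(np.mean(np.asarray(dose) >= threshold_gy))

def dose_to_hottest_cc(dose, cc, voxel_volume_cc):
    """D_xcc: minimum dose within the hottest x cc (e.g. body D2cc)."""
    n = max(1, int(round(cc / voxel_volume_cc)))
    return float(np.sort(np.asarray(dose))[::-1][n - 1])
```

A perfectly uniform target dose gives HI = 0, and CI reaches 1 only when the reference isodose covers the whole target and nothing outside it.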
For OARs, the planning criteria (Table A1) were also required in the robustness evaluations. In addition, the D2cc to the patient body should be <107% of the prescribed dose. CTV D98% > 95% and CTV D2% < 107% were required in the setup and range and extreme-phase evaluations, as well as in the Start recalculations. The interplay effect is expected to cause under- and overdosage in the tumor and OARs that average out during fractionated treatment. In the interplay evaluation, CTV D98% > 90% (i.e., at least 1.8 Gy per 2 Gy fraction), as well as CTV D2% and body D2cc < 110%, were therefore considered acceptable.
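The target and body-hotspot limits above can be collected in a small lookup table, as sketched below. This is our reading of the text (including that the body D2cc < 107% limit also applies to the Start recalculations); the OAR constraints of Table A1 are not reproduced, and the dictionary keys and function name are hypothetical.

```python
# Illustrative sketch; keys and names are hypothetical.
# Limits in % of the prescribed dose, per evaluation type, as read from the text.
CRITERIA = {
    "setup_range":   {"ctv_d98_min": 95.0, "ctv_d2_max": 107.0, "body_d2cc_max": 107.0},
    "extreme_phase": {"ctv_d98_min": 95.0, "ctv_d2_max": 107.0, "body_d2cc_max": 107.0},
    "start":         {"ctv_d98_min": 95.0, "ctv_d2_max": 107.0, "body_d2cc_max": 107.0},
    "interplay":     {"ctv_d98_min": 90.0, "ctv_d2_max": 110.0, "body_d2cc_max": 110.0},
}

def meets_criteria(evaluation, ctv_d98, ctv_d2, body_d2cc):
    """True if worst-case values (in % of prescription) satisfy all limits."""
    c = CRITERIA[evaluation]
    return (ctv_d98 > c["ctv_d98_min"]
            and ctv_d2 < c["ctv_d2_max"]
            and body_d2cc < c["body_d2cc_max"])

# 90% of a 2 Gy fraction gives the 1.8 Gy per-fraction floor used for interplay.
MIN_INTERPLAY_FRACTION_DOSE_GY = 0.90 * 2.0
```

For example, a worst-case CTV D98% of 94.8% fails the extreme-phase check but would pass the relaxed interplay limit, provided the D2% and body D2cc values are also within their limits.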
A structured overview of the various evaluations and criteria is shown in Table A2. Initially, we present the target coverage and OAR sparing for the various techniques at Plan. Thereafter, we investigate the robustness of the target dose and OAR doses, respectively. For evaluation of the actual clinical potential of proton therapy compared to photon therapy, we lastly compare target coverage and OAR sparing at Start.

Statistical Analysis
Statistical analyses were performed in IBM SPSS Statistics (IBM Corp., Armonk, NY, USA). Friedman's test (non-parametric two-way analysis of variance by ranks) was used for comparison of the different techniques. Bonferroni correction was applied to adjust the p-value for multiple testing in post hoc analysis. A significance level of 0.05 was used.
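For readers reproducing the comparison outside SPSS, the Friedman statistic and the mean ranks reported in the Results can be sketched as follows. Each row (block) is one patient or one patient-parameter pair and each column one technique; values are ranked within each block, with ties sharing the average rank, and rank 1 is the lowest value. The statistic omits the tie correction, and the Bonferroni step only multiplies already computed pairwise p-values by the number of comparisons (6 for four techniques); the exact post hoc test used by SPSS is not reproduced. SciPy provides `scipy.stats.friedmanchisquare` for the full test. Function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; names are hypothetical, no tie correction applied.

def rank_within_block(row):
    """Average ranks within one block (1 = lowest value); ties share the mean rank."""
    row = np.asarray(row, dtype=float)
    order = np.argsort(row, kind="mergesort")
    sorted_vals = row[order]
    ranks = np.empty(len(row))
    i = 0
    while i < len(row):
        j = i
        while j + 1 < len(row) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def friedman(data):
    """data: (n_blocks, k_techniques). Returns mean ranks and the Friedman statistic."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    ranks = np.vstack([rank_within_block(r) for r in data])
    rank_sums = ranks.sum(axis=0)
    q = 12.0 / (n * k * (k + 1)) * np.sum(rank_sums ** 2) - 3.0 * n * (k + 1)
    return rank_sums / n, q

def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With four techniques the mean ranks always sum to k(k+1)/2 = 10, which gives a quick sanity check on reported values such as 1.51, 2.06, 2.56 and 3.87 at Start.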

Patient Characteristics and Breathing Motion
The median CTV volume was 137 cc (range 66-435 cc). Nine patients had disease stage IIIA, five IIIB and one IIIC. Primary tumor positions were left upper (3), left lower (4), right upper (4) and right lower (3) lobe. One patient only had mediastinal lymph nodes and one only had a primary tumor; the rest had both primary tumor and lymph nodes included in the target volume. The prescribed dose was 60 Gy for 6 patients and 66 Gy for 9 patients.
The breathing motion of the primary tumor was largest in the cranio-caudal direction, with a median amplitude of 4 mm and a maximum of 15 mm in the planning 4DCTs. Large variability in breathing motion was observed between patients. Six patients had a motion amplitude >5 mm, all in the cranio-caudal direction. Median breathing motion amplitudes were similar at Plan and Start (Table A3), and for most patients, the change in amplitude from Plan to Start was ≤2 mm in all directions. Three patients had a larger change in amplitude in the cranio-caudal direction (−6 mm, +4 mm and −3 mm), and these were also the three patients with the largest breathing motion amplitudes at Plan.

Target Coverage and OAR Sparing at Plan
All treatment plans achieved the required CTV D98% > 95% and D2% < 107% of the prescribed dose at Plan (Table 1). The median CTV D98% was, however, significantly higher for IMRT than for all PT techniques. The median PTV D98% in the IMRT plans was 95.7% (range 94.6-97.0%). Healthy tissue and OAR doses were lower for all proton techniques than for IMRT (Table 1, Figure 2), the only exception being the body D2cc, for which IMRT gave the lowest dose. Among the PT techniques, significant differences were found between SFUD and IMPT in Dmean for the lungs and esophagus and in V20Gy for the lungs, all in favor of IMPT. The mean rank was worst with IMRT and best with IMPT for all evaluated OAR parameters.

Target Dose Robustness at Plan and Start
All IMPT and 4DIMPT plans met the criteria for CTV D98% and D2% in the setup and range and extreme-phase evaluations at Plan (Figure 3). One SFUD plan narrowly failed, with a D98% of 94.8% on CT0. In the interplay evaluations, all plans fulfilled the goal of D98% > 90%. CTV D2% slightly exceeded 107% in the interplay evaluations of three plans, two SFUD and one 4DIMPT. Thus, all PBS-PT techniques had satisfactory target robustness at Plan. The results of the extensive robustness evaluation of CTV D98% at Plan and Start are shown in Figure 3, and a summary of D98% and D2% values for all evaluations of all proton techniques is given in Appendix A (Table A4).
For the Start AIP recalculation, the CTV D98% was above 95% of the prescribed dose for 13/15 patients with SFUD and 4DIMPT and for 12/15 with IMPT (Figure 3). In general, the CTV D98% decreased in the extreme-phase evaluations, but for 11/15 patients it was still above 95%, independent of the optimization technique. The differences in median CTV D98% between the PT techniques were small for both the AIP and CT0/50 recalculations at Start (Figure 3, Table A4). The difference between 4DIMPT and IMPT on CT0/50 was, however, statistically significant, to the disadvantage of IMPT. D2% was similar and <107% for all plans on all scans.
One of the patients who stood out with insufficient CTV coverage at Start (AIP) was patient 3, with a D98% of 87.2%, 91.7% and 82.7% for SFUD, 4DIMPT and IMPT, respectively. This patient had a change in breathing pattern between Plan and Start, causing the CTV in the mediastinum to expand 15 mm caudally and 3 mm cranially (Figure A2). Similar changes were seen for patients 11 and 15. The CTV coverage at Start for these patients was not sufficient with any PT optimization technique (Figure 3). Dose distributions for all PT techniques are shown in Figure A3 for one patient with insufficient and one patient with sufficient CTV coverage at Start. For patient 8, the low CTV D98% on the Start CT0/50 was likely caused by delineation uncertainty. The IMRT plans for patients 3 and 11 were planned and recalculated on DIBH CTs, so the values for IMRT and the PT techniques cannot be directly compared.
Figure 3. CTV D98% in the robustness evaluations at Plan and Start. The acceptance criteria were D98% > 90% for the interplay evaluation and >95% for the other robustness evaluations. Where the criteria were not met, the patient number is given next to the observation in the figure. DIBH CT was used for IMRT for patients 3 and 11; hence, planning and recalculation were not performed on the same scans as for the PT plans. CT0/50 includes two observations (both extreme phases) per patient. Boxplots show the median (line), mean (cross) and spread, with outliers as dots outside the box. Plan = planning CT, Start = start of treatment CT, CT0/50 = extreme-phase evaluation, S/R = setup and range evaluation.

OAR Dose Robustness at Plan and Start
In the setup and range and extreme-phase evaluations at Plan, both IMPT and 4DIMPT met the OAR constraints (Table A1) in all plans. One of the SFUD plans failed the setup and range evaluation, exceeding the Dmax criterion for the spinal canal with 52.0 Gy in the worst-case scenario. In the interplay evaluation, 26 out of 45 plans had a body D2cc >107%; however, only three plans exceeded 110% of the prescribed dose, two of them IMPT plans and one a 4DIMPT plan. The OAR constraints were met for all patients and all techniques in the interplay evaluation.
Relevant dose-volume parameters for OARs at Plan and Start for IMRT and all PT techniques are shown in Figure 2. The pattern of OAR sparing with PT compared to IMRT persisted at Start. Median changes in dose-volume parameters from Plan to Start were 6% or lower for all parameters and all techniques (Table A5). Nevertheless, the relative change in dose-volume parameters from Plan to Start varied widely between individual patients (from −58% to 103%) with all techniques. For most patients, constraints were still met for all OARs. For one patient, the esophagus shifted towards the CTV, causing a ~30% increase in mean dose, to above 35 Gy, for all techniques. Hotspots (D2cc > 107%) in the healthy tissue occurred at Start with one IMRT plan and two SFUD plans.

Target Coverage and OAR Sparing at Start
Out of the PBS-PT techniques, IMPT showed the greatest potential for toxicity reduction. A comparison of all 105 OAR dose-volume parameters calculated at Start resulted in the best mean rank for IMPT (1.51), followed by 4DIMPT (2.06), SFUD (2.56) and IMRT (3.87), with all pairwise comparisons being significant. Figure 4 shows the per-patient advantage of IMPT in the sparing of OAR mean doses, as well as the price to pay in target coverage. The latter was, however, only significantly different between IMPT and IMRT, probably influenced by the use of DIBH for IMRT.
Substantial dose reductions were achieved with IMPT compared to state-of-the-art IMRT for the lungs, heart and spinal canal (Figures 4 and 5). For the lungs, the median Dmean was reduced from 13.7 to 9.6 Gy, V5Gy from 55.1% to 28.4% and V20Gy from 23.4% to 18.6% with IMPT compared to IMRT. The median heart Dmean was reduced from 8.2 to 3.0 Gy, with Dmean < 10 Gy for all patients with IMPT and for 10/15 with IMRT. The median heart V30Gy was reduced from 8.3% to 3.6%, the median esophagus Dmean from 20.1 to 18.1 Gy and the median spinal canal Dmax from 45.5 to 32.7 Gy. All differences were statistically significant, and among the 105 individual parameters compared, 102 favored IMPT, 2 favored IMRT and 1 was tied.

Discussion
This study shows that the potential for OAR sparing with PBS-PT compared to state-of-the-art IMRT was substantial and persistent from planning to the start of treatment. Among the various optimization techniques, IMPT spared OARs the most. There were surprisingly small differences between the PBS-PT techniques in their response to the various uncertainties, but IMPT was slightly less robust towards breathing motion than 4DIMPT. All techniques were acceptable in the robustness evaluations at Plan, including interplay, and also at Start for the majority of patients. However, all robust optimization techniques failed to account for changes in breathing motion patterns occurring in three patients, resulting in unacceptable coverage of the mediastinal lymph nodes.
Given strategies to recognize patients with altered breathing motion and to account for the lack of target robustness in these patients, we believe that robustly optimized IMPT and 4DIMPT can reduce the risk of both radiation pneumonitis and heart toxicity compared to IMRT. Lung Dmean, V5Gy and V20Gy were all significantly reduced with IMPT and 4DIMPT, and these parameters have previously been correlated with the probability of radiation pneumonitis [31]. This could potentially also be key to better outcomes, since patients with radiation pneumonitis have been excluded from adjuvant treatment with immune checkpoint inhibitors [32]. Note that SFUD did not reduce lung doses compared to IMRT in our study, and we would therefore not expect a reduced risk of pneumonitis with SFUD. The reductions in heart dose seen with all PBS-PT techniques are also likely clinically relevant. Atkins et al. showed that a heart Dmean > 10 Gy significantly increased the risk of mortality in LA-NSCLC [33]. In our study, a mean heart dose below 10 Gy was achieved for 10/15 patients with IMRT and for all patients with all PBS-PT techniques.
Sparing of the spinal canal beyond the maximum dose constraint is not expected to give a clinical benefit in itself. However, given the large reduction seen with all PBS-PT techniques compared to IMRT, less effort needs to be spent on this highly prioritized constraint in the optimization, possibly leaving room for the considerable dose reductions seen for other OARs.
Mean doses to the esophagus were slightly reduced with IMPT compared to IMRT in our study. It is unknown whether this would lead to a reduction in esophagitis [34], especially since there are additional uncertainties related to elevated linear energy transfer (LET) that were not considered in the current study. The esophagus is highly mobile, is often located in close proximity to the target volume and can move into the high-dose region; this was the case for one of the patients in our study. In a recent clinical dose-escalation study including 47 patients with stage III NSCLC, Iwata et al. used ART to monitor the position of the esophagus and adjusted treatment accordingly if needed [8]. Dose-escalated PT was well tolerated in this phase II study, with no grade ≥3 radiation pneumonitis and one case of acute grade 3 esophagitis. Additionally, the 5-year overall survival of 59% (probably influenced by the combination with immunotherapy) is promising. This study mainly used passive scattering PT, although some patients with small tumor motion had single-field optimized spot-scanning plans.
Recently, Ribeiro et al. published a comprehensive robustness analysis, including weekly imaging during treatment, for 10 stage III NSCLC patients with small to moderate tumor motion, showing the feasibility of PBS-PT in the majority of patients [27]. Our study strengthens these findings by confirming the results in an independent patient group with larger motion variability. Inoue et al. also investigated the robustness of 3D robustly optimized IMPT in stage III NSCLC [35]. They reported a limited impact of setup and range uncertainties, breathing motion and interplay effects on the dose distribution when using properly selected robust optimization parameters. This is in line with our analysis for the planning scan.
A strength of our study was the prospective study design with repeated imaging at the start of treatment. At this time point, we expected a small probability of anatomical changes in need of ART, based on experience from photon therapy [36]. A CT at fraction 2 or 3 was therefore chosen for robustness evaluation, as it would reveal if any of the optimization techniques were particularly sensitive towards interfractional variations such as changes in breathing pattern or positioning of the patient.
Indeed, our results show that none of the optimization techniques for PBS-PT were able to handle substantial changes in the breathing pattern. With current robust optimization methods, it is therefore important to verify dose delivery at the onset of treatment. Adaptive protocols in PT are commonly based on weekly 4DCTs, starting at the end of the first treatment week, but imaging at the onset of treatment could recognize these patients earlier.
Importantly, the observed target under-dosage was mainly located in the mediastinal lymph nodes (not the primary tumor), which are hard to visualize on, e.g., CBCT. One option is to use the carina as a surrogate structure in addition to the diaphragm, as done by Møller et al. in their ART protocol [36]; the carina position has been shown to correlate better with lung volume than, e.g., the diaphragm position [37]. Respiratory gating and breath-hold techniques are alternative strategies to avoid dose degradation caused by changes in breathing motion. Although DIBH images were acquired in the current study and used clinically for IMRT in three patients, analysis of PBS-PT in DIBH was beyond the scope of this study.
Both Ribeiro et al. and Hoffmann et al. reported that altered shoulder position caused a loss of robustness in dose evaluations during treatment [27,28]. This was not observed in our study, but in principle it could also occur between planning and the start of treatment and should be kept in mind when evaluating robustness at the onset of treatment. Because our study design was limited to changes between planning and the start of treatment, anatomical changes such as atelectasis or pleural effusion were not observed. Such changes can occur during treatment and strongly affect the delivered dose, but they are well known and can be corrected for by existing adaptive protocols [28]. The novelty of our study lies in its focus on uncertainties that have so far received less attention, and we have shown that these are handled neither by current robust optimization techniques nor by current adaptive protocols.
Regarding the comparison of PBS-PT optimization techniques, Ribeiro et al. compared 3D and 4D robustly optimized IMPT plans with layered rescanning [27]. Like us, they found only small differences in robustness between the techniques. However, IMPT was (somewhat surprisingly) slightly more robust than 4DIMPT in their study, whereas we found the opposite. This might be explained by differences in how density override was applied to the target: Ribeiro et al. used density override only for the IMPT plans, whereas we applied it on the AIP for both techniques. This is one of several technical details that may influence robustness; rescanning is another [14]. Indeed, with rescanning, the uncertainties due to the interplay effect could be reduced even further than reported here. Liao and colleagues have highlighted the importance of treatment planning experience in PT for NSCLC [11]. In addition to comprehensive treatment planning guidelines, automated treatment planning solutions could help ensure the high plan quality needed in PBS-PT for LA-NSCLC [38].
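Density override, discussed above as a possible reason for the differing IMPT and 4DIMPT results, can be illustrated with a minimal Python sketch. An average intensity projection (AIP) is formed as the voxel-wise mean of the 4DCT phases, and voxels inside the target are then set to a tumor-like value, so that the planned proton range assumes tumor density wherever the tumor may be during breathing. The 1D geometry, the HU values (-800 for lung, 40 for tumor), the breathing positions and all function names are assumptions for illustration only, not the values or implementation used in this study or by Ribeiro et al.

```python
import numpy as np

LUNG_HU = -800.0   # assumed lung value (illustrative)
TUMOR_HU = 40.0    # assumed tumor/soft-tissue value (illustrative)

def phase_image(n_vox, start, width):
    """Hypothetical 1D phase image: lung background with a tumor segment."""
    img = np.full(n_vox, LUNG_HU)
    img[start:start + width] = TUMOR_HU
    return img

def average_intensity_projection(phases):
    """Voxel-wise mean over the respiratory phases of a 4DCT."""
    return np.mean(np.stack(phases), axis=0)

def override_density(image, mask, value=TUMOR_HU):
    """Return a copy of `image` with all voxels inside `mask` set to `value`."""
    out = image.copy()
    out[mask] = value
    return out

# Hypothetical tumor positions (voxel shifts) over one breathing cycle.
shifts = [0, 2, 4, 6, 4, 2]
phases = [phase_image(40, 10 + s, 8) for s in shifts]
aip = average_intensity_projection(phases)

# Target covering every tumor position: voxels 10..23.
target = np.zeros(40, dtype=bool)
target[10:24] = True
aip_override = override_density(aip, target)

print(aip[10], aip_override[10])  # -660.0 40.0
```

On the plain AIP, the blurred edge voxels have intermediate, lung-like values; after the override, the whole target region carries tumor density, which is why applying the override to some plans but not others can shift their apparent robustness.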
The number of treatment fields could also influence the robustness of the PT plans. In this study, two fields were used for ten patients and three fields for five patients. On the one hand, adding a third field could increase robustness, as the dose contribution is divided among more beam angles, so anatomical changes affecting one field have less impact on the total dose distribution. On the other hand, increasing the number of fields raised some issues. For some patients, finding a robust third angle could be difficult because of, e.g., arm position, large breasts, or fat folds, through which beam entry was avoided due to positioning uncertainty. In the 4D optimization, splitting the fields with field-specific targets was not possible, so every field had to contribute to both the primary tumor and the lymph node volumes, which, given the surrounding anatomy, restricted the choice of robust angles. Hence, the requirement that the field setup work for all optimization techniques was a limitation of this study.
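The robustness argument for a third field can be made concrete with a simplified worked equation. Assume, purely for illustration, that n fields contribute equally to the dose D in a target region (closer to SFUD than to IMPT, where individual field contributions vary voxel by voxel), and that an anatomical change removes a fraction f of one field's dose d_k in that region:

```latex
\begin{aligned}
D &= \sum_{i=1}^{n} d_i, \qquad d_i = \frac{D}{n},\\
D' &= D - f\,d_k = D\left(1 - \frac{f}{n}\right),\\
f = 0.3:\quad D' &= 0.85\,D \ (n = 2), \qquad D' = 0.90\,D \ (n = 3).
\end{aligned}
```

In this simplified picture, the dose loss from a single-field perturbation scales as 1/n; it does not capture the practical constraints on beam angles described above.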
Another limitation of the current study is the small number of patients. Nevertheless, tumor size and position varied widely, and breathing motion ranged from negligible to substantial; these parameters also varied among the patients who failed the robustness criteria at Start. Finally, the 4D optimization and the extreme-phase and interplay evaluations performed in this study required deformable image registration and mapping of contours to each phase of the 4DCT. Because delineation was performed on the AIP as part of the clinical routine, the contours were mapped from the AIP to each phase. Due to edge blurring, the GTV on the AIP may be slightly larger than in reality, and the plans may therefore be slightly more robust than if delineation had been performed on one of the phase images.

Conclusions
The potential of IMPT and 4DIMPT for reducing heart and lung toxicity in the treatment of LA-NSCLC was substantial and persisted at Start. SFUD showed potential only for reduced heart toxicity. All proton optimization techniques responded similarly to uncertainties and were sufficiently robust to setup and range uncertainties as well as interplay at Plan, and, for the majority of patients, in recalculations at Start. Altered breathing patterns between Plan and Start jeopardized target coverage for all PBS-PT techniques. Adaptive protocols for free-breathing PBS-PT should include imaging at the onset of or early in treatment, and possibly a surrogate for visualizing the mediastinal target. Given such strategies to recognize patients with altered breathing patterns, we believe PBS-PT has great potential to improve the treatment of LA-NSCLC.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy reasons as they are part of an ongoing study.

Acknowledgments:
The authors thank Sara Pilskog for scientific discussions and the clinical personnel at Haukeland University Hospital for data collection.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.