Automated Knowledge-Based Intensity-Modulated Proton Planning: An International Multicenter Benchmarking Study

Background: Radiotherapy treatment planning is increasingly automated, and knowledge-based planning has been shown to match and sometimes improve upon manual clinical plans, with increased consistency and efficiency. In this study, we benchmarked a novel prototype knowledge-based intensity-modulated proton therapy (IMPT) planning solution against three international proton centers. Methods: A model library was constructed, comprising 50 head and neck cancer (HNC) manual IMPT plans from a single center. Three external centers each provided seven manual benchmark IMPT plans. A knowledge-based plan (KBP) using a standard beam arrangement for each patient was compared with the benchmark plan on the basis of planning target volume (PTV) coverage and homogeneity and mean organ-at-risk (OAR) dose. Results: PTV coverage and homogeneity of KBPs and benchmark plans were comparable. KBP mean OAR dose was lower in 32/54, 45/48 and 38/53 OARs from center-A, -B and -C, with 23/32, 38/45 and 23/38 being >2 Gy improvements, respectively. In isolated cases, the standard beam arrangement, or an OAR not being included in the model or being contoured differently, led to higher individual KBP OAR doses. Generating a KBP typically required <10 min. Conclusions: A knowledge-based IMPT planning solution using a single-center model could efficiently generate plans of comparable quality to manual HNC IMPT plans from centers with differing planning aims. Occasional higher KBP OAR doses highlight the need for beam angle optimization and manual review of KBPs. The solution furthermore demonstrated the potential for robust optimization.


Introduction
The number of proton centers is increasing [1]. However, treatment planning studies for established photon modalities such as intensity-modulated radiation therapy (IMRT) and volumetric-modulated arc therapy (VMAT) indicate substantial intra- and inter-institutional variation [2]. This will presumably also be the case for intensity-modulated proton therapy (IMPT), since IMPT treatment planning is complex and involves many steps, including determining beam arrangements, using appropriate optimization objectives and deciding whether to use a range shifter or bolus. While certain centers are using robust optimization, which adds further complexity, the majority of proton therapy treatments to date have been delivered without it [3][4][5][6]. Although online adaptive proton therapy has been proposed to deal with daily anatomical changes, it requires faster planning solutions to be practical.
A desire to reduce variation, and produce consistently "good" plans, has largely spurred the development of automated treatment planning for photon therapy [7][8][9][10]. One such approach is knowledge-based planning, an example of which is RapidPlan™ (Varian Medical Systems, Palo Alto, CA). RapidPlan utilizes the geometries and associated dosimetry of previously created treatment plans to construct a model which can then be used to predict a range of achievable organ-at-risk (OAR) dose-volume histograms (DVHs) for future patients. Provided the library of plans used to derive the model is of a "high enough" quality, improvements in photon treatment plan quality and reductions in planning time and variability are attainable. Knowledge-based planning has also demonstrated a role in patient-specific plan quality assurance (QA) [11][12][13][14].
We previously illustrated the principle of using the photon-RapidPlan solution to automatically create IMPT treatment plans [15]. Since then, we have collaborated on the development of a prototype knowledge-based IMPT (proton-specific) planning solution, RapidPlanPT, in which the physical characteristics of proton beams have been appropriately modeled. In this novel study, we benchmark automated knowledge-based plans (KBPs) from this prototype solution against manual head and neck cancer (HNC) IMPT plans from three international proton centers. By using a model based on plans from a single center, we also evaluate the versatility of such an approach.

Results
Summarized results for the 21 patients from all external centers are presented in Figure 1. Generally, target coverage and homogeneity were similar between KBPs and benchmark plans. Aside from the oral cavity and ipsilateral submandibular gland, KBP OAR mean dose was >2 Gy lower, on average, than in benchmark plans, with statistically significant differences in a number of OARs including the contralateral submandibular gland (8.6 Gy) and larynx (8 Gy). Figures 2 and 3 show instances of similarities and differences in dose distributions and DVHs between benchmark and KBPs. Generation of DVH-predictions, subsequent optimization and dose calculation of KBPs required 8.3 min (n = 6 plans). Table 1 shows the dosimetry for the seven benchmark plans and KBPs from each external center.
For center-A, all but one KBP met the aims for PTV B Dmax of ≤110% (patient 1 had a Dmax of 111% delivered to a volume of 0.1 cm³; however, center-A accepts >110% in 0.03 cm³). While benchmark plans had, on average, lower parotid gland and oral cavity mean dose, KBPs had lower submandibular gland, esophagus and larynx mean dose. A "continue optimization" was performed to improve PTV homogeneity in four cases, while in two cases it was performed for non-modeled OARs not meeting the criteria in Table 2 (assigned the maximum accepted dose as an objective). Figure 4 illustrates the similarity between the mean dose values of OARs (included in Table 2) in benchmark plans and KBPs, with the R² value (0.91) and slope both being close to 1. One OAR, for which the KBP resulted in considerably higher dose than the respective benchmark plan, is circled in Figure 4. This was a left ear with a KBP mean dose of 31 Gy (aim <30 Gy). The largest increase in mean dose to a salivary gland, when using KBPs over benchmark plans for center-A, concerned an ipsilateral parotid gland (6.6 Gy). This increase was due to the standard, Y-shaped, beam arrangement used in the KBP compared with a five-field technique, including two oblique dorsal fields, in the benchmark plan.
All center-B KBPs met PTV B planning-aims despite a small, but statistically significant, decrease in PTV B V95% (Table 1). On average, KBP PTV E1 /PTV E2 V95% was 0.7/0.8% lower than in the benchmark plan. KBPs improved all OAR metrics in Table 1, with a ≥8.2 Gy statistically significant reduction in mean dose to the parotid glands. The largest, statistically significant, improvements in sparing, on average, were in the contralateral submandibular gland (16.3 Gy) and larynx (12.6 Gy). A "continue optimization" was performed to improve PTV homogeneity in all cases; however, it was not required in any case to improve dosimetry for OARs excluded from the model. For instance, in three patients PTV E2 did not meet the planning-aims of V95% ≥99% after one round of optimization. This was resolved by converting any cold spots in PTV E2 into contours after optimization and then running the "continue optimization" with additional objectives for the cold spots, ensuring planning-aims were met.
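The cold-spot step described for center-B can be illustrated with a minimal Python sketch. It assumes that a cold spot is any PTV voxel receiving less than 95% of the prescribed dose and that voxels have equal volume; the threshold, the function names and the toy voxel data are illustrative assumptions, not details reported for the planning software.

```python
def find_cold_spots(ptv_dose_by_voxel, prescription_gy, threshold_fraction=0.95):
    """Return the sorted voxel ids whose dose is below the coverage threshold.

    Assumption: a 'cold spot' is a voxel below threshold_fraction * prescription.
    """
    limit = threshold_fraction * prescription_gy
    return sorted(v for v, dose in ptv_dose_by_voxel.items() if dose < limit)


def v95_percent(ptv_dose_by_voxel, prescription_gy):
    """Percentage of (equal-volume) PTV voxels receiving >= 95% of prescription."""
    cold = find_cold_spots(ptv_dose_by_voxel, prescription_gy)
    total = len(ptv_dose_by_voxel)
    return 100.0 * (total - len(cold)) / total


# Toy PTV E2 with a 56 Gy prescription (hypothetical voxel doses in Gy).
ptv_e2 = {(0, 0, 0): 56.0, (0, 0, 1): 55.1, (0, 1, 0): 52.0, (1, 0, 0): 54.0}
cold_voxels = find_cold_spots(ptv_e2, prescription_gy=56.0)
coverage = v95_percent(ptv_e2, prescription_gy=56.0)
```

In this toy case one voxel falls below the threshold, so it would form the helper contour that receives an additional objective in the subsequent optimization.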
All center-C KBPs met PTV coverage-aims (D95% ≥95%), with minimal differences in PTV coverage and homogeneity relative to benchmark plans. Averaged over seven patients, KBPs improved all OAR sparing (Table 1); noticeable improvements occurred in the contralateral submandibular gland (8.1 Gy) and constrictor muscles (10.6 Gy). There was good correspondence of mean dose to the oral cavity between KBPs and benchmark plans (Figure 4). A "continue optimization" was performed to improve PTV homogeneity in all cases, in two of which it was also used to improve dosimetry for OARs excluded from the model. However, for these two instances, in contrast to the benchmark plans, the KBP did not meet planning-aims and resulted in inferior sparing: (1) the V54 Gy for the left brachial plexus of patient six was 1.1 cm³ (aim ≤0.05 cm³) and (2) the V52 Gy for the right brachial plexus of patient seven was 0.6 cm³ (aim ≤0.05 cm³) when using the KBP.

Discussion
This study, aimed at providing proof-of-principle, demonstrated that a knowledge-based planning solution, using a model library comprising plans from a single center with a standard planning technique, could create good quality HNC IMPT plans when benchmarked against plans from three experienced proton centers. The slightly different planning-aims of each center demonstrated the versatility of this automated approach. Creation of KBPs was efficient, requiring <10 min for generating predictions, subsequent optimization and dose calculation. Occasional higher OAR doses in KBPs highlight the need for manual review of KBPs and an understanding of the limitations of such a knowledge-based approach using fixed beam orientations and clinic-specific planning-aims. Knowing how "good" the plans are in a model library remains a limitation with approaches like that used in RapidPlan/RapidPlanPT. This is one reason why plans from multiple experienced proton centers have been used to benchmark the KBPs in this study. The fact that KBPs were at least comparable to or better than the benchmark plans indicates that model library plans were of a reasonable quality.
For OARs common to both the model and benchmark plans, KBPs provided comparable or improved sparing over respective benchmark plans. However, certain OARs/PTVs had an inferior dose distribution in KBPs. In some cases, this concerned OARs not included in the model, such as the ear of center-A and brachial plexus of center-C. For such OARs the maximum accepted dose in the planning aim was used as an objective in the KBP. However, because no attempt was made to reduce dose below this, and because in some cases the optimization weighting may have been too low, dose may have been higher than necessary in the KBP. This reflects a limitation when using a clinic-specific model for other centers: not all OARs may be the same. Furthermore, center-A oral cavity delineation was inconsistent, due to inter-physician differences, and the higher oral cavity mean dose (13.5 Gy versus 10.5 Gy) for KBPs of center-A may be attributable to the delineation of the oral cavity in patients 1-4. These oral cavity structures were >200% larger than those of patients 5-7, resulting in on average 4.4 Gy higher doses in the KBPs compared with benchmark plans. Although, visually, these contours differed from those in the model, the software did not flag them as volumetric outliers. Patients 5-7 had, on average, only 1.1 Gy higher oral cavity dose in KBPs. While these increases in mean dose may not be substantial, users should nonetheless check that OAR delineation/geometry of a prospective patient is similar to that in the model.
Since modeling partly depends on the beam-arrangement of plans in the model library, it was decided that in order to obtain the most accurate predictions for prospective patients, KBPs should have the same beam-arrangement as those in the model-library. Therefore, this study shows the result of straight-forward, standard three-field automated IMPT optimization without beam angle optimization. This could be a limitation of the study and KBP plan quality may be further improved with patient-specific beam-arrangements (as in the benchmark plans). Future investigations should test the applicability of such a model for differing beam arrangements [16] and include differing beam-arrangements in the model library. Optimized beam angle selection in IMPT is under investigation [17,18] and could potentially be integrated into future IMPT planning solutions. Since all external centers used the same treatment planning system, the versatility of our approach, with respect to plans from multiple vendors, was not tested. Furthermore, external centers used a bolus/range shifter, and differing beam data [19,20] and target margins. While examining the effect of such parameters on the results was beyond the scope of this study, previous work has alluded to these issues. Langner et al. compared commissioning beam data for two separate proton treatment centers which obtained their beam data using independent dosimeters and found that spot profiles were very similar both intra-institutionally (between treatment rooms) and inter-institutionally [21]. Minimal outlier removal was performed in the current analysis and while previous work has indicated that outlier removal may not always be necessary for RapidPlan photon KBPs [22,23], this has not yet been tested for RapidPlanPT. Finally, we note that the software used in this study is still under refinement and improvements are possible.
The KBPs in this study were non-robustly optimized, consistent with general clinical practice at the outset of this study [3][4][5][6]. The results are therefore relevant to contemporary practice. Whether RapidPlanPT can also be used for robust optimization requires further investigation. Since center-A began using robust optimization during the course of this study, and interest in robust optimization is increasing, we performed a preliminary test of whether RapidPlanPT, with a model based on non-robustly optimized plans, could be used for robust optimization. Center-A provided clinical, robustly optimized plans for three new cases. Using the same model (comprising non-robustly optimized treatment plans), we created robustly optimized KBPs for these patients. All three KBPs met center-A's robustness criteria and plan quality was largely comparable to that of the clinical plans (Supplementary Materials: Figures S1 and S2, Tables S1 and S2). These initial results are encouraging and merit investigation with a larger sample size.
We previously showed the feasibility of using a photon-specific knowledge-based solution for IMPT [15]. The present study incorporates proton-specific software which appropriately characterizes proton behavior. Literature on automated proton planning solutions is relatively sparse and we are currently unaware of similar multi-institutional comparisons. However, Hall et al. used OAR-PTV geometry to predict achievable doses for OARs relevant to the treatment of skull-base tumors in order to facilitate comparison of proton and photon plans [24]. Bijman et al. looked at uncertainties in model-based patient selection for IMRT or IMPT, using automatically planned IMPT plans: the approach to automation differed from this study, in that it used a pre-defined wish-list of hard constraints and hierarchical OAR objectives (tackled in order of priority) [25]. Lomax and colleagues have developed a tool to automatically pre-calculate feasible planning solutions specifically for uveal melanomas [26].

Conclusions
This proof-of-principle study has shown that a knowledge-based approach to IMPT planning is feasible. Such a solution could aid both experienced and more recently established proton centers, with a number of potential applications including implementation of consistent, efficient treatment planning; fast (re)planning for adaptive IMPT; and QA of IMPT plans in the clinic and trial setting [27].

Treatment Planning for Populating the Knowledge-Based Planning Model
A constant relative biological effectiveness (RBE) of 1.1 was assumed for IMPT plans, where all doses reported in "Gy" are understood to represent Gy RBE [28]. The 50 non-nasopharynx, locally advanced HNC IMPT plans in the RapidPlanPT model were coplanar and used a simultaneous integrated boost (SIB) technique to deliver 70/54.25 Gy to the boost/elective planning target volume (PTV B /PTV E ) in 35 fractions. PTV margins were 4-5 mm [29]. A 5 mm transition-region facilitated dose fall-off between PTVs. Plans typically aimed to spare multiple salivary glands, swallowing muscles and the oral cavity [30], although certain OARs could be excluded due to the extent of overlap with PTVs. Maximum point dose objectives were used for spinal cord, brainstem and planning-at-risk volumes (isotropic 3 mm OAR expansion). The aim was to deliver 95% of prescribed dose (V95%) to ≥99%/98% of PTV B /PTV E while limiting PTV volume receiving >107% of prescription dose. IMPT plans were created using the Varian Eclipse non-linear universal proton optimizer (NUPO) and proton convolution superposition algorithm (PCS) v13.7.14 (Varian Medical Systems, Palo Alto, CA, USA) with a 2.5 mm calculation grid. Plans were made with a standard three-field, multi-field optimization (MFO) technique, with gantry angles at 35°-55°, 180° and 305°-330°. Gantry variations in these three directions were determined by PTV geometry. A range shifter of 5.7 cm water equivalent thickness was used for proximal PTV irradiation [31]. Each field included proximal, distal and lateral target margins of 0.2 cm, 0.3 cm and 0.5 cm, respectively. A non-robust optimization was performed interactively by manually adjusting optimization objectives to maintain an approximately fixed diagonal distance to DVH-lines in the optimization-window. In certain cases, a subsequent optimization or "continue optimization" was performed to improve PTV dose homogeneity [32].
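As a worked illustration of the dose levels quoted above, the short Python sketch below derives the dose per fraction, the corresponding physical dose under a constant RBE of 1.1, and the 95% and 107% dose thresholds for the model-library prescriptions. The function and key names are hypothetical and chosen only for this example.

```python
RBE = 1.1          # constant RBE assumed in the text
FRACTIONS = 35     # number of fractions for the model-library plans

# Prescriptions in Gy (RBE) for the boost and elective PTVs of the model plans.
PRESCRIPTIONS_GY_RBE = {"PTV_B": 70.0, "PTV_E": 54.25}


def plan_levels(prescription_gy_rbe, fractions=FRACTIONS, rbe=RBE):
    """Derive simple dose levels from a prescription given in Gy (RBE)."""
    return {
        "dose_per_fraction": prescription_gy_rbe / fractions,
        # RBE-weighted dose = physical dose * RBE, so divide to recover it.
        "physical_dose": prescription_gy_rbe / rbe,
        "v95_threshold": 0.95 * prescription_gy_rbe,
        "hot_threshold_107": 1.07 * prescription_gy_rbe,
    }


levels = {name: plan_levels(dose) for name, dose in PRESCRIPTIONS_GY_RBE.items()}
for name, values in levels.items():
    print(name, {key: round(value, 2) for key, value in values.items()})
```

For the boost PTV this gives 2 Gy (RBE) per fraction, with coverage assessed at 66.5 Gy (RBE) and hot spots above 74.9 Gy (RBE).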
A basic refinement of the model was carried out in which regression, residual and DVH-plots, as well as statistical metrics provided by the planning software were used to remove obvious outliers with DVHs a considerable distance above the predicted curve [14,22]. The minimal refinement is reflected in the number of structures matched to each OAR in the model (average: 47, minimum: 37).

External Center Treatment Planning
Typical planning-aims for PTVs and OARs are shown in Table 2. Center-A defined dose-volume constraints on "OAR minus PTV" structures whilst center-B and -C used the entire OAR. For analysis, all dosimetric data are reported on the entire OAR structure. Up to two elective PTVs, which could differ in prescription dose per patient, were used. Therefore, we refer to the boost PTV as PTV B , the mid-dose elective PTV as PTV E1 , and the low-dose elective PTV as PTV E2 . This study intends to demonstrate the feasibility of knowledge-based planning for IMPT, regardless of the beam data used. All reporting is done on the PTVs provided by each center. Although differences in beam data and the choice of proximal/distal spot placement margins may have some effect on the dose distribution and amount of OAR sparing, such effects are expected to be small for state-of-the-art clinical IMPT facilities, and the choice of optimization objectives is likely to have more impact on the OAR sparing. For all these reasons the KBP is considered relatively insensitive to the vendor-specific beamline or beam model.
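The distinction between an "OAR minus PTV" structure and the entire OAR can be made concrete with a small Python sketch. It treats structures as sets of equal-volume voxel ids and uses invented dose values; these representations are assumptions for illustration and do not reflect how the planning system stores structures.

```python
def mean_dose(dose_by_voxel, voxels):
    """Mean dose (Gy) over a set of equal-volume voxels."""
    voxels = list(voxels)
    if not voxels:
        raise ValueError("structure contains no voxels")
    return sum(dose_by_voxel[v] for v in voxels) / len(voxels)


# Hypothetical voxel ids: the OAR overlaps the PTV in voxels 3 and 4.
oar = {1, 2, 3, 4}
ptv = {3, 4, 5}
dose = {1: 10.0, 2: 20.0, 3: 60.0, 4: 70.0, 5: 70.0}

oar_minus_ptv = oar - ptv                           # set difference removes the overlap
whole_oar_mean = mean_dose(dose, oar)               # metric reported in the analysis
cropped_oar_mean = mean_dose(dose, oar_minus_ptv)   # metric constrained by center-A
```

Because the overlapping voxels carry target-level dose, the mean over the entire OAR is higher than the mean over the cropped structure, which is why reporting on one consistent definition matters when comparing centers.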

Center-A
IMPT plans utilized a SIB, delivering 70 Gy, 63/59.5 Gy and 56/54.3 Gy to PTV B , PTV E1 and PTV E2 in 35 fractions, respectively. PTV aims were such that dose to 95% of the volume (D95%) was ≥100% of the prescribed dose. IMPT plans were created using Varian Eclipse NUPO and PCS v13.7.15 with a 2.5 mm dose calculation grid. Plans utilized a three or five-field MFO technique, including a beam from the anterior direction to irradiate the neck and shoulder nodal regions and two oblique posterior-anterior beams for the head; however, beam angles varied according to tumor location. A range-shifting bolus of 7.5 cm water equivalent thickness was used in all plans. Each field included proximal, distal and lateral target margins of typically 0.5 cm, 0.5 cm and 0.3-0.5 cm, respectively.

Center-B
IMPT plans utilized a SIB, delivering 70 Gy, 63 Gy and 56 Gy to PTV B , PTV E1 and PTV E2 in 35 fractions, respectively. The aim was that all PTVs had a V95% ≥ 99%. IMPT plans were created using Varian Eclipse NUPO and PCS v13.7.15 with a 2.5 mm dose calculation grid. Plans utilized a three-field MFO with typical gantry angles of 35°, 180° and 325°. A range shifter of 5.7 cm water-equivalent thickness was used for anterior-oblique fields. Typical proximal, distal and lateral target margins were 0.5 cm, 0.5 cm and 1 cm, respectively.

Center-C
IMPT plans utilized a SIB, delivering 70 Gy, 60/58 Gy and 54 Gy to PTV B , PTV E1 and PTV E2 in 35 fractions, respectively. PTV D95% aim was ≥95% of the prescribed dose. IMPT plans were created using Varian Eclipse NUPO and PCS v13.7.15 with a 2.5 mm dose calculation grid. Plans typically utilized a four-field MFO with varying gantry angles to cover both cranial and caudal portions of a patient. Beam-arrangements were cross-shaped in cranial portions to avoid the shoulders. A range shifter of 5 cm water equivalent thickness was used for all fields. Each field included proximal, distal and lateral target margins of 0.8 cm, 0.8 cm and 1.7 cm, respectively.

Evaluation Patients
Each external center provided 7 HNC, non-robustly optimized, coplanar IMPT benchmark plans. Center-A provided clinical plans while center-B and -C created plans solely for the purpose of this study. Benchmark plans were deemed clinically acceptable by each of the respective external centers. All centers used their own clinical protocol and beam arrangement. For each patient, and without reference to the benchmark plans, RapidPlanPT was used to create non-robust KBPs using NUPO v15.5.99 (facilitates use of line objectives) and PCS v15.0.17 using a standard Y-shaped beam arrangement and the same target margins as used in the model-plans. Dose values of PTV E1 /PTV E2 optimization objectives were assigned manually, prior to optimization, as the prescription dose varied per center. Optimization objective priorities mirrored those of model library plans. A subsequent optimization or "continue optimization" was performed if (i) PTV homogeneity was not acceptable (priorities increased/objectives added for PTVs) or (ii) KBPs did not meet criteria in Table 2 for OARs not included in the model (objectives added for any non-compliant OAR outside of the modeled structures). External centers could utilize a number of volume constraints for OARs not included in Table 2, such as the brachial plexus. Where necessary, additional manual constraints were used in the KBP planning process to address this.
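The two triggers for a "continue optimization" described above can be summarized as a simple decision rule. The Python sketch below is only an illustration of that logic: the function name, the structure names and the use of a strict "greater than" comparison against the accepted dose are assumptions, and the actual review was performed by a planner rather than by code.

```python
def continue_optimization_actions(ptv_homogeneity_ok, oar_doses, oar_limits, modeled_oars):
    """List the (structure, objective dose) pairs to add before re-optimizing.

    Trigger (i): unacceptable PTV homogeneity -> ("PTV", None), meaning that
    PTV priorities are raised or PTV objectives are added.
    Trigger (ii): a non-modeled OAR exceeds its accepted dose -> an objective
    is added at the maximum accepted dose for that OAR.
    """
    actions = []
    if not ptv_homogeneity_ok:
        actions.append(("PTV", None))
    for oar, limit in sorted(oar_limits.items()):
        if oar in modeled_oars:
            continue  # modeled OARs already receive objectives from the model
        dose = oar_doses.get(oar)
        if dose is not None and dose > limit:  # assumption: strict exceedance
            actions.append((oar, limit))
    return actions


# Hypothetical review of one KBP (doses and limits in Gy).
actions = continue_optimization_actions(
    ptv_homogeneity_ok=False,
    oar_doses={"ear_left": 31.0, "parotid_right": 24.0},
    oar_limits={"ear_left": 30.0, "parotid_right": 26.0},
    modeled_oars={"parotid_right"},
)
```

An empty list means that neither trigger applies and the first optimization result is kept.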

Study Endpoints
KBPs were normalized such that PTV B D95% ≥ 100%, V95% ≥ 99% and D95% ≥ 95% for center-A, -B and -C, respectively. Since external centers had differing PTV and OAR planning aims (Table 2), comparisons between benchmark plans and KBPs were performed altogether and per-center, on the basis of (1) target coverage and homogeneity (PTV D95%, PTV V95% and PTV homogeneity index (HI), where HI = 100% × (D2% − D98%)/D50%) and (2) mean dose to OARs. For combined data and data per center, paired 2-sided Student t-tests and Wilcoxon signed rank tests, respectively, were used to determine whether differences between benchmark plans and KBPs were significant (p < 0.05). The time required to create KBPs was also measured.
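The target endpoints defined above can be computed from a list of voxel doses, as in the following minimal Python sketch. It assumes equal-volume voxels and takes Dx% as the minimum dose received by the hottest x% of the volume; the function names and the toy dose values are illustrative, and a treatment planning system would typically interpolate the DVH rather than rank voxels.

```python
import math


def dose_at_volume(doses, volume_pct):
    """Dx%: minimum dose received by the hottest volume_pct % of voxels."""
    ranked = sorted(doses, reverse=True)
    k = max(1, math.ceil(len(ranked) * volume_pct / 100.0))
    return ranked[k - 1]


def volume_at_dose(doses, threshold_gy):
    """Percentage of voxels receiving at least threshold_gy."""
    return 100.0 * sum(d >= threshold_gy for d in doses) / len(doses)


def homogeneity_index(doses):
    """HI = 100% x (D2% - D98%) / D50%, as defined in the text."""
    d2 = dose_at_volume(doses, 2)
    d98 = dose_at_volume(doses, 98)
    d50 = dose_at_volume(doses, 50)
    return 100.0 * (d2 - d98) / d50


# Toy boost PTV: 100 voxels, 70 Gy prescription (hypothetical dose values).
prescription = 70.0
ptv_doses = [65.0] * 2 + [70.0] * 96 + [72.0] * 2

d95_pct = 100.0 * dose_at_volume(ptv_doses, 95) / prescription   # D95% as % of prescription
v95_pct = volume_at_dose(ptv_doses, 0.95 * prescription)         # V95% in % of volume
hi_pct = homogeneity_index(ptv_doses)                            # HI in %
```

A lower HI indicates a more homogeneous target dose; in the toy example two under-dosed voxels reduce V95% to 98% while D95% remains at the prescription dose.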

Ethical Statement
All patients signed a consent form, where they agreed to the use of their data for scientific purposes at PSI. The institutional review board (IRB) of the University of Cincinnati Medical Center determined the research proposal does not meet the regulatory criteria for research involving human subjects and thus ongoing IRB oversight was not required. The retrospective planning study was approved at the University of Pennsylvania under IRB # 830036 (1 June 2018).