1. Introduction
Radiation therapy techniques such as volumetric-modulated arc therapy (VMAT), intensity-modulated radiation therapy (IMRT), and three-dimensional conformal radiation therapy (3D-CRT) have been effectively employed in the treatment of breast cancer following breast-conserving surgery and in node-positive post-mastectomy cases [1]. VMAT can deliver highly conformal dose distributions to the target while limiting the dose to the organs at risk (OARs). Its use for breast cancer radiotherapy, including breast irradiation after breast-conserving surgery and chest wall irradiation for node-positive patients following mastectomy, is becoming standard in most radiotherapy centres [2,3,4].
The application of VMAT in breast cancer remains controversial because of the potential risk of secondary cancer caused by low-dose exposure. However, several studies comparing VMAT with IMRT and traditional 3D conformal techniques have highlighted critical trade-offs, including better dose homogeneity and improved OAR sparing [5,6,7,8,9]. Currently, various VMAT techniques, such as multi-partial arc VMAT, are employed to produce plans that improve dose homogeneity within the target and help to meet dose constraints for OARs, even in cases involving the axillary lymph nodes and internal mammary nodes (IMNs).
However, VMAT has raised concerns about achieving adequate Planning Target Volume (PTV) coverage with the prescribed dose and an optimal superficial dose, because the breast surface deforms and because inter- and intrafractional motion makes setup uncertainties difficult to manage. Owing to the highly modulated nature of VMAT, these factors reduce the precision of treatment delivery compared with static fields for breast irradiation [10]. In addition, VMAT involves multiple beam incidences in contrast to conventional tangential beam arrangements, often including beam entry angles that are more perpendicular to the skin surface, which may reduce the delivered skin dose and consequently compromise target coverage [11,12]. While the impact of flash thickness on plan quality remains contentious, several studies [13,14] have recommended implementing auto flash (AF) in VMAT breast irradiation to address potential setup errors and breast deformation beyond the skin surface. Most treatment planning systems allow users to specify the thickness of the flash region. During plan optimisation, the system automatically extends the fluence and shapes the Multileaf Collimator (MLC) leaves to cover this region, thereby improving skin dose uniformity and making the treatment more robust against patient-specific variations in VMAT breast irradiation. Hubley et al. [15] demonstrated that plan quality decreases as the AF margin increases; however, the recommended flash region of 7–10 mm can yield a robust plan.
While many studies have examined the impact of flash thickness on plan quality and robustness, there is a paucity of research connecting flash thickness to complexity metrics, including modulation factor, segment count, and the passing rates of patient-specific quality assurance (PSQA) outcomes that are directly influenced by these metrics in VMAT breast plans. This study aims to identify the optimal flash thickness that helps generate a high-quality treatment plan by ensuring adequate coverage of the PTV and minimising dose discrepancies near the surface, as well as preventing the compromise of PSQA results for breast and chest wall VMAT irradiation with or without regional lymph nodes. Furthermore, to the best of our knowledge, this study is the first to demonstrate the relationship between AF margin thickness and both plan complexity and PSQA outcomes within a partial VMAT breast radiotherapy planning technique using Monte Carlo-based dose calculation with the AF feature.
2. Materials and Methods
2.1. Patient Selection
This retrospective study encompassed a total of 20 patients (11 left-sided and 9 right-sided) who were randomly chosen, including early-stage cases and stages requiring irradiation of regional lymph nodes (RN) (excluding the IMN), treated with either hypofractionation or conventional fractionation.
Table 1 summarises the patient specifications, including treatment volumes and prescribed doses. Eight patients received moderate fractionation, with a prescribed dose of 40.05 Gy delivered in 15 fractions to either the breast or chest wall and regional nodes, while six patients received 40.05 Gy to the breast PTV and 48 Gy to the tumour bed simultaneously. The second prescription was conventional fractionation, delivering 50 Gy in 25 fractions for the breast or chest wall and regional lymph nodes, applied to six out of twenty patients.
2.2. VMAT Planning
Computed tomography (CT) simulation with a 2.5 mm slice thickness (Discovery CT590 RT, GE Healthcare, Chicago, IL, USA) was performed for RT treatment planning to delineate the target volume and adjacent OARs according to European Society for Radiotherapy and Oncology (ESTRO) guidelines [16]. The radiation volumes comprised the clinical target volume (CTV), which included the whole breast or chest wall (CW); for advanced-stage patients with supraclavicular and infraclavicular involvement, any portion of the axillary bed at risk was delineated as CTV_RN. The Planning Target Volumes (PTVs) for the breast (PTV_Breast) and for regional lymph nodes (PTV_RN) were established by applying an isotropic margin expansion of 3 mm to CTV_Breast and CTV_RN, followed by a reduction of 2 mm from the skin and 5 mm from the ipsilateral lung, respectively. The total PTV for patients with a PTV_RN volume was generated by merging PTV_Breast and PTV_RN. The clinically accepted plans for the retrospectively selected patients had been optimised with a 10 mm auto skin flash margin at the time of treatment, in line with our clinical practice of ensuring that at least 95% of the PTV receives 95% of the prescribed dose of either 40 or 50 Gy to the breast or chest wall and lymph nodes. These patients were replanned with AF thicknesses of 0, 3, 5, 10, and 15 mm, utilising the same CT dataset, delineated contours, and prescribed treatment doses. The abbreviations AF0, AF3, AF5, AF10, and AF15 in this study denote the AF thicknesses of 0, 3, 5, 10, and 15 mm, respectively, employed during optimisation. Plans were developed for an Elekta Versa HD accelerator (MLC Agility with 80 leaf pairs and 0.5 cm leaf width) utilising a Monaco v5.11 treatment planning system (Elekta AB, Stockholm, Sweden), employing a 4-partial-arc VMAT (4pVMAT) with 6 MV photons as shown in Figure 1. In this configuration, the isocenter was set inside the area where the fields overlap. The arrangement consisted of 4 arcs of 50° each, starting and ending at angles of 300°/20° for the central arcs and 90°/180° for the lateral arcs, thereby allowing a minimum overlap of 15° between the arcs, as outlined by Poeta et al. [17].
All treatment plans were standardised to ensure that 95% of the PTVTotal received at least 95% of the prescribed dose. In cases where the AF plans with varying thicknesses did not achieve this coverage after a second re-optimisation, the plans were normalised to meet the clinical acceptance criterion for PTVTotal. Specifically, the volume receiving more than 110% of the prescribed dose was constrained not to exceed 3 cm³, even if the maximum dose surpassed the clinical threshold of Dmax ≤ 110%. This approach was preferred to mitigate the elevated modulation factor associated with increasing the optimisation weight of the PTVTotal. All plans were required to satisfy the same institutional OAR dose limits and the V95% ≥ 95% target-coverage criterion prior to comparison; these clinical constraints were therefore held fixed as acceptance conditions across all AF settings, with autoflash thickness as the only variable, rather than treated as per-AF outcome measures.
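The coverage normalisation described above can be illustrated with a minimal sketch. It assumes that the PTV dose is available as a flat list of equal-volume voxel doses and that normalisation is a single global scale factor. The function names and the toy dose values are hypothetical and are not taken from Monaco or from the study data.

```python
import math

def dose_covering_fraction(voxel_doses, fraction):
    """Lowest dose among the hottest `fraction` of voxels (equal-volume voxels assumed)."""
    ordered = sorted(voxel_doses, reverse=True)
    # Small tolerance guards against floating-point noise in fraction * n.
    n_needed = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    return ordered[n_needed - 1]

def normalisation_ratio(voxel_doses, prescription, fraction=0.95, level=0.95):
    """Global scale factor that brings V(level * prescription) up to `fraction`."""
    covering_dose = dose_covering_fraction(voxel_doses, fraction)
    if covering_dose <= 0:
        raise ValueError("covering dose must be positive")
    return level * prescription / covering_dose

# Hypothetical 100-voxel PTV (doses in Gy) planned to a 40 Gy prescription.
ptv_doses = [39.0] * 94 + [36.9] + [33.0] * 5
ratio = normalisation_ratio(ptv_doses, prescription=40.0)
scaled = [d * ratio for d in ptv_doses]  # rescaled plan now meets V95% = 95%
```

In this toy case the ratio is 38.0/36.9, slightly above unity, meaning the optimised dose needs only a small upward rescaling to reach the coverage criterion.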
Table 1 summarises the patient-specific PTV details, including the prescribed dose for each individual patient and the total PTV volume in cubic centimetres (cc). The critical structures used as OARs were delineated, including the heart, contralateral breast, left and right lungs, esophagus, and spinal cord for all involved patients, with the addition of the left anterior descending artery (LAD) for patients with left-sided breast sites as described by Offersen et al. [16] and Nielsen et al. [18].
The number of control points per arc, segment dimensions, and fluence smoothing, all of which influence complexity metrics such as segment number, modulation factor (MF) and monitor unit (MU) efficiency, remained consistent in the reoptimised plans across all AF thicknesses for all patients. A standard deviation of 1% was used in Monte Carlo dose calculation with a dose grid of 3.0 mm. The replans were named VMAT_AFx, referring to the thickness of the AF (in mm) used in the optimisation. At the beginning of the inverse optimisation for each VMAT_AFx plan, the same optimisation template as VMAT_AF10, the accepted plan at the time of treatment, was used. Dose objectives and dose–volume constraints were defined according to institutional breast VMAT planning practice and established radiotherapy protocols, in line with published guidelines such as ASTRO, DEGRO, and UK practice reports for whole-breast and chest wall irradiation [19,20,21,22,23,24,25,26]. These constraints, summarised in Table 2, include limits for the ipsilateral lung, heart, LAD, and contralateral lung and breast. For the 50 Gy in 25 fractions schedule, conventional fractionation constraints were applied, while for 40 Gy in 15 fractions these were adapted to clinically comparable hypofractionated dose levels (e.g., V17 Gy instead of V20 Gy and V35 Gy instead of V40 Gy). Following optimisation and normalisation for PTV coverage as described above, all plans across the different AF thicknesses were reviewed to ensure compliance with clinical acceptance criteria, with particular emphasis on maintaining heart and contralateral organ doses as low as reasonably achievable (ALARA). When OAR dose constraints were not satisfied after normalisation, plans underwent further iterative re-optimisation, with adjustments to both target volume objectives and OAR dose constraints to achieve compliance with the criteria defined in Table 2.
2.3. Plan Complexity and Delivery Accuracy Analysis
Plan complexity and delivery accuracy were evaluated to investigate the impact of AF thickness on VMAT plan deliverability and to derive the correlation between changes in plan complexity metrics and delivery accuracy. For each patient, five treatment plans corresponding to different AF thicknesses (AF0, AF3, AF5, AF10, and AF15) were generated and analysed under identical planning conditions, with AF thickness being the only varying parameter. Plan complexity was quantified using parameters extracted from the treatment planning system, including modulation factor (MF), total MU, segment number, MU efficiency, and normalisation ratio. The modulation factor, defined as the ratio of total MU to the prescribed dose, expressed in MU/cGy, was used as a primary indicator of fluence modulation. Since all plans were normalised to achieve comparable PTV coverage, variations in MF reflect changes in modulation complexity associated with different AF thicknesses. The normalisation ratio, a scaling factor applied after optimisation to satisfy the PTV coverage constraint, was also inspected as a parameter of plan quality and dose uniformity. Together, these parameters provide a comprehensive assessment of plan modulation, delivery efficiency, and optimisation performance.
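As a small worked example of the modulation factor defined above, the sketch below divides total MU by dose in cGy. It assumes that "prescribed dose" means the dose per fraction, which the text does not state explicitly; the function name and the MU value are hypothetical.

```python
def modulation_factor(total_mu, fraction_dose_cgy):
    """Modulation factor in MU/cGy: total MU divided by dose per fraction.

    Using the per-fraction dose is an assumption of this sketch.
    """
    if fraction_dose_cgy <= 0:
        raise ValueError("fraction dose must be positive")
    return total_mu / fraction_dose_cgy

# 40.05 Gy in 15 fractions corresponds to 267 cGy per fraction.
fraction_dose = 4005 / 15
mf_example = modulation_factor(total_mu=1068, fraction_dose_cgy=fraction_dose)
```

With the per-fraction convention, a hypothetical 1068 MU plan at 267 cGy per fraction has an MF of 4.0 MU/cGy, which is the same order of magnitude as the cohort means reported in Table 3.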
PSQA was performed for all plans using the IBA Matrixx Resolution 2D detector array system. Measured dose distributions were compared with the corresponding calculated dose distributions from the treatment planning system using gamma index analysis. Gamma passing rates (GPR) were evaluated using 3%/3 mm, 3%/2 mm, and 2%/2 mm criteria, corresponding to standard, intermediate, and strict levels of evaluation, respectively. Clinical acceptance thresholds were defined as ≥95% for the 3%/3 mm criterion and ≥90–95% for the 3%/2 mm criterion, while the 2%/2 mm criterion with a 10% dose threshold was used for more sensitive assessment of delivery accuracy.
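To make the gamma criteria concrete, the following is a minimal one-dimensional sketch of a global gamma passing-rate calculation. It is not the algorithm of the IBA software: it assumes that both profiles share one position grid, performs no interpolation between points, normalises the dose criterion to the maximum measured dose, and searches the calculated profile for each measured point. All names are hypothetical.

```python
import math

def gamma_passing_rate(positions_mm, measured, calculated,
                       dose_pct=3.0, dta_mm=3.0, threshold_pct=10.0):
    """Global 1D gamma passing rate (%) for two profiles on a shared grid."""
    d_max = max(measured)
    dose_tol = dose_pct / 100.0 * d_max      # global dose-difference criterion
    cutoff = threshold_pct / 100.0 * d_max   # ignore points below the dose threshold
    passed = total = 0
    for x_m, d_m in zip(positions_mm, measured):
        if d_m < cutoff:
            continue
        total += 1
        # Gamma is the minimum combined dose/distance metric over all calculated points.
        gamma_sq = min(
            ((x_c - x_m) / dta_mm) ** 2 + ((d_c - d_m) / dose_tol) ** 2
            for x_c, d_c in zip(positions_mm, calculated)
        )
        if math.sqrt(gamma_sq) <= 1.0:
            passed += 1
    if total == 0:
        raise ValueError("no points above the dose threshold")
    return 100.0 * passed / total

# Hypothetical flat profiles sampled every 1 mm.
grid = [float(i) for i in range(21)]
planned = [100.0] * 21
rate_small_offset = gamma_passing_rate(grid, [102.0] * 21, planned)  # 2% offset
rate_large_offset = gamma_passing_rate(grid, [110.0] * 21, planned)  # 10% offset
```

Tightening `dose_pct` and `dta_mm` from 3%/3 mm to 2%/2 mm shrinks the acceptance ellipse around each point, which is why the stricter criteria yield lower passing rates for the same measurement.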
The relationship between AF thickness, plan complexity metrics, and PSQA results was analysed across all patients to determine whether increasing AF thickness leads to changes in modulation characteristics and delivery performance. Given that each patient was evaluated under all five AF conditions, a paired statistical design was adopted. Data normality was assessed using the Shapiro–Wilk test. To evaluate the overall effect of AF thickness on each metric across the five paired conditions, the non-parametric Friedman test was applied. Where a significant overall effect was identified, post hoc pairwise comparisons between individual AF levels were performed using the Wilcoxon signed-rank test, with Bonferroni correction applied for ten pairwise comparisons (α_corr = 0.005). The Spearman rank correlation was used to assess monotonic associations between AF thickness (treated as a continuous variable, 0–15 mm) and each metric across all 100 plans, as well as inter-metric associations between complexity parameters and GPR. The Pearson product-moment correlation was additionally computed for complexity-vs-PSQA pairs to capture linear relationships. A two-sided p-value < 0.05 was considered statistically significant for individual tests, with Bonferroni-corrected thresholds reported separately for pairwise comparisons. All analyses were performed in Python (version 3.12.3) using the SciPy library (version 1.17.1) on the full dataset of n = 20 patients × 5 AF thicknesses = 100 plans.
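The arithmetic behind the Friedman statistic and the Bonferroni threshold can be sketched in plain Python. This is an illustrative hand-written version, not the code used in the study: SciPy offers `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` for the same tests, and library implementations may additionally correct for ties, which this sketch omits. The toy table is hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 to k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square (no tie correction); rows are patients, columns are conditions."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def bonferroni_alpha(levels, alpha=0.05):
    """Number of pairwise comparisons and the corrected per-comparison threshold."""
    n_pairs = len(list(combinations(levels, 2)))
    return n_pairs, alpha / n_pairs

af_levels = ["AF0", "AF3", "AF5", "AF10", "AF15"]
n_pairs, alpha_corr = bonferroni_alpha(af_levels)

# Hypothetical metric for 4 patients, each perfectly ordered across the 5 AF levels.
toy_table = [[1.0, 2.0, 3.0, 4.0, 5.0] for _ in range(4)]
toy_stat = friedman_statistic(toy_table)
```

Five AF levels give ten unordered pairs, so the corrected threshold is 0.05/10 = 0.005, matching the α_corr quoted above. The statistic is referred to a chi-square distribution with k − 1 = 4 degrees of freedom.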
To evaluate the consistency of the autoflash effect across clinical scenarios, a linear mixed-effects model was fitted to the primary endpoint (gamma passing rate at 3%/3 mm). The model included autoflash thickness (categorical, five levels: AF0, AF3, AF5, AF10, AF15) and clinical subgroup (categorical, three levels: breast-only, breast plus RNI, chest wall plus RNI) as fixed effects, with their interaction term, and patient as a random intercept to account for the repeated-measures structure of the data. Models were fitted by maximum likelihood; significance of the AF main effect, the subgroup main effect and the AF × subgroup interaction was assessed by likelihood-ratio tests against the corresponding nested models. The intraclass correlation coefficient (ICC) was computed from the random-effect variance to quantify the proportion of total variance attributable to between-patient differences.
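The two derived quantities in this paragraph, the likelihood-ratio p-value and the ICC, can be sketched as follows. The chi-square tail uses the closed form that exists for even degrees of freedom, which covers the AF main effect (df = 4) and the AF × subgroup interaction (df = 8) reported in Section 3.3. The variance components in the example are hypothetical, since only the resulting ICC is reported, and the function names are assumptions of this sketch rather than part of any mixed-model library.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """LRT statistic and p-value for nested models fitted by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_even_df(stat, df)

def intraclass_correlation(var_patient, var_residual):
    """Share of total variance attributable to between-patient differences."""
    return var_patient / (var_patient + var_residual)

p_af_main = chi2_sf_even_df(7.65, 4)       # reported chi-square for the AF main effect
p_interaction = chi2_sf_even_df(10.90, 8)  # reported chi-square for AF x subgroup
icc_example = intraclass_correlation(var_patient=8.4, var_residual=1.6)  # hypothetical variances
```

Feeding in the reported test statistics reproduces the reported p-values to three decimals (about 0.105 and 0.207), which is a useful consistency check on the tabulated results.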
3. Results
All treatment plans were normalised to ensure that 95% of the PTVTotal received at least 95% of the prescribed dose, with a normalisation ratio maintained close to unity across all AF levels (range: 1.029–1.032). Because every plan was renormalised to the same coverage criterion (PTVTotal V95% = 95%), target coverage—including the superficial component retained by the 2 mm skin-subtracted PTV margin—was held constant across plans by construction; the quantity that therefore reflects the influence of AF thickness on coverage is the normalisation ratio, i.e., the scaling required to satisfy that criterion. Its near-unity, essentially constant value across AF0–AF15 indicates that the optimiser reached full target coverage with equivalent efficiency at every AF thickness, without AF-dependent rescaling. The study population included 20 patients in three different clinical treatment groups: 12 patients treated with whole breast irradiation (with or without tumor bed boost), 5 patients treated with breast and regional lymph node irradiation, and 3 patients treated with chest wall and regional lymph node irradiation. The mean PTVTotal volumes were 985.9 ± 403.1 cc, 1358.5 ± 464.5 cc, and 560.8 ± 125.9 cc for entire breast, breast with regional lymph nodes, and chest wall with regional lymph nodes groups, respectively, reflecting the broad anatomical and volumetric diversity inherent to this patient population. Across the full cohort, the median PTVTotal was 920.5 cc (mean 1015.3 ± 453.1 cc; range: 418.0–1942.8 cc), underscoring the heterogeneous clinical conditions under which AF thickness optimisation was evaluated. This heterogeneity is clinically significant as it facilitates the assessment of the five evaluated AF thicknesses (0, 3, 5, 10, and 15 mm) across a representative range of target geometries and treatment complexities encountered in standard breast and chest wall radiation practice.
3.1. Effect of Automatic Skin Flash Thickness on Plan Complexity Metrics
3.1.1. Modulation Factor (MF)
The mean modulation factor was within a narrow range across the five AF settings (Table 3), with values of 4.77 ± 1.01 MU/cGy for AF0, 4.64 ± 0.93 MU/cGy for AF3, 4.65 ± 0.79 MU/cGy for AF5, 4.66 ± 0.68 MU/cGy for AF10, and 4.54 ± 0.86 MU/cGy for AF15. Mean MF was highest at AF0 and lowest at AF15, although the overall variation remained small relative to inter-patient differences. As shown in Figure 2a, MF exhibits substantial inter-patient variability across all AF thickness levels. Despite this variability, the median MF remains stable from AF0 to AF15, with no systematic shift across increasing AF margins.
The Friedman test across the five AF levels did not reach statistical significance (χ² = 6.67, p = 0.154; n = 20), indicating no overall effect of AF thickness on MF. Among the pairwise Wilcoxon signed-rank tests, only the AF0–AF15 comparison reached the conventional 0.05 threshold (p = 0.040, uncorrected), and this result did not survive correction for multiple comparisons; the remaining nine pairwise comparisons were non-significant (all p ≥ 0.165). Pairwise comparisons against the clinically used AF10 reference were likewise non-significant for MF at every AF setting. Spearman rank correlation between AF thickness and MF was negligible and non-significant (ρ = −0.06, p = 0.57), consistent with the absence of a monotonic trend. Among the intermetric associations across all 100 plans, MF was most strongly correlated with total MU (ρ = +0.67, p < 0.001); this association is partly definitional, as MF is computed from total MU.
3.1.2. Segment Number and Total MU
The number of segments and the total MU showed minimal variation across the five AF settings (Table 3 and Figure 2b,c). Mean total MU varied only modestly across the AF range, from 1087 ± 185 at AF0 to 1039 ± 168 at AF15, corresponding to a cohort-mean spread of approximately 48 MU, which lies well within the inter-patient standard deviation (144–212 MU). The mean number of segments was even more stable, ranging from 122.8 ± 38.5 (AF5) to 125.3 ± 38.3 (AF15), a maximal inter-group difference of only 2.5 segments compared with inter-patient standard deviations of 36–39 segments. The box plots in Figure 2b,c show substantial overlap across AF levels, while the corresponding trends in Figure 3b,c show no systematic change with increasing AF thickness. The wide error bars in both representations indicate that target volume, anatomical complexity, and the inclusion of regional nodal irradiation drive these metrics far more than the AF margin.
Statistical testing supported the absence of an AF-dependent effect on either metric. The Friedman test across the five AF levels was non-significant for both Total MU (χ² = 3.50, p = 0.478) and segment number (χ² = 4.02, p = 0.403). Spearman rank correlations between AF thickness and the two metrics were correspondingly negligible and non-significant (Total MU: ρ = −0.10, p = 0.317; segments: ρ = +0.001, p = 0.989; Figure 4), confirming the absence of any monotonic trend across the 0–15 mm range. Pairwise Wilcoxon signed-rank tests were non-significant for every AF pair for both metrics (Total MU: all p ≥ 0.06; segments: all p ≥ 0.18), including all comparisons against the clinically used AF10 reference. Across all 100 plans, Total MU was strongly correlated with the modulation factor (ρ = +0.67, p < 0.001), while segment number was inversely correlated with MU efficiency (ρ = −0.45, p < 0.001) and, to a lesser extent, with the normalisation ratio (ρ = −0.31, p = 0.002), reflecting the fact that plans with more apertures distribute dose over smaller fractional contributions per segment.
3.1.3. MU Efficiency and Normalisation Ratio
MU efficiency, defined as the ratio of optimised to delivered MU reported by the planning system, showed a small decrease with increasing AF thickness, from 0.86 ± 0.12 at AF0 to 0.82 ± 0.12 at AF15 (Table 3). The post-optimisation normalisation ratio applied to satisfy the PTV V95% ≥ 95% coverage criterion was nearly constant, ranging from 1.029 ± 0.018 (AF15) to 1.032 ± 0.023 (AF0), with all five cohort means falling within 0.003 of one another (Table 3; Figure 2d and Figure 3d). A normalisation ratio of 1.030 corresponds to a uniform 3.0% upward rescaling of the optimised dose to bring the PTV V95% to 95%; values close to unity indicate that the optimiser achieved the coverage objective directly, with rescaling acting only as a small correction. All five cohort means fell within the institutional acceptable range (typically ≤ 1.05), with no AF level requiring substantial post hoc rescaling. For both metrics, inter-patient variability dominated any AF-related differences, and the trend lines in Figure 3d remain essentially flat across the AF range.
The Friedman test across the five AF levels did not reach statistical significance for either metric (MU efficiency: χ² = 7.94, p = 0.094; normalisation ratio: χ² = 4.26, p = 0.373), and Spearman rank correlations with AF thickness were correspondingly negligible (MU efficiency: ρ = −0.11; normalisation ratio: ρ = −0.02; both p > 0.05; Figure 4). For the normalisation ratio, no pairwise Wilcoxon comparison reached significance (all p > 0.30). For MU efficiency, three uncorrected pairwise comparisons against AF15 reached the 0.05 threshold (lowest p = 0.006 for AF0 vs. AF15), but none fell below the Bonferroni-corrected threshold (α_corr = 0.005), and none of the comparisons against the clinical AF10 reference reached significance for either metric (Figure 5d,e). Taken together, these results indicate that AF thickness has no clinically relevant effect on either MU efficiency or the normalisation ratio.
Pairwise Wilcoxon signed-rank test p-values for all five complexity metrics are visualised in Figure 5. Across the fifty pairwise comparisons performed (ten per metric), only four reached the uncorrected 0.05 threshold (modulation factor: AF0 vs. AF15, p = 0.040; MU efficiency: AF0 vs. AF15, p = 0.006, AF3 vs. AF15, p = 0.028, AF5 vs. AF15, p = 0.032), and none fell below the Bonferroni-corrected threshold of 0.005, the closest being the AF0–AF15 comparison for MU efficiency. None of the comparisons against the clinically used AF10 reference reached significance for any complexity metric, supporting the conclusion that AF thickness has no clinically meaningful effect on plan complexity over the 0–15 mm range.
3.2. Effect of Automatic Skin Flash Thickness on PSQA
Patient-specific QA was performed for all 100 plans using the IBA Matrixx Resolution (IBA Dosimetry, Schwarzenbruck, Germany) detector array, and GPRs were evaluated under three criteria of increasing strictness: the standard 3%/3 mm criterion (institutional clinical acceptance threshold ≥ 95%), the intermediate 3%/2 mm criterion (≥90%), and the strict 2%/2 mm criterion (used for sensitive assessment of delivery accuracy).
Mean GPR increased with AF thickness up to AF10 and then plateaued at AF15 across all three gamma criteria (Table 4; Figure 3e,f and Figure 6). This behaviour indicates that increasing AF thickness improves delivery robustness up to an optimal threshold (AF10), beyond which further increases do not yield additional dosimetric benefit.
For the 3%/3 mm criterion the mean GPR rose from 92.28 ± 4.83% at AF0 to 93.73 ± 4.55% at AF10, with 93.23 ± 4.70% at AF15. The 3%/2 mm and 2%/2 mm criteria followed the same pattern (3%/2 mm: 88.45 ± 7.72% at AF0, 90.31 ± 6.26% at AF10, 89.27 ± 7.29% at AF15; 2%/2 mm: 78.21 ± 7.69% at AF0, 80.92 ± 8.48% at AF10, 80.48 ± 8.20% at AF15). Per-patient analysis showed that 17 of 20 patients improved their 3%/3 mm GPR between AF0 and AF10, with a mean within-patient gain of +1.46% (median +1.35%, range −1.90% to +3.70%). This consistent improvement across the majority of patients supports the robustness of the observed AF-dependent PSQA trend. The proportion of plans meeting the institutional clinical acceptance criterion of GPR ≥ 95% at 3%/3 mm rose from 5/20 (25%) at AF0 to 9/20 (45%) at AF10 and 8/20 (40%) at AF15. When considering each patient individually, the AF thickness yielding the highest 3%/3 mm GPR was AF10 or AF15 in 14 of 20 patients.
The Friedman test confirmed a significant overall effect of AF thickness on GPR for all three criteria (3%/3 mm: χ² = 16.70, p = 0.002; 3%/2 mm: χ² = 10.93, p = 0.027; 2%/2 mm: χ² = 9.61, p = 0.048; n = 20 paired observations per metric). Post hoc pairwise Wilcoxon signed-rank tests for the 3%/3 mm criterion identified AF0 vs. AF10 (p = 0.0003) and AF3 vs. AF10 (p = 0.005) as significant after Bonferroni correction for ten comparisons (α_corr = 0.005); AF5 vs. AF10 (p = 0.027) and AF0 vs. AF15 (p = 0.036) reached the uncorrected 0.05 threshold but did not survive correction. Pairwise comparisons among AF5, AF10 and AF15 were non-significant after correction for all three criteria, consistent with the plateau of GPR at the higher AF settings (Figure 7). Spearman rank correlations between AF thickness and GPR were small and not statistically significant when AF was treated as a continuous variable, reflecting the plateau behaviour between AF10 and AF15, which is not well captured by rank-based monotonic analysis. The relationship between plan complexity and PSQA across all 100 plans is illustrated in Figure 4. Pearson product-moment correlations between complexity metrics and GPRs were small and consistently negative for total MU (MU vs. 3%/3 mm: r = −0.21, p = 0.035; MU vs. 3%/2 mm: r = −0.21, p = 0.038; MU vs. 2%/2 mm: r = −0.20, p = 0.050) and small and positive for segment number (segments vs. 3%/3 mm: r = +0.20, p = 0.048; segments vs. 3%/2 mm: r = +0.25, p = 0.012). Spearman rank correlations for the same pairs were uniformly small and positive (rs = +0.10 to +0.17), and the apparent discrepancy between Pearson and Spearman correlations was driven by a single patient with consistently high MU and low GPR across all AF settings (Patient 8, MU = 1337–1544, GPR 3%/3 mm = 73.8–80.9%); excluding this patient brought the two correlation estimates into agreement, and overall trends across the cohort remained weak. The three gamma criteria were highly intercorrelated (ρ = 0.85–0.92, p < 0.001 for all pairwise comparisons; Figure 4), confirming the internal consistency of the QA assessment. No systematic delivery failure was observed at any AF thickness, and all plans remained clinically deliverable.
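The way a single outlying patient can pull Pearson and Spearman correlations in opposite directions can be reproduced with a short synthetic example. The data below are invented, unitless values chosen only to show the mechanism (a positive trend in the bulk of the points plus one point with a very high x and a very low y); they are not the study data, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Nine points on a rising line plus one outlier (last entry).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, -20.0]

r_all, rho_all = pearson(x, y), spearman(x, y)
r_trimmed, rho_trimmed = pearson(x[:-1], y[:-1]), spearman(x[:-1], y[:-1])
```

With the outlier included, the Pearson coefficient is negative while the Spearman coefficient stays positive, because ranks cap the leverage of the extreme point. Removing that point brings both estimates into agreement, mirroring the behaviour described for Patient 8.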
Taken together, these results provide evidence-based guidance for selecting the optimal AF margin in 4pVMAT breast and chest wall irradiation. From a clinical perspective, AF thickness should balance robustness and delivery accuracy as reflected by PSQA performance. Under the standard 3%/3 mm gamma criterion, improvements in GPR were observed up to AF10, beyond which no further benefit was evident. AF thicknesses of 5 mm and above achieved comparable compliance with the institutional acceptance threshold (≥95%), whereas smaller margins (AF0–AF3) resulted in significantly lower passing rates and may compromise delivery accuracy. A similar pattern was observed under stricter evaluation criteria (3%/2 mm and 2%/2 mm), which are relevant for more sensitive verification scenarios such as re-planning or accreditation. In these cases, a minimum AF thickness of 5 mm remains necessary, with AF10 consistently yielding the highest mean GPR across all criteria. As AF thickness showed no clinically meaningful impact on plan complexity metrics across the 0–15 mm range, margin selection can be guided primarily by delivery accuracy considerations.
Based on these findings, an AF margin of 10 mm is recommended as the optimal setting for 4pVMAT breast and chest wall plans using the Monaco Monte Carlo algorithm. A margin of 5 mm may be considered an acceptable lower bound, while margins below 5 mm should be avoided. Increasing the AF margin beyond 10 mm does not provide additional benefit in either delivery accuracy or plan complexity.
3.3. Consistency of the Autoflash Effect Across Clinical Subgroups
To address the heterogeneity of the cohort, which included breast-only (n = 9), breast plus regional nodal irradiation (RNI, n = 8) and chest wall plus RNI (n = 3) cases, we assessed whether the autoflash effect on PSQA differed across these clinical scenarios using a linear mixed-effects model on the primary endpoint (GPR 3%/3 mm). This analysis complemented the cohort-level Friedman and Wilcoxon tests reported in Section 3.2, which remain the primary statistical tests of the AF effect.
The model was fitted with patient as a random intercept and autoflash thickness, subgroup and their interaction as fixed effects. The intraclass correlation coefficient (ICC = 0.84) confirmed that between-patient variability dominated the data and justified the random-intercept specification. The direction and magnitude of the cohort-level AF effect were preserved in the mixed model. Although the omnibus AF main effect did not itself reach significance (χ² = 7.65, df = 4, p = 0.105), being diluted by the AF15 plateau and the small AF3 and AF5 increments, the targeted AF10 vs. AF0 contrast remained significant: AF10 plans showed a +1.46% improvement in GPR over AF0 (95% CI +0.36 to +2.55, p = 0.009; Table 5), and AF15 plans a smaller non-significant gain (+0.96%, p = 0.088), consistent with the plateau described in Section 3.2.
The AF × subgroup interaction was non-significant (χ² = 10.90, df = 8, p = 0.207), indicating that the AF effect did not differ statistically between subgroups; the subgroup main effect was likewise non-significant (p = 0.127). Per-subgroup AF10 − AF0 improvements were directionally consistent (Figure 8): breast-only +1.99% (9/9 patients improved), breast + RNI +1.21% (6/8 improved), and CW + RNI +0.50% (2/3 improved), with the smaller CW + RNI effect attributable to a ceiling at baseline (mean GPR at AF0 already 95.9% ± 2.0%).
These findings are consistent with AF10 being the optimal flash margin across the clinical scenarios studied. Formal within-subgroup inference, particularly for CW + RNI (n = 3), remains underpowered; the mixed-effects analysis is therefore reported as the statistically appropriate test of subgroup consistency rather than as confirmatory evidence within each subgroup.
4. Discussion
The central question of this study was whether AF thickness, used purely as an optimisation construct in 4pVMAT breast and chest-wall planning with the Monaco Monte Carlo algorithm, has a measurable effect on plan complexity and on patient-specific delivery accuracy. Our results provide a clear, quantitative answer. Across AF thicknesses from 0 to 15 mm, plan complexity shows no statistically significant change: Friedman tests on modulation factor (MF), total MU, segment number, MU efficiency, and the post-optimisation normalisation ratio were all non-significant, and none of the pairwise Wilcoxon comparisons against the clinical AF10 reference reached significance for any complexity metric (Figure 5). In contrast, GPR improves up to AF10 and plateaus thereafter (Figure 3e,f and Figure 6). To our knowledge, this is the first measurement-based confirmation of this behaviour in the Monaco/Versa HD/Agility MLC combination, and it has direct implications for how the AF margin should be selected in routine 4pVMAT breast and chest-wall planning.
VMAT for breast and chest-wall irradiation is most commonly delivered using one of two beam-design philosophies: a continuous partial arc covering the target from medial to posterior entry, or paired short arcs that mimic conventional tangential fields and form a “butterfly” geometry with avoidance sectors. The 4pVMAT geometry used here (
Figure 1) was originally described by Poeta et al. [
17] as a four-partial-arc split-VMAT technique with two medial arcs (300°→20°) and two lateral arcs (90°→180°) of 50° length each and a minimum 15° overlap between arcs. It sits between these two approaches, combining the conformality of partial-arc planning with the contralateral sparing of tangential-style geometry. Although Poeta et al. originally introduced 4pVMAT to manage breath-hold time during deep inspiration breath-hold (DIBH) treatment without automatic beam-interruption devices, at our institution the geometry has been adopted independently of DIBH on the basis of plan-quality and delivery-efficiency considerations. The choice of a partial-arc rather than full-arc geometry is consistent with the broader dosimetric concern, raised by Zhang et al. [
7] in their h-IMRT versus VMAT comparison for hypofractionated whole-breast irradiation, that wide full-arc geometries can spread substantial low-dose volumes into contralateral lung and breast that conventional indicators such as V20Gy or mean heart dose may not capture. Importantly, the four-partial-arc geometry constrains the angular range over which AF can affect aperture shape: each arc enters through the breast/chest-wall sector, so AF can only modify the apertures that face the skin surface. This is the geometric reason why, in our data, the optimiser absorbs the AF expansion through aperture-shape changes localised to the build-up region rather than through any global increase in modulation.
Within this beam-design context, the AF-dependent improvement in GPR was not accompanied by any deterioration in plan complexity. Mean total MU varied only modestly across the AF range, from 1087 ± 185 at AF0 to 1039 ± 168 at AF15, a 48 MU difference in cohort means that is well within the inter-patient standard deviation. Mean segment number was even more stable (122.8 ± 38.5 at AF5 to 125.3 ± 38.3 at AF15, a maximal inter-group difference of 2.5 segments against inter-patient standard deviations of 36–39). The post-optimisation normalisation ratio was nearly constant across the AF range (1.029–1.032), confirming that the optimiser achieved the PTV V95% ≥ 95% coverage objective directly at every AF thickness without substantial post hoc rescaling. MU efficiency showed only a small monotonic drift from 0.86 ± 0.12 at AF0 to 0.82 ± 0.12 at AF15, with only the AF0–AF15 comparison surviving Bonferroni correction. Increasing the AF margin from 0 to 10 mm therefore improves delivery accuracy without any clinically meaningful cost in modulation, MU, segment count, MU efficiency, or normalisation behaviour. This finding counters the historical concern that larger flash margins might inflate plan complexity and supports AF10 as the optimal margin on both delivery-accuracy and plan-quality grounds. Hubley et al. [
15] recommended a 7–10 mm flash region as a compromise between robustness and plan quality, but their assessment was based on plan-quality metrics alone, without patient-specific delivery verification. Our findings extend that work in two ways: we confirm directly with PSQA measurements that AF10 yields the highest GPRs across all three evaluation criteria—93.73 ± 4.55% at 3%/3 mm, 90.32 ± 6.26% at 3%/2 mm, and 80.92 ± 8.48% at 2%/2 mm—and we show that this improvement is achieved without any measurable impact on conventional complexity metrics.
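For readers less familiar with how the 3%/3 mm, 3%/2 mm, and 2%/2 mm criteria translate into a passing rate, the following is a minimal one-dimensional sketch of a global gamma analysis. It is not the algorithm of the PSQA software used in this study: the function name, the brute-force search, the 10% low-dose threshold default, and the synthetic Gaussian profile are all assumptions made for illustration.

```python
import numpy as np

def gamma_pass_rate_1d(positions_mm, measured, calculated,
                       dose_pct, dta_mm, threshold_pct=10.0):
    """Global 1D gamma passing rate (%) via brute-force search over calculated points."""
    positions_mm = np.asarray(positions_mm, dtype=float)
    measured = np.asarray(measured, dtype=float)
    calculated = np.asarray(calculated, dtype=float)

    max_dose = calculated.max()
    dose_tol = dose_pct / 100.0 * max_dose                     # global dose criterion
    evaluated = measured >= threshold_pct / 100.0 * max_dose   # low-dose cut-off
    if not evaluated.any():
        raise ValueError("no measured points above the low-dose threshold")

    # Rows index measured points, columns index calculated points.
    dist = (positions_mm[:, None] - positions_mm[None, :]) / dta_mm
    diff = (measured[:, None] - calculated[None, :]) / dose_tol
    gamma = np.sqrt(dist**2 + diff**2).min(axis=1)

    return 100.0 * float(np.mean(gamma[evaluated] <= 1.0))

# Synthetic profile, for illustration only (not study data)
x = np.arange(0.0, 100.0, 1.0)
calc = 100.0 * np.exp(-((x - 50.0) / 20.0) ** 2)
print(gamma_pass_rate_1d(x, calc, calc, dose_pct=3, dta_mm=3))         # identical profiles
print(gamma_pass_rate_1d(x, 1.10 * calc, calc, dose_pct=3, dta_mm=3))  # 10% overdose
```

Tightening either the dose-difference or the distance-to-agreement term shrinks the acceptance ellipse around each measured point, which is why passing rates fall from 3%/3 mm to 2%/2 mm for the same plan.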
A plausible mechanism for the observed AF-dependent improvement in GPR without complexity inflation lies in how inverse optimisers handle the build-up region in superficial targets. As Arbor et al. [
27] demonstrated using a GATE/Geant4 Monte Carlo toolkit for VMAT breast surface dosimetry, target volumes close to the patient surface can drive the optimiser to generate very high fluences in the build-up region in order to satisfy coverage constraints—precisely the region where treatment-planning-system algorithms have known limitations and where measurement–calculation agreement is most uncertain. Extending the optimisation envelope into a virtual flash region relieves the optimiser of this constraint: the fluence used to cover the superficial part of the target is shaped beyond the skin, and the steep fluence gradients that would otherwise sit on the build-up region are pushed outside the patient. The aperture sequence required to achieve this is not appreciably more modulated, which is exactly what we observed—MF, total MU, segment count, MU efficiency, and the normalisation ratio are all statistically indistinguishable across AF0–AF15. AF therefore re-shapes apertures locally at the breast surface without forcing the optimiser to introduce additional modulation elsewhere in the arcs. The same mechanism explains the GPR plateau beyond AF10: once the AF margin exceeds the geometric extent over which the optimiser was attempting to push fluence into the build-up region, further extension of the virtual envelope no longer changes the aperture-shape decisions that drive PSQA performance. The optimiser is no longer fluence-limited at the surface, and additional virtual margin only modifies fluence in regions that are anyway removed before final dose calculation. This interpretation is quantitatively consistent with Lizondo et al. [
11], who showed that beyond a virtual bolus thickness equal to the CTV–PTV margin plus 5 mm, additional thickness produces diminishing returns in robustness and increasing penalties in normalisation behaviour. For the effective skin-direction margin used here (3 mm CTV–PTV expansion reduced 2 mm from skin), this corresponds to a virtual flash of approximately 10 mm, in close agreement with our PSQA-determined optimum. The mechanism is further reinforced by Oh et al. [
28], who analysed 285 breast VMAT plans on the same combination of treatment platform (Monaco TPS, Versa HD linac, Agility 5 mm MLC) used in our study and found that most conventional complexity metrics correlate only weakly with GPR; only aperture-shape descriptors such as leaf sequence variability, plan-averaged beam area, and edge-area metrics emerged as moderate predictors. In that context, our finding that AF improves PSQA without modifying conventional complexity metrics is consistent with the broader behaviour of this TPS/MLC combination: PSQA performance is determined more by aperture geometry than by aggregate modulation. The methodological reliability of our PSQA setup is also supported by Yasser et al. [
29], who reported GPR above 95% at 3%/3 mm for mono-isocentric breast VMAT plans verified with the same IBA Matrixx Resolution detector, the same Monaco TPS, and the same Agility MLC used in the present study.
Our PSQA-based identification of AF10 as the optimum converges with multiple independent dosimetric studies in different planning environments. Nicolini et al. [
12] originally formalised the pseudo-skin-flash strategy by extending the optimisation envelope into a 10 mm soft-tissue-equivalent virtual region around the breast and removing it before final dose calculation; Lizondo et al. [
11] refined the approach by varying both bolus thickness and Hounsfield Unit (HU) assignment, recommending a CTV–PTV margin plus 5 mm thickness with HU values around −500 to −400 for robustness against shifts of up to 10 mm in the breathing direction. Tyran et al. [
30] confirmed dosimetric safety by recalculating plans on second CT scans acquired during treatment, showing that V95% of the breast CTV remained 98.9% with virtual bolus versus 92.6% without, and that the virtual bolus did not degrade the initial plan. Ugurlu et al. [
31] further validated the approach with thermoluminescent dosimetry on an anthropomorphic phantom, observing that 0.5 cm virtual boluses preserved plan quality while progressively thicker boluses elevated PTV maximum dose without compensating dosimetric benefit. Hubley et al. [
15] reached a strikingly similar conclusion using a different planning architecture (static-angle ports within a VMAT arc, RAD platform) and a different HU value (−350), recommending 7–10 mm of automatic skin flash. Our PSQA-based finding that AF10 maximises GPR while leaving complexity unchanged provides independent measurement-based confirmation in a third planning environment—Monaco with 4pVMAT geometry—and connects the optimisation-side observations of these prior studies to the delivery-side reality of patient-specific QA. An important methodological clarification arises from He et al. [
32], who reported that VMAT with 10 mm physical bolus is less robust to 3 mm setup errors than VMAT with 5 mm physical bolus and recommended a 5 mm bolus for postmastectomy VMAT. That conclusion is specific to physical bolus, which remains in the beam path during dose calculation and during delivery. AF is a virtual structure used only during optimisation and removed before final dose calculation; the physical attenuation and obliquity effects that drive the He et al. result therefore do not apply to AF, which can be increased to 10 mm without the robustness penalty associated with a thicker physical bolus. Our results also complement broader breast-VMAT robustness analyses showing that no-flash VMAT plans lose target coverage even under modest CBCT-based anatomical changes [
33], and that robustly optimised VMAT outperforms manual-flash VMAT both in robustness against simulated organ motion and in planner-time efficiency [
34]. The AF mechanism evaluated here can be regarded as an automated counterpart to these strategies, providing the geometric robustness of a flash margin without the planner-dependent variability of manual MLC editing—and our data show that this automation is achieved without inflating plan complexity. Our work also aligns with the Dutch national consensus on breast radiotherapy plan evaluation reported by Hurkmans et al. [
35], which defines target evaluation criteria of D98% ≥ 95%, D2% ≤ 107%, and Dmean within 99–101% of the prescription, all of which were maintained across AF0–AF15 in the present study. The Dutch consensus handles the build-up region by clipping the CTV and PTV 5 mm below the skin and reporting on the clipped volume; the AF approach used here addresses the same problem from the opposite direction, by extending the optimisation envelope outwards rather than restricting evaluation inwards. Both strategies converge on the same operational principle: the build-up region should not be allowed to drive the optimiser.
The combination of unchanged plan complexity and improved GPR up to AF10 supports adopting 10 mm as the standard AF margin for routine 4pVMAT breast and chest-wall planning in Monaco. AF ≥ 5 mm is the practical lower bound consistent with acceptable PSQA performance; margins below 5 mm, including AF0 and AF3, show measurably degraded GPRs, particularly under the more stringent 3%/2 mm and 2%/2 mm criteria. Between AF0 and AF10, 17 of 20 patients improved their 3%/3 mm GPR, and the proportion of plans meeting the institutional acceptance criterion of GPR ≥ 95% at 3%/3 mm rose from 25% to 45%. Beyond AF10, additional flash thickness up to 15 mm did not yield further improvement in PSQA and produced a borderline downward drift in MU efficiency that did not survive correction for multiple comparisons; AF15 should therefore be regarded as the upper bound rather than as a routine setting. These recommendations are relevant to both the 50 Gy/25 fraction and 40 Gy/15 fraction schedules included in our cohort, the latter of which is an established international standard for early-stage breast cancer and was the control schedule in the FAST-Forward trial [
36]. Plan-complexity stability is particularly desirable for hypofractionated regimens in which each fraction delivers a larger biologically effective dose, because any per-fraction PSQA failure carries proportionally greater dosimetric consequence. It should be emphasised that the gamma passing rate quantifies delivery accuracy, namely the agreement between calculated and delivered dose, and is not in itself a measure of geometric or dosimetric robustness against setup error and motion, nor of clinical outcome; the AF-thickness optimisation reported here should therefore be interpreted within the delivery-accuracy domain. The absolute GPR values observed here (mean 92–94% at 3%/3 mm) are characteristic of breast and chest-wall VMAT rather than indicative of delivery problems. Patient-specific QA for these sites is intrinsically demanding: the target abuts the skin and build-up region, the anatomy is large, curved and adjacent to low-density lung, and a 2D detector array has limited angular and spatial sampling for tangential, obliquely incident arcs. Interpreted against the AAPM TG-218 [
37] framework—a universal tolerance limit of 95% (prompting investigation) and an action limit of 90% (below which a plan is clinically unacceptable)—18 of 20 AF10 plans (90%) exceeded the 90% action limit, with only two below it, and no plan showed systematic delivery failure. The clinical relevance of these results therefore lies not in the absolute passing rate, which is detector- and site-dependent, but in the internally controlled, relative effect of AF thickness: increasing AF to 10 mm raised the mean GPR and shifted the distribution upward (plans meeting ≥95% rising from 25% to 45%, and plans meeting the ≥90% action limit from 75% to 90%), moving more plans toward and above the acceptance threshold without any complexity penalty.
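The banding described above can be made explicit with a short sketch. The thresholds are the 95% tolerance and 90% action limits quoted in the text; the function names are hypothetical, and the band counts are back-calculated from the reported percentages for 20 plans rather than taken from per-plan data.

```python
def classify_gpr(gpr_pct, tolerance=95.0, action=90.0):
    """Place a single gamma passing rate into one of three bands."""
    if gpr_pct >= tolerance:
        return "within tolerance"
    if gpr_pct >= action:
        return "between limits"
    return "below action limit"

def band_counts(n_plans, frac_at_or_above_tolerance, frac_at_or_above_action):
    """Convert reported cohort fractions into plan counts per band."""
    n_tol = round(n_plans * frac_at_or_above_tolerance)
    n_act = round(n_plans * frac_at_or_above_action)
    return {
        "within tolerance": n_tol,
        "between limits": n_act - n_tol,
        "below action limit": n_plans - n_act,
    }

# Fractions as reported for the 20-plan cohort at 3%/3 mm
print("AF0 ", band_counts(20, 0.25, 0.75))
print("AF10", band_counts(20, 0.45, 0.90))
```

Read this way, moving from AF0 to AF10 shifts plans out of the lowest band and into the highest one, while the middle band stays almost the same size.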
This study has several limitations. The results are specific to one TPS algorithm (Monaco v5.11 X-ray Voxel Monte Carlo) and one MLC model (Elekta Versa HD with 5 mm Agility leaves), and Oh et al. [
28] showed that complexity–GPR relationships are sensitive to the specific TPS/MLC combination. Multicentre validation across different dose-calculation algorithms (e.g., collapsed-cone convolution, Acuros XB, analytical anisotropic algorithm) and different AF implementations on various TPS platforms would strengthen the generalisability of the AF10 recommendation. Because the optimal AF thickness depends on the interplay between the dose-calculation algorithm and the MLC delivery characteristics, the specific value identified here (AF10) should be re-established on other platforms, even though the qualitative pattern of improved delivery accuracy up to a plateau without increased plan complexity is expected to be platform-transferable. We did not perform in vivo skin-dose measurements; the present analysis quantifies the impact of AF on plan complexity and PSQA but does not directly measure skin dose at the patient surface, which remains challenging to compute accurately because conventional TPS algorithms have known limitations in the build-up region [
27]. The present study likewise did not directly evaluate plan robustness against setup error or respiratory motion, which we identify as future work. We also excluded IMN coverage from the cohort; AF margin behaviour for IMN-inclusive plans, where PTV_IMN often includes lung and where the Dutch consensus permits a relaxed D98% ≥ 90% criterion [
35], remains an open question. PSQA was performed with a 2D detector array, which cannot fully capture three-dimensional dose discrepancies. Future work should extend this analysis across multiple TPS/algorithm combinations with different AF implementations, include plans with IMN coverage, incorporate in vivo skin-dose verification, and explicitly evaluate plan robustness against simulated setup errors using the methodology described by He et al. [
32] and Chan et al. [
34]. Finally, the subgroup sample sizes—particularly chest wall plus RNI (
n = 3)—were too small to achieve the power needed for formal within-subgroup inference, so the
Section 3.3 mixed-effects analysis was designed to test the consistency of the autoflash effect across subgroups via the AF × subgroup interaction term rather than through three separate underpowered tests. The non-significant interaction therefore indicates an absence of evidence for subgroup heterogeneity rather than proof of a uniform effect, and confirmation will require larger, multi-institutional cohorts.