Impact of Alternate b-Value Combinations and Metrics on the Predictive Performance and Repeatability of Diffusion-Weighted MRI in Breast Cancer Treatment: Results from the ECOG-ACRIN A6698 Trial

In diffusion-weighted MRI (DW-MRI), choice of b-value influences apparent diffusion coefficient (ADC) values by probing different aspects of the tissue microenvironment. As a secondary analysis of the multicenter ECOG-ACRIN A6698 trial, the purpose of this study was to investigate the impact of alternate b-value combinations on the performance and repeatability of tumor ADC as a predictive marker of breast cancer treatment response. The final analysis included 210 women who underwent standardized 4-b-value DW-MRI (b = 0/100/600/800 s/mm2) at multiple timepoints during neoadjuvant chemotherapy treatment and a subset (n = 71) who underwent test–retest scans. Centralized tumor ADC and perfusion fraction (fp) measures were performed using variable b-value combinations. Prediction of pathologic complete response (pCR) based on the mid-treatment/12-week percent change in each metric was estimated by area under the receiver operating characteristic curve (AUC). Repeatability was estimated by within-subject coefficient of variation (wCV). Results show that two-b-value ADC calculations provided non-inferior predictive value to four-b-value ADC calculations overall (AUCs = 0.60–0.61 versus AUC = 0.60) and for HR+/HER2− cancers where ADC was most predictive (AUCs = 0.75–0.78 versus AUC = 0.76), p < 0.05. Using two b-values (0/600 or 0/800 s/mm2) did not reduce ADC repeatability over the four-b-value calculation (wCVs = 4.9–5.2% versus 5.4%). The alternate metrics ADCfast (b ≤ 100 s/mm2), ADCslow (b ≥ 100 s/mm2), and fp did not improve predictive performance (AUCs = 0.54–0.60, p = 0.08–0.81), and ADCfast and fp demonstrated the lowest repeatability (wCVs = 6.71% and 12.4%, respectively). In conclusion, breast tumor ADC calculated using a simple two-b-value approach can provide comparable predictive value and repeatability to full four-b-value measurements as a marker of treatment response.


Introduction
As oncologic approaches move increasingly towards personalization of therapies, improved methods for early assessment of breast cancer response to neoadjuvant chemotherapy (NAC) are needed to enable timely modification of therapeutic regimens. Clinical breast examination and routine breast imaging with mammography and ultrasound remain the standard-of-care methods for monitoring patients undergoing NAC; however, because they primarily reflect changes in gross tumor size or morphology, their sensitivity to detect early cytotoxic effects is limited. In contrast, functional imaging technologies can allow for a more specific evaluation of vascular, metabolic, biochemical, and molecular changes in breast tumors in response to treatment. These include magnetic resonance imaging (MRI), fluorodeoxyglucose (FDG) positron emission tomography (PET), molecular breast imaging, and ultrasound and optical imaging techniques as recently reviewed by Rauch et al. Quantification of alterations in water diffusion through diffusion-weighted MRI (DW-MRI) holds strong potential to detect early treatment-induced changes in tumor microstructure, cellularity, and cell membrane integrity [1]. Indeed, numerous breast DW-MRI studies have demonstrated the apparent diffusion coefficient (ADC) metric to be useful in discriminating responders and non-responders in breast cancer treatment [2][3][4][5][6][7][8], and its utility for predicting pathological complete response (pCR) to NAC was recently summarized in a meta-analysis of 15 studies with 1181 total breast cancer patients [9].
Approaches for breast DW-MRI vary widely across prior studies, both in terms of acquisition and interpretation, which has been emphasized as a limitation to translation of ADC as a clinical biomarker. Consensus recommendations from the European Society of Breast Radiology were recently published to facilitate standardization and to provide best practices for achieving an adequate signal-to-noise ratio while minimizing artifacts and distortions [10]. These recommendations are largely based on expert opinions; more data are needed to refine optimal methods for implementation of ADC as a quantitative imaging biomarker in breast cancer clinical trials. The choice of b-values (number and range) is known to influence calculated breast lesion ADC values and affect scan times, but the impact on the reliability and performance of ADC as an imaging marker is not well understood. The Quantitative Imaging Biomarkers Alliance (QIBA) of the Radiological Society of North America put forth a DWI Profile with acquisition and analysis specifications to support use of ADC as a robust quantitative biomarker [11], with a quantitative claim that a measured change in the apparent diffusion coefficient (∆ADC) of 13% or larger indicates, with 95% confidence, that a true change has occurred. The Profile's imaging protocols derive from two test-retest studies [12,13] and indicate the number of different b-values needed to achieve the claim: ideally four, with a target of three b-values. Test-retest data indicating similar repeatability with the use of fewer b-values would allow for shorter acquisitions.
The ECOG-ACRIN Cancer Research Group A6698 multicenter trial, performed as a substudy to the ongoing Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis 2 (I-SPY 2) trial, was designed to validate the performance of breast tumor ADC measures for predicting pathologic response to NAC using a generalizable standardized approach and a four-b-value acquisition. Results from the trial's primary analysis confirmed mid-treatment percent change in tumor ADC (after 12 weeks of chemotherapy) to be significantly predictive of pCR (overall AUC = 0.60, 95% CI 0.52-0.68), with increased accuracy obtained by accounting for breast cancer subtype (AUC = 0.72, 95% CI 0.61-0.83) [14]. A second aim of the trial assessed variability between repeated scans using a test-retest 'coffee break' design, which showed that excellent repeatability and reproducibility of breast tumor ADC measures could be achieved in a multi-institution setting using a standardized protocol and QA procedure [12]. As a secondary analysis of the same study cohort, the purpose of this study was to investigate the impact of alternate b-value combinations for calculating ADC on both predictive performance and measurement repeatability.

Study Participants
In this prospective, Health Insurance Portability and Accountability Act-compliant, multi-institution study, consecutive subjects enrolled in I-SPY 2 at sites meeting DW-MRI qualification requirements were also co-enrolled in the ACRIN 6698 imaging trial (ClinicalTrials.gov: NCT01564368 [15,16]). Eligibility for I-SPY 2 included women ≥18 years of age with invasive breast tumors ≥2.5 cm in size by clinical exam or imaging, and intent to undergo neoadjuvant chemotherapy. Subjects with evidence of distant metastasis were excluded, and those determined to have low-risk disease (i.e., hormone receptor (HR)+/HER2-negative/MammaPrint low) did not proceed to the treatment arm of I-SPY 2. Both the I-SPY 2 and ACRIN 6698 protocols were approved by institutional review boards at all participating sites (listed in the Supplemental Material), and all subjects gave written informed consent using a single combined consent form. ACRIN 6698 was powered to achieve 160 evaluable subjects to adequately test whether changes in tumor ADC during treatment were predictive of pCR; thus, target enrollment was set for 404 participants to account for dropout and expected I-SPY 2 screen fails (more detail provided in the Supplemental Material and [16]). A Consolidated Standards of Reporting Trials flow diagram describing the inclusion and exclusion of trial subjects is provided in Figure 1.

Figure 1. Consolidated Standards of Reporting Trials diagram summarizing ACRIN 6698 study patient enrollments. (Timepoints: T0, pre-treatment; T1, early treatment; T2, mid-treatment; and T3, post-treatment.) Subjects evaluated in this secondary analysis are indicated by bold text/red boxes.
Per the I-SPY 2 protocol, MRI examinations with DW-MRI were performed at pre-treatment, early treatment (after 3 weekly doses of paclitaxel/taxane-based therapy), mid-treatment (12 weeks, between taxane and anthracycline regimens), and post-treatment (after all chemotherapy) timepoints, prior to surgery (Figure 2). For the ACRIN 6698 Trial, an advanced 4-b-value DW-MRI sequence was added at each of the MRI timepoints, acquired prior to contrast injection. Test/retest repeatability scans were performed on a subset of patients at pre- or early treatment MRI exams. Individual sites were limited to 14 test/retest patients to better balance the accrual across different MRI scanner manufacturers and field strengths. The study sample was previously published in the primary analysis evaluating the predictive value of changes in tumor ADC [14], and a subset of 71 subjects was described in two publications evaluating the repeatability of tumor ADC [12] and histogram measures [17]. This secondary analysis assesses the relative performance of various alternative ADC metrics. Data generated or analyzed during the study are available through ECOG-ACRIN, and all images and associated study metadata will be available on The Cancer Imaging Archive (TCIA) website [18], planned for public release in Spring 2022.

MRI Acquisition
The A6698 imaging protocol and the DW-MRI quality assurance process have been previously reported [12,14]. Briefly, all MRI examinations included standardized T2-weighted, DW-MRI, and dynamic contrast-enhanced (DCE)-MRI sequences (acquisition parameters are given in Supplemental Table S1). DW-MRI was acquired prior to DCE-MRI in the axial orientation with diffusion gradients in three orthogonal directions using multiple b-values (0, 100, 600, and 800 s/mm2), with a single-shot, diffusion-weighted, spin-echo echo-planar imaging sequence with parallel imaging (reduction factor ≥ 2) and fat suppression. Required scan parameters were TR ≥ 4000 ms, TE minimum (50-100 ms), flip angle 90°, field of view 300-360 mm, acquired matrix 128 × 128 to 192 × 192, and scan time ≤ 5 min. The acquired resolution was 1.7-2.8 mm in-plane with a 4-5 mm slice thickness. No respiratory triggering or other motion compensation methods were used. Test and retest DW-MRI scans for a given patient were performed in the same imaging examination, at either the pre-treatment (preferred) or early treatment timepoint. The patient was positioned normally (prone) and scanned with initial localization, T2W, and DW-MRI acquisitions. The patient was then removed from the scanner and taken off the scanner bed, then repositioned as before. The full ACRIN 6698 protocol was then performed, consisting of localization, T2W, DW-MRI, and DCE acquisitions.
Prior to study participation, all sites were required to pass quality control testing consisting of DW-MRI phantom scanning and submission of two in vivo DW-MRI cases acquired using the multi-b-value protocol (previously described in the Appendix of [14]). In vivo images were reviewed for absence of substantial artifacts, homogenous fat suppression, and adequate signal-to-noise ratio.

ADC Measurements
Centralized image analysis was performed by trained researchers at the University of California, San Francisco, blinded to pathologic outcomes (final review performed by J.E.G. with over 10 years of quantitative breast MR analysis experience), using custom software tools developed with IDL (Exelis Visual Information Solutions, Boulder, Colorado) as previously described [14]. Evaluability on DW-MRI was first determined based on an acceptable signal-to-noise ratio, an acceptable degree of fat suppression, an absence of detrimental artifacts and distortions, and partial volume averaging. ADC maps were calculated using the classic monoexponential decay model [19] with linear least squares fitting of the log of the signal vs. b-value using all b-values (0/100/600/800 s/mm2), as in the primary analysis [14], and additionally for a variety of alternate b-value combinations: ADCfast (using b = 0/100 s/mm2), emphasizing microcirculation/perfusion; ADCslow (using b = 100/600/800 s/mm2), minimizing perfusion influence [20]; and two-b-value estimates (0/600, 0/800, 100/600, and 100/800 s/mm2) to reduce scan times and increase efficiency.
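As an illustration of the fitting step described above, the following minimal sketch fits the monoexponential model S(b) = S0·exp(−b·ADC) by linear least squares on ln S versus b. It is not the trial's IDL software; the function name `fit_adc` and the noise-free synthetic voxel (S0 = 1000, ADC = 1.2 × 10⁻³ mm2/s) are assumptions made only for this example.

```python
import math

def fit_adc(b_values, signals):
    """Least-squares line through (b, ln S); returns (ADC, fitted S0).

    ADC is the negative slope, S0 is exp(intercept). Hypothetical helper,
    not the trial's analysis code.
    """
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_l = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_l) for b, l in zip(b_values, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_b
    return -slope, math.exp(intercept)

# Synthetic, noise-free voxel (assumed values, for illustration only)
b_all = [0, 100, 600, 800]            # s/mm2
true_adc = 1.2e-3                     # mm2/s
sig = [1000.0 * math.exp(-b * true_adc) for b in b_all]

adc_4b, _ = fit_adc(b_all, sig)                        # all four b-values
adc_0_800, _ = fit_adc([0, 800], [sig[0], sig[3]])     # two-b-value estimate
adc_fast, _ = fit_adc([0, 100], sig[:2])               # ADCfast, b = 0/100
adc_slow, s0_slow = fit_adc(b_all[1:], sig[1:])        # ADCslow, b = 100/600/800
```

For a purely monoexponential, noise-free signal every combination returns the same ADC; in real tissue, perfusion effects at low b-values and noise make the estimates diverge, which is the difference the alternate combinations are meant to probe.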
Perfusion fraction, fp, defined as the fraction of the total signal at b = 0 s/mm2 not accounted for by ADCslow, was also estimated from the 4-b-value data as:
$$f_p = \frac{S(0) - S_{0\mathrm{slow}}}{S(0)}$$
where S(0) is the measured signal at b = 0 s/mm2 and S0slow is the b = 0 intercept of the mono-exponential fit for ADCslow [21]. This is a simplified analysis that follows the original approach of Le Bihan et al. [22] and assumes that perfusion effects are negligible at b = 100 s/mm2. The acquisition used for this study did not sample enough b-values for use with advanced fitting strategies to robustly separate diffusion from perfusion effects [23,24]. Although beyond the primary scope of the A6698 trial, formal IVIM modeling can give unique insights by accounting for the microvascular contribution to the DWI signal. By more densely sampling at very low b-values to accurately measure the signal decay related to microcirculation followed by a biexponential fit of the data, IVIM analysis enables separate characterization of the vascular and tissue components of the diffusion signal, including the perfusion fraction (f), the pseudo-diffusion rate, reflecting capillary blood flow (D*), and true tissue diffusion (Dt) [22].
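The simplified perfusion-fraction estimate can be sketched as follows, reading the definition above as fp = (S(0) − S0slow)/S(0). The biexponential test signal and its parameters (f = 0.10, D* = 0.05 mm2/s, Dt = 1.0 × 10⁻³ mm2/s) are hypothetical values chosen for illustration, and the helper names are not from the trial software.

```python
import math

def fit_log_linear(b_values, signals):
    """Least-squares line through (b, ln S); returns (ADC, b = 0 intercept)."""
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_l = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_l) for b, l in zip(b_values, logs))
    slope = sxy / sxx
    return -slope, math.exp(mean_l - slope * mean_b)

def perfusion_fraction(s_b0, s0_slow):
    """Fraction of the measured b = 0 signal not explained by the slow fit."""
    return (s_b0 - s0_slow) / s_b0

# Hypothetical biexponential (IVIM-like) voxel, assumed parameters
S0, F_TRUE, D_STAR, D_T = 1000.0, 0.10, 0.05, 1.0e-3

def signal(b):
    return S0 * (F_TRUE * math.exp(-b * D_STAR)
                 + (1.0 - F_TRUE) * math.exp(-b * D_T))

b_slow = [100, 600, 800]  # s/mm2, b-values used for ADCslow
adc_slow, s0_slow = fit_log_linear(b_slow, [signal(b) for b in b_slow])
fp = perfusion_fraction(signal(0), s0_slow)
```

In this synthetic case the estimate lands close to, but slightly below, the simulated fraction because a small perfusion signal remains at b = 100 s/mm2, which is the approximation the simplified analysis accepts.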
Tumor was identified on post-contrast DCE subtraction images and then localized on DW-MRI. Multi-slice, whole-tumor regions-of-interest (ROIs) were manually defined by selecting regions with hyperintensity on high b-value DW-MRI (b = 600 or 800 s/mm2) and a relatively low ADC while avoiding adjacent adipose and fibroglandular tissue, biopsy clip artifacts, and regions of high T2 signal (e.g., seroma and necrosis). For large and multicentric/multifocal disease, all disease regions were included in the ROI and several distinct contours could be drawn on multiple slices to cover the entire tumor region as depicted in the DCE images, without including intervening stroma. All voxels from separate contours were then combined into a single composite ROI to represent the entire tumor and the mean ADC was calculated. Tumor ROIs were redefined for each treatment timepoint, referencing the lesion location on prior exams. In tumors without residual enhancement on DCE after treatment, ROIs were defined in the same tissue region as the prior examination. ROIs were the same ones used in the primary analysis (not redrawn for this study). ROIs were then propagated to the various ADC and fp maps. An example of serial ADC quantitation in a study patient is shown in Figure 3.
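The composite-ROI averaging described above can be illustrated with a short sketch. Pooling all voxels before averaging weights each contour by its voxel count, which differs from averaging per-contour means. The voxel values and the function name `composite_roi_mean` are hypothetical.

```python
def composite_roi_mean(contours):
    """Pool voxels from all contours into one ROI, then take the mean."""
    voxels = [v for contour in contours for v in contour]
    return sum(voxels) / len(voxels)

# Hypothetical ADC values (x10^-3 mm2/s) from two contours of one tumor
contour_a = [1.0, 1.2]
contour_b = [0.8, 0.9, 1.1, 1.0]

pooled_mean = composite_roi_mean([contour_a, contour_b])

# For contrast: averaging per-contour means ignores contour size
mean_of_means = (sum(contour_a) / len(contour_a)
                 + sum(contour_b) / len(contour_b)) / 2
```

Here the pooled mean is about 1.0 while the mean of contour means is about 1.025, so the two conventions are not interchangeable when contours differ in size.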

Reference Standard for Pathologic Response
Histopathologic analysis was performed at study sites by institutional pathologists (blinded to MRI measures) according to the I-SPY 2 TRIAL protocol using the Residual Cancer Burden system [25,26]. Following U.S. Food and Drug Administration rationale and guidelines [27], pathologic complete response (pCR) was the reference standard for determining response to neoadjuvant chemotherapy in our study, defined and reported as no residual invasive disease in either breast or axillary lymph nodes after neoadjuvant therapy (ypT0/is, ypN0). Subjects were categorized as pCR or non-pCR based on postsurgical histopathology.

Statistical Analysis
Pearson's correlation coefficients (r) were used to estimate correlations between pre-treatment metrics measured using different b-value combinations. Mid-treatment/12-week percent change from pre-treatment values was calculated for each metric, and performance for predicting pCR was evaluated by receiver operating characteristic (ROC) curves and associated areas under the curve (AUC). DeLong's non-inferiority test, using a pre-specified non-inferiority margin of 0.02, was used to compare AUCs of the ADC estimates using 2-b-value combinations of 0/600, 0/800, 100/600, and 100/800 s/mm2 to the reference ADC metric using all four b-values. A non-inferiority test was used because, in the case of similar prediction accuracy, an ADC metric using a 2-b-value combination might be preferred over the 4-b-value combination due to reduced imaging time and simplified breast DW-MRI acquisition.
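The percent-change metric and the empirical AUC can be sketched as below. The AUC here is the standard rank-based (Mann-Whitney) estimate, which is the quantity DeLong's method provides a variance for; the DeLong variance and the non-inferiority test itself are not implemented. All numeric values are made up for illustration.

```python
def percent_change(pre, mid):
    """Percent change from the pre-treatment value."""
    return 100.0 * (mid - pre) / pre

def empirical_auc(scores_pcr, scores_non_pcr):
    """Probability that a pCR case scores higher than a non-pCR case,
    counting ties as one half (Mann-Whitney estimate of the AUC)."""
    wins = 0.0
    for p in scores_pcr:
        for n in scores_non_pcr:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pcr) * len(scores_non_pcr))

# Hypothetical mid-treatment percent changes in tumor ADC
delta_pcr = [30.0, 10.0, 45.0]
delta_non_pcr = [5.0, 20.0, 12.0, -3.0]

auc = empirical_auc(delta_pcr, delta_non_pcr)
example_change = percent_change(1.0e-3, 1.3e-3)  # roughly a 30% increase
```

Under the non-inferiority framing, a two-b-value metric would be judged against the four-b-value reference by testing whether its AUC is no more than 0.02 lower, using the paired DeLong variance of the AUC difference.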
DeLong's test of superiority was used to compare the AUCs of the alternate metrics ADCfast and ADCslow (calculated with b-values of 0/100 and 100/600/800 s/mm2, respectively) and fp to the reference ADC metric using all four b-values. Finally, DeLong's test of superiority was also used to compare AUCs between a multivariate model, which included the potentially complementary metrics ADCfast and ADCslow, and the reference ADC metric. Analyses were performed for all cancers and within the HR+/HER2− subtype (identified in the primary analysis as the subtype for which mid-treatment change in ADC was most predictive [14]). To account for multiplicity, we used a hierarchical testing procedure [28] that only performed a hypothesis test within the HR+/HER2− subtype when the null hypothesis of the corresponding test for all cancers was rejected. A Bonferroni correction was used to account for the multiple hierarchical testing procedures, and all hypothesis tests were therefore interpreted using a significance level of 0.05/7 = 0.0071.
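The multiplicity handling can be illustrated with a small sketch: a Bonferroni-adjusted threshold of 0.05/7, and a gate that only evaluates the subtype test when the overall test is rejected. The function name and the p-values passed to it are hypothetical; this shows the decision logic only, not the trial's statistical code.

```python
ALPHA = 0.05
N_PROCEDURES = 7                     # number of hierarchical procedures (from 0.05/7)
THRESHOLD = ALPHA / N_PROCEDURES     # Bonferroni-adjusted significance level

def hierarchical_test(p_all_cancers, p_subtype, threshold=THRESHOLD):
    """Return (reject_overall, reject_subtype).

    The subtype hypothesis is only tested if the overall hypothesis
    is rejected; otherwise it is reported as not rejected.
    """
    reject_overall = p_all_cancers < threshold
    reject_subtype = reject_overall and (p_subtype < threshold)
    return reject_overall, reject_subtype

# Hypothetical p-values for illustration
both_rejected = hierarchical_test(0.001, 0.002)
gate_closed = hierarchical_test(0.08, 0.0001)   # subtype test never performed
```

Gating the subtype test on the overall result means each procedure spends its error budget only once, which is why a single Bonferroni division across the seven procedures suffices.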
Repeatability of the different ADC metrics from test-retest acquisitions was evaluated using within-subject coefficient of variation (wCV) and limits of agreement were calculated in conformance with QIBA metrology guidelines [17,29]. Analyses were performed using SAS/STAT v9.4 (SAS Institute, Cary, NC, USA) and R v4.0.2 (R Foundation for Statistical Computing, Vienna, Austria).
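A within-subject coefficient of variation for paired test-retest data can be sketched as follows. This uses one common estimator, the root mean of per-subject variance divided by squared per-subject mean, and the conventional factor 2.77 (1.96 × √2) for a repeatability coefficient; whether the trial analysis used exactly this form is an assumption, and the ADC pairs are made up.

```python
import math

def within_subject_cv(pairs):
    """wCV from test-retest pairs (x1, x2).

    Per subject: variance = (x1 - x2)^2 / 2, mean = (x1 + x2) / 2.
    wCV = sqrt(mean over subjects of variance / mean^2).
    """
    ratios = []
    for x1, x2 in pairs:
        variance = (x1 - x2) ** 2 / 2.0
        mean = (x1 + x2) / 2.0
        ratios.append(variance / mean ** 2)
    return math.sqrt(sum(ratios) / len(ratios))

def repeatability_coefficient(wcv):
    """Relative change exceeded by chance in about 5% of repeat pairs."""
    return 2.77 * wcv

# Hypothetical test/retest tumor ADC values (x10^-3 mm2/s)
adc_pairs = [(1.00, 1.05), (1.20, 1.15), (0.90, 0.92)]

wcv = within_subject_cv(adc_pairs)
rc = repeatability_coefficient(wcv)
```

A measured change between two scans smaller than the repeatability coefficient cannot be distinguished from measurement variability at the 95% level, which is how a wCV translates into a claim about detectable change.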
Additionally, from the full cohort of 406 ACRIN 6698 patients, a subset of 89 patients consented to the test/retest substudy (Figure 1), of whom 71 patients from 8 institutions (median age 46, range 27 to 71 years) had analyzable repeat DW-MRI scans (Table 1). Three of the eighty-nine patients (3.4%) were excluded for protocol deviations and fifteen (16.9%) for image quality issues in one (n = 7) or both (n = 8) of the test/retest DW-MRI acquisitions, as previously described [12].

Correlation between Metrics
Pre-treatment tumor ADC measures calculated with different b-value combinations were highly correlated (r ≥ 0.92), with the exception of ADCfast using the low b-value combination of 0/100 s/mm2 (r = 0.56-0.71; Table 2). Perfusion fraction, fp, exhibited only weak correlations with the ADC metrics (r < 0.20) except ADCfast (r = 0.75).

Association with Pathologic Response
In general, a greater pre- to mid-treatment increase in tumor ADC was associated with pCR for all b-value combinations (Table 3). Examples of ADC response are shown in patients with pCR (Figure 4) and non-pCR (Figure 5) outcomes. Compared with the reference percent change in ADC using all b-values (0/100/600/800 s/mm2) with AUC = 0.60, 95% CI 0.52-0.68, two-b-value estimates (0/600, 0/800, 100/600, and 100/800 s/mm2) provided comparable performance, and the choice of the maximum b-value (600 vs. 800 s/mm2) did not affect diagnostic performance (AUCs 0.60-0.61; Figure 6, Table 3). Non-inferiority was confirmed for b-value combinations of 0/600, 0/800, 100/600, and 100/800 s/mm2 (all p < 0.05 after accounting for multiple comparisons). Stratification by cancer subtype demonstrated similar predictive value for ADC using two versus four-b-value combinations within each subtype, with non-inferiority confirmed in the HR+/HER2− subtype where ADC was most predictive of pCR (two-b-value AUCs = 0.75-0.78 versus four-b-value AUC = 0.76, p < 0.05 after multiplicity correction; Table 4, Figure 7).

Table 3. Relative performance of mid-treatment percent change in alternate tumor ADC and fp metrics for predicting pCR. Columns include: Non-pCR (N = 140) Mean ± SD (%); AUC [95% CI]; p-Value.
a p values for the non-inferiority test of AUC versus that of the reference ADC metric using all b-values, with a non-inferiority margin of 0.02 (p < 0.05/7 indicates non-inferiority after accounting for multiple comparisons).
b p value for the superiority test of AUC versus that of the reference ADC metric using all b-values (p < 0.05/7 indicates superiority after accounting for multiple comparisons).
c One non-pCR case was excluded from the ∆fp calculation due to a negative fp at the mid-treatment time point, attributed to noise or motion.
Abbreviations: ADC, apparent diffusion coefficient; pCR, pathologic complete response; AUC, area under the receiver operating characteristic curve; CI, confidence interval; SD, standard deviation.

Separating ADC into fast (b-values: 0/100 s/mm2) and slow (b-values: 100/600/800 s/mm2) components did not increase predictive performance (AUC = 0.54, 95% CI 0.46-0.62, p = 0.08; AUC = 0.60, 95% CI 0.52-0.68, p = 0.81, respectively; Figure 6b, Table 3), nor did a multivariate model combining ADCfast and ADCslow (AUC = 0.60, 95% CI 0.52-0.68, p = 0.99). Perfusion fraction, fp, tended to decrease slightly more in the pCR group, but was not more predictive than the reference ADC metric (AUC = 0.56, 95% CI 0.47-0.64, p = 0.46; Figure 6b).

Discussion
Our study found that two-b-value combinations for measuring ADC changes were no less predictive of treatment outcome than four-b-value combinations in women undergoing neoadjuvant chemotherapy for breast cancer, with pathologic complete responders demonstrating greater increases in tumor ADC after 12 weeks of therapy than non-complete responders. We also found that using fewer b-values did not reduce the repeatability of ADC as a quantitative breast cancer marker.
Using data from the ECOG-ACRIN A6698 multicenter trial, we evaluated ADC calculations with varying b-value combinations that potentially probe different underlying biological properties, provide different sampling of signal decay as a function of b-value, and have different requirements for scan time. These ADC metrics demonstrated AUCs ranging from 0.54 to 0.61 for predicting pCR at mid-treatment. Using fewer than four b-values did not negatively impact performance (AUCs = 0.60-0.61) versus the benchmark established from the trial's primary analysis using all b-values (AUC = 0.60), with the exception of the low b = 0/100 s/mm2 combination (AUC = 0.54). Moreover, using only higher non-zero b-values to minimize perfusion effects (i.e., measuring the slow ADC component) did not further improve pCR prediction (AUC = 0.60). Our results therefore suggest that for breast tumor diffusion measurements, an acquisition using only two b-values (e.g., 0/600 or 0/800 s/mm2) is sufficient to implement ADC as a reliable quantitative imaging marker in breast cancer clinical trials. This result can reduce the burden on sites (and their patients) implementing the QIBA DWI Profile by decreasing the target ideal number of b-values to two, with a nominal increase in the wCV and the resultant claim.
A growing number of studies have explored ADC for early identification of treatment efficacy in breast cancer; however, to date these studies have varied in study design and DW-MRI approach and typically comprise smaller, single-center datasets (average of 75 patients) [9]. The primary analysis from our large-scale multicenter prospective trial extends this work to provide a more generalizable assessment of ADC as a quantitative biomarker to predict pathologic response to NAC. This secondary analysis further demonstrates the robustness of ADC as a predictive marker in breast cancer treatment, as variable measurement approaches did not notably affect diagnostic performance. Our findings build on prior studies investigating optimal b-value combinations for breast tumor characterization [30][31][32][33][34] and support using the two-b-value combination of 0/800 s/mm2 as proposed in recent breast DW-MRI consensus recommendations [10], allowing for minimization of DW-MRI scan time and improved suitability for abbreviated breast MRI protocols or for strategies to utilize the extra time to increase spatial resolution.
Results of this study are important for developing guidelines for standardized and accurate use of ADC for identification of therapeutic effects in breast cancer clinical trials. The QIBA group recently added a breast claim to their DW-MRI profile defining error rates for breast tumor ADC calculations [11] based primarily on the A6698 trial test-retest data [12]. However, the confidence intervals were only known for the very narrow approach using all four b-values (0/100/600/800 s/mm2) used in the primary analysis for ADC calculation. There remains a dearth of test-retest data in the literature with an adequate sample size to expand upon these recommendations and allow for more flexible acquisition and analysis approaches; QIBA investigators suggest that estimates of precision should be based on a sample size of at least N = 35 to provide true 95% confidence intervals for a patient's quantitative imaging biomarker measurement and for changes in the biomarker over time [29,35]. Results of this study provide new evidence that will allow QIBA to incorporate performance metrics for a wider range of breast DW-MRI protocols, particularly two-b-value acquisitions and greater specification of the target maximum b-value (800 s/mm2). Simplifying the approach to ADC calculation without compromising performance would likely facilitate broader implementation in multicenter trials. The alternate approaches and findings reported here align well with the consensus recommendations of the EUSOBI for standardization of breast DW-MRI (including use of two or more b-values, with a maximum b = 800 s/mm2 [10]) and thus will enable investigators to develop protocols conforming to both the QIBA and EUSOBI Breast DW-MRI guidelines for accuracy and standardization of breast ADC measures, respectively.
Our study has several limitations. The trial was not powered for these secondary analyses, and larger sample sizes may be needed to identify subtle differences in diagnostic performance. Additionally, we primarily explored only monoexponential ADC modeling over a limited b-value range (0-800 s/mm2), while sampling at higher diffusion weightings (e.g., b ≥ 1500 s/mm2) and utilizing more advanced non-Gaussian or multi-exponential DW-MRI modeling may better characterize the tissue microstructure and improve predictive performance [36][37][38][39]. While we explored perfusion fraction as a potential metric, the b-values were not optimized for IVIM analysis and extending acquisitions to more than four b-values would be needed to accurately characterize this parameter. In addition, measurements were calculated by averaging over all voxels in the manually defined whole-tumor ROIs. Alternate analytic approaches are under investigation to improve our ability to detect changes in tumor cellularity, including radiomics and histogram-based analyses and characterization of the 'worst' hot-spot (i.e., lowest ADC) tumor subregion. Furthermore, while the predictive performances achieved by ADC measures alone in this study were relatively modest, the ACRIN 6698 primary analysis demonstrated that higher AUCs could be achieved through multivariable modeling to incorporate important clinical characteristics, such as tumor HR/HER2 subtype [14], and additional consideration of other biologic factors (e.g., age, breast density, histopathology) could further improve predictive accuracy. While this analysis focuses on prediction of pCR, in future work it will be informative to test the value of ADC for predicting recurrence-free survival as study follow-up data mature.
In conclusion, these secondary analyses of ADC as a quantitative imaging biomarker in breast cancer treatment suggest that simple whole-tumor ADC measures using a two-b-value acquisition (e.g., 0/800 s/mm2) provide comparable accuracy and repeatability to four-b-value acquisitions and other two-b-value combinations. Extension of these findings to alternative DW-MRI approaches incorporating spatial variation (histogram analyses, radiomics, etc.) or non-Gaussian diffusion models remains to be investigated. This study contributes important additional data to inform and expand QIBA and other guidelines on optimal implementation of breast DW-MRI and utilization of ADC as a marker of response in breast cancer trials.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data generated or analyzed during the study are available through ECOG-ACRIN, and all images and associated study metadata will be available on The Cancer Imaging Archive (TCIA) website (https://www.cancerimagingarchive.net, accessed on 10 January 2022), planned for public release in Spring 2022.