1. Introduction
Neoadjuvant chemotherapy (NAC) is one of the standard treatments for early, high-risk breast cancer. NAC provides an opportunity to use imaging or biological markers to monitor tumor response at various time points during treatment. Diffusion-weighted imaging (DWI) is a non-contrast MR imaging technique based on measuring the random motion of water molecules within tissue. The apparent diffusion coefficient (ADC) is a quantitative measure derived from at least two DWI images acquired with different b-values. Many breast DWI studies have shown that tumor ADC can provide valuable information for evaluating tumor response in the NAC setting [1,2,3,4], and that this information is distinct from the quantitative measurements provided by dynamic contrast-enhanced (DCE) MRI. Compared with DCE-MRI, breast DWI often has poorer image quality because of artifacts and lower spatial resolution, and it lacks standardization in image acquisition, image interpretation, and ADC calculation [5]; these factors limit its widespread use in clinical practice.
Recently, a large multi-center clinical trial, the American College of Radiology Imaging Network (ACRIN) 6698 trial [6,7], conducted as a sub-study of the Investigation of Serial Studies to Predict Your Therapeutic Response through Imaging and Molecular Analysis 2 (I-SPY 2 TRIAL), demonstrated that the change in mean ADC after 12 weeks of therapy (inter-regimen) was predictive of pathologic complete response (pCR) [6]. A test–retest study using a sub-cohort of ACRIN 6698 patient scans reported excellent repeatability and reproducibility of tumor mean ADC measurements, with an intraclass correlation coefficient (ICC) of 0.92 [95% CI: 0.80–0.97] [8]. However, reproducibility was evaluated only in a small cohort (n = 20).
Various ROI delineation methods and inter-reader studies of ADC measurements have been reported [9,10,11,12,13,14]. Most focused on comparing mean ADC values [10,11,12], and several studied the impact of ROI placement methods on the diagnostic performance of ADC [13,14,15]. Very few studies analyzed the impact of ROI placement methods on the evaluation of response to NAC for breast cancer [9]. Van Heeswijk et al. included histogram metrics (minimum, maximum, median, and 5th to 95th percentiles) in addition to mean ADC in their study of rectal tumors. All of the above studies used imaging data acquired at a single institution. In this paper, we describe a retrospective inter-reader analysis of three types of ROIs and their performance in predicting pCR, using data from I-SPY 2, a multi-center NAC clinical trial.
2. Materials and Methods
2.1. Patient Population
Women aged 18 years or older who are diagnosed with clinical stage II or III breast cancer and have had no previous surgery or systemic therapy for this cancer are eligible to participate in the I-SPY 2 TRIAL. The tumor must measure at least 2.5 cm by clinical assessment or imaging. Patients with hormone receptor (HR)-positive, human epidermal growth factor receptor 2 (HER2)-negative cancer assessed as low risk by MammaPrint (Agendia, Amsterdam, The Netherlands) are excluded from the trial.
A cohort of 249 women enrolled in I-SPY 2 and randomized at qualified study centers either to pembrolizumab plus standard chemotherapy or to the corresponding control arm (standard chemotherapy) was considered for inclusion in this analysis. The I-SPY 2 TRIAL (ClinicalTrials.gov identification number: NCT01042379) is HIPAA-compliant and was performed under Institutional Review Board (IRB) approval. All patients gave informed consent before enrolling and before starting treatment. All patients had HER2-negative breast cancer, verified at baseline. The primary endpoint of I-SPY 2 is pCR, defined as the absence of invasive tumor in the breast and lymph nodes at the time of surgery.
2.2. Imaging Acquisition
The MRI component of the I-SPY 2 trial consisted of four sequential MRI exams: before NAC (T0), after 3 weeks of NAC (T1), inter-regimen (T2), and pre-surgery (T3). In this study, we used the MRI examinations at T0 and T1. MRI examinations were performed on 1.5 T or 3 T scanners across a variety of vendor platforms and institutions, using a dedicated breast coil and a prospectively defined protocol. DWI was performed using a fat-suppressed single-shot echo planar imaging sequence with TR ≥ 4000 ms, TE = 50–100 ms, FOV = 260–360 mm to achieve full bilateral coverage, acquisition matrix = 128–192 with in-plane resolution ≤ 1.9 mm, slice thickness = 3–5 mm, slice gap ≤ 1 mm, and number of signal averages ≥ 2. Diffusion-weighting b-values of 0 and 800 s/mm² were specified, with an acquisition time of no more than 5 min. DCE-MRI was acquired in the same MRI session. Three-dimensional fat-suppressed T1-weighted images were acquired before and after injection of a gadolinium-based contrast agent. Post-contrast imaging started simultaneously with injection. Each phase lasted 80–100 s, with a minimum of 8 min of imaging following injection.
2.3. Image Analysis
A standardized quality ranking system was used to evaluate DWI studies in three image-quality categories: (1) artifacts, (2) fat suppression, and (3) signal-to-noise ratio (SNR). One of the readers (WL) evaluated each study and gave an overall quality rating of poor, moderate, or good. “Poor” image quality refers to severe artifacts, failed fat suppression, and/or low SNR in the tumor area on either the original DWI at b = 800 s/mm² or the derived ADC map. “Moderate” image quality means that the original DWI or derived ADC map has issues in one or more of the categories above, but ROI delineation is still possible. “Good” image quality means no obvious issues in any of the three categories. Poor-quality exams were considered not analyzable and were excluded from the study. Moderate- and good-quality studies were then classified as analyzable or not analyzable, depending on whether quality issues prevented confident delineation of an ROI. Three patients with missing pCR outcome were excluded (see Figure 1 for details on data inclusion/exclusion).
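The rating and exclusion rules above can be written as a small decision function. The sketch below is a hypothetical encoding, not the study's software: the `QualityFindings` fields and the split into "severe" versus milder issues are assumptions made for illustration, since the reader's judgment was qualitative.

```python
from dataclasses import dataclass


@dataclass
class QualityFindings:
    """Hypothetical per-exam findings on the b = 800 DWI or the derived ADC map."""
    severe_artifacts: bool
    failed_fat_suppression: bool
    low_snr_in_tumor: bool
    minor_issue_any_category: bool   # milder issue in artifacts, fat suppression, or SNR
    roi_confidently_definable: bool  # reader judgment: can an ROI be drawn with confidence?


def overall_rating(f: QualityFindings) -> str:
    """Map findings to the poor/moderate/good scale described in the text."""
    if f.severe_artifacts or f.failed_fat_suppression or f.low_snr_in_tumor:
        return "poor"
    if f.minor_issue_any_category:
        return "moderate"
    return "good"


def is_analyzable(f: QualityFindings) -> bool:
    """Poor exams are excluded; moderate/good exams still need a confidently definable ROI."""
    return overall_rating(f) != "poor" and f.roi_confidently_definable
```

Keeping the rating and the analyzability decision separate mirrors the two-stage screening in the text: a moderate exam can still be excluded if no ROI can be drawn confidently.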
ROI evaluation was performed on DWI images acquired at T0 and T1 by two readers, with 7 and 0 years of experience in breast DWI, respectively, who were blinded to pathologic outcomes. Each reader was trained on a DWI cohort of 30 patients imaged at T0 and T1 (60 exams in total). The training set was randomly selected from two other drug arms in I-SPY 2, matching the full cohort of this study (n = 249) in sites and HR-positive/negative ratio. Disagreements and difficult cases were discussed with a breast radiologist.
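One way to draw a training set that matches the full cohort's site and HR-status mix is proportional stratified sampling. The sketch below assumes that "matching" means proportional allocation over joint (site, HR status) strata with largest-remainder rounding; the dictionary keys `site` and `hr` and the function name are hypothetical, and the text does not state how the selection was actually performed.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def stratified_training_set(pool: List[dict], full_cohort: List[dict],
                            size: int = 30, seed: int = 0) -> List[dict]:
    """Draw `size` patients from `pool` so that (site, HR) strata follow `full_cohort`."""
    rng = random.Random(seed)
    strata = Counter((p["site"], p["hr"]) for p in full_cohort)
    total = sum(strata.values())
    # Proportional quotas, rounded with the largest-remainder method so they sum to `size`
    raw: Dict[Tuple[str, str], float] = {s: size * c / total for s, c in strata.items()}
    quota = {s: int(r) for s, r in raw.items()}
    leftover = size - sum(quota.values())
    for s in sorted(raw, key=lambda st: raw[st] - quota[st], reverse=True)[:leftover]:
        quota[s] += 1
    # Group candidate patients (e.g., from other drug arms) by stratum
    by_stratum = defaultdict(list)
    for p in pool:
        by_stratum[(p["site"], p["hr"])].append(p)
    chosen: List[dict] = []
    for s, q in quota.items():
        candidates = by_stratum.get(s, [])
        # If a stratum has too few candidates, take all of them (the total may then fall short)
        chosen.extend(rng.sample(candidates, min(q, len(candidates))))
    return chosen
```

Fixing the random seed makes the draw reproducible, which is useful when the same training set must be shared between readers.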
DWI, ADC maps, and DCE subtraction images were reviewed simultaneously to localize the lesion. The tumor was then localized on the b = 0 DWI image using breast anatomical landmarks matching those seen on the subtracted DCE images, and was delineated on the ADC map to encompass areas with low ADC values and high signal intensity on the b = 800 s/mm² DWI. Surrounding fat and tissue were excluded to minimize partial volume effects. Care was taken to avoid cystic or necrotic regions and areas exhibiting T2 shine-through.
Three types of breast tumor ROIs were analyzed in this study: multiple-slice restricted ROI, single-slice restricted ROI, and single-slice tumor ROI.
Table 1 describes each ROI delineation technique in detail. Two types of ROI, the multiple-slice restricted ROI and the single-slice tumor ROI, were manually delineated (see Figure 2 for example cases). The third type, the single-slice restricted ROI, was automatically generated from the multiple-slice restricted ROI on the same axial slice as the single-slice tumor ROI. Multiple-slice ROIs were delineated on all slices where tumor could be seen. Restricted ROIs covered only the lower-ADC areas, whereas tumor ROIs enclosed the whole tumor area visible on both DWI and DCE-MRI. Special care was taken to place ROIs at the same locations for the same lesion at T0 and T1. In-house software developed in IDL (Exelis Visual Information Solutions, Boulder, CO) was used to calculate ADC based on the classic mono-exponential decay model: ADC = [ln(S_0) − ln(S_800)]/800, where S_0 and S_800 are the signal intensities acquired with b-values of 0 and 800 s/mm².
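As a worked illustration of the mono-exponential model, S_b = S_0 exp(−b · ADC), which gives ADC = [ln(S_0) − ln(S_b)]/b for a two-point acquisition with b = 0 and b = 800 s/mm², the sketch below computes a voxel-wise ADC map in Python. It is not the authors' IDL software; treating non-positive signal as missing (`None`) is an assumption made here because the logarithm is undefined there, and the function names are hypothetical.

```python
import math
from typing import List, Optional

B_LOW = 0.0     # s/mm^2, low b-value in the protocol
B_HIGH = 800.0  # s/mm^2, high b-value in the protocol


def adc_voxel(s_low: float, s_high: float,
              b_low: float = B_LOW, b_high: float = B_HIGH) -> Optional[float]:
    """Two-point ADC (mm^2/s) from S_b = S_0 * exp(-b * ADC)."""
    if s_low <= 0 or s_high <= 0:
        return None  # assumption: log undefined, so the voxel is treated as missing
    return (math.log(s_low) - math.log(s_high)) / (b_high - b_low)


def adc_map(img_low: List[List[float]],
            img_high: List[List[float]]) -> List[List[Optional[float]]]:
    """Apply adc_voxel to every voxel of one axial slice (images as nested lists)."""
    return [[adc_voxel(s_l, s_h) for s_l, s_h in zip(row_low, row_high)]
            for row_low, row_high in zip(img_low, img_high)]
```

For example, a voxel with S_0 = 1000 and S_800 = 1000 · e^(−0.8) ≈ 449.3 yields ADC = 0.8/800 = 1.0 × 10⁻³ mm²/s.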
ADC metrics, namely the mean and the percentiles of the ADC histogram (minimum, 5th, 15th, 25th, 50th, 75th, and 95th percentiles, and maximum), were automatically calculated from all three types of ROIs delineated by each reader at T0 and T1. Percentage changes in ADC metrics from T0 to T1 were also calculated to evaluate the effect of inter-reader variability on the assessment of treatment response.
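The histogram metrics and their percentage changes can be sketched as follows. The text does not specify how percentiles were interpolated; this sketch assumes linear interpolation between order statistics (the default of R's `quantile`, type 7, and of NumPy's `percentile`), and the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence

PERCENTILES = (5, 15, 25, 50, 75, 95)


def percentile(sorted_vals: Sequence[float], p: float) -> float:
    """Linear interpolation between order statistics (R type 7 / NumPy 'linear')."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def adc_metrics(roi_adc: List[float]) -> Dict[str, float]:
    """Mean, minimum, selected percentiles, and maximum of the ADC values in one ROI."""
    vals = sorted(roi_adc)
    metrics = {"mean": statistics.fmean(vals), "min": vals[0], "max": vals[-1]}
    for p in PERCENTILES:
        metrics[f"p{p}"] = percentile(vals, p)
    return metrics


def percent_change(t0: Dict[str, float], t1: Dict[str, float]) -> Dict[str, float]:
    """Percentage change of every metric from T0 to T1: 100 * (T1 - T0) / T0."""
    return {k: 100.0 * (t1[k] - t0[k]) / t0[k] for k in t0}
```

Because the percentage change is a ratio, the ADC unit (for example, 10⁻³ mm²/s) cancels out.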
2.4. Statistical Analysis
The variability of ADC measurements between the two readers was evaluated using the ICC [16]. ICCs were calculated with the irr package version 0.84.1 in R version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) [17]. The predictive performance of the percentage change in each ADC metric from T0 to T1 was assessed by the area under the receiver operating characteristic curve (AUC), which summarizes the tradeoff between sensitivity and specificity across cutoff values in the prediction of pCR.
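The text does not state which ICC form was computed. As an assumption for illustration, the sketch below implements the two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout and Fleiss's notation), which corresponds to `icc(ratings, model = "twoway", type = "agreement", unit = "single")` in the irr package. Each row holds one patient's metric from each reader; the function name is hypothetical.

```python
from typing import Sequence


def icc_a1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random, absolute-agreement, single-measure ICC (subjects x raters table)."""
    n = len(ratings)     # subjects (patients)
    k = len(ratings[0])  # raters (readers)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

The absolute-agreement form penalizes a systematic offset between readers (through the rater mean square), whereas a consistency ICC would ignore it; with identical readings from both readers the function returns 1.0.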
p-values for the difference between two AUCs were determined using the DeLong test. The pROC package in R version 4.0.3 was used to analyze receiver operating characteristic (ROC) curves and calculate AUCs [18].
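The AUC and the paired DeLong comparison can be sketched without pROC. The block below computes the AUC as the Mann–Whitney probability that a pCR patient's score exceeds a non-pCR patient's score (ties count one half), and uses DeLong's structural components to test the difference between two AUCs measured on the same patients, for example Reader 1 versus Reader 2. It assumes that larger values indicate pCR; pROC's `roc` chooses the direction automatically by default, so scores may need to be negated. Function names are hypothetical; in R, the analogous comparison is `roc.test` with `method = "delong"`.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann–Whitney kernel: 1 if the pCR score is higher, 0.5 for a tie, else 0."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def auc_components(pos: Sequence[float],
                   neg: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """AUC plus DeLong structural components (one per pCR and one per non-pCR patient)."""
    m, n = len(pos), len(neg)
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    return sum(v10) / m, v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(pos1: Sequence[float], neg1: Sequence[float],
                  pos2: Sequence[float], neg2: Sequence[float]):
    """Two-sided DeLong test for two correlated AUCs.

    pos*/neg*: scores of pCR / non-pCR patients for markers 1 and 2,
    listed in the same patient order for both markers.
    """
    auc1, v10a, v01a = auc_components(pos1, neg1)
    auc2, v10b, v01b = auc_components(pos2, neg2)
    m, n = len(pos1), len(neg1)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc1, auc2, z, p
```

Pairing matters here: the covariance terms account for both AUCs being computed on the same patients, which is why the test is appropriate for comparing readers or ROI types within one cohort.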
4. Discussion
In this multi-center study, we investigated the inter-reader variability of DWI as a marker of breast cancer response to therapy using different tumor ADC metrics and ROI delineation approaches. In general, agreement between the two readers on tumor ADC metrics was similar to that reported in previous breast cancer studies [8,12]. Furthermore, the predictive performance of changes in these ADC metrics after 3 weeks of treatment was very similar between readers. Overall, these results indicate good reproducibility of quantitative breast tumor ADC measurements using manually delineated ROIs on DWI.
Manual delineation of ROIs on patient image data inevitably has a subjective component. In our study, mean tumor ADC showed high reproducibility (ICC > 0.96) both at pretreatment (T0) and at early treatment (T1, 3 weeks after treatment initiation), compared with the reproducibility achieved in previous studies of breast tumor ADC metrics [8,19,20]. This ICC value broadly agrees with that reported in the ACRIN 6698 test–retest study [8], in which the inter-reader ICC of mean ADC in the whole-tumor ROI was estimated to be 0.92 in a small subset of DWI exams (n = 20). However, it is not clear at which treatment time point (T0 or T1) these 20 exams were acquired. Jang et al. assessed the reproducibility of ADC measurements in malignant breast masses [12], using a manual ROI delineation method similar to the single-slice tumor ROI in this study. In that study, two radiologists with three and six years of experience independently interpreted breast MR images of 66 patients, with an estimated ICC of 0.751 (95% CI: 0.573–0.855). The higher ICC achieved in the present study may be explained by the readers' prior training: consensus was reached on a separate training set before the readers began ROI delineation in the study cohort (n = 103).
Three types of ROIs were analyzed in our study. The first, the multiple-slice restricted ROI, was designed to cover the most diffusion-restricted area of the lesion on multiple slices. The second, the single-slice restricted ROI, was automatically extracted from the first using only the axial slice that showed the largest lesion area. The third, the single-slice tumor ROI, was designed to cover the whole lesion on the same axial slice as the second. We tested the reproducibility of these three ROI types because whole-tumor and restricted ROIs are the most commonly used manual delineation methods in the literature [6,14,21]. Small ROIs focusing on the most diffusion-restricted area of the lesion are recommended by the European Society of Breast Imaging (EUSOBI) and supported by published studies [19,21,22]. Our results showed comparable reproducibility of ADC measurements among the different ROI types, except for minimum and maximum ADC, for which the tumor ROI achieved higher ICCs than the restricted ROIs. This difference was more pronounced for percentage changes in ADC, suggesting that percentage changes in minimum or maximum tumor ADC may reflect treatment-induced changes less reliably than other ADC metrics. The ICCs for percentage change in minimum ADC were 0.045 and 0.083 for the multiple-slice and single-slice restricted ROIs, respectively, compared with 0.86 for the single-slice tumor ROI. This result suggests that minimum ADC is more subject to inter-reader variability in restricted ROIs than in the tumor ROI.
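The relationship between the two restricted ROIs can be made concrete with binary masks. In this hypothetical sketch, a multiple-slice ROI is a list of 2D masks (one per axial slice); the single-slice restricted ROI keeps one slice and zeroes the rest, so it is contained in the multiple-slice ROI by construction. The Methods state that the slice was the one used for the single-slice tumor ROI; choosing it as the slice with the most ROI voxels, as `largest_area_slice` does here, is only an approximation of "the largest lesion area" and is an assumption.

```python
from typing import List

Mask2D = List[List[int]]
Mask3D = List[Mask2D]


def slice_area(mask: Mask2D) -> int:
    """Number of ROI voxels in one axial slice."""
    return sum(1 for row in mask for v in row if v)


def largest_area_slice(mask3d: Mask3D) -> int:
    """Index of the axial slice with the most ROI voxels (first one wins on ties)."""
    areas = [slice_area(s) for s in mask3d]
    return max(range(len(areas)), key=areas.__getitem__)


def single_slice_roi(mask3d: Mask3D, z: int) -> Mask3D:
    """Copy slice z of a multiple-slice ROI and zero every other slice."""
    return [[row[:] for row in s] if i == z else [[0] * len(row) for row in s]
            for i, s in enumerate(mask3d)]


def is_subset(inner: Mask3D, outer: Mask3D) -> bool:
    """True if every voxel of `inner` is also a voxel of `outer`."""
    return all(not a or b
               for s_in, s_out in zip(inner, outer)
               for r_in, r_out in zip(s_in, s_out)
               for a, b in zip(r_in, r_out))
```

The subset check makes explicit why the two restricted ROIs share voxels, which is relevant to interpreting their similar ICCs.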
Results from the ACRIN 6698 clinical trial indicate that the percentage change in tumor ADC is predictive of pCR [6,8]. We therefore evaluated the reproducibility of percentage changes in ADC measurements in this study. For mean tumor ADC, the highest ICC was found for multiple-slice restricted ROIs, which was confirmed by the Bland–Altman plot in Figure 4. However, this pattern was not observed at the low (minimum and 5th percentile) and high (95th percentile and maximum) ends of the histogram. ACRIN 6698 also reported the reproducibility of ADC histogram metrics in a small sample (n = 20) [8], in which the highest inter-reader reproducibility was observed at low percentiles (15th and 25th). Notably, those findings align well with our results once differences in ROI delineation approach are taken into account. ACRIN 6698 used a 3D whole-tumor ROI approach, in which low percentiles likely represent ADC values of the more restricted part of the tumor; these are comparable to the mean ADC of our multiple-slice restricted ROI, which demonstrated the highest reproducibility in our study.
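A Bland–Altman analysis, such as the one shown in Figure 4, summarizes paired readings by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences, plotted against the pairwise means. The sketch below computes these quantities for, say, the percentage change in mean ADC from Reader 1 and Reader 2; using the sample SD (n − 1 denominator) is an assumption, and the function name is hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def bland_altman(reader1: Sequence[float], reader2: Sequence[float]) -> Dict[str, object]:
    """Bias and 95% limits of agreement between two readers' paired measurements."""
    diffs: List[float] = [a - b for a, b in zip(reader1, reader2)]
    means: List[float] = [(a + b) / 2 for a, b in zip(reader1, reader2)]  # x-axis of the plot
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {"means": means, "diffs": diffs, "bias": bias,
            "loa_low": bias - 1.96 * sd, "loa_high": bias + 1.96 * sd}
```

Narrower limits of agreement indicate better inter-reader agreement on the same scale as the metric itself, which complements the unitless ICC.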
This study also evaluated the predictive performance of the percentage change in ADC metrics by ROI type and by reader. Overall, AUC values for percentage changes in ADC were similar for Reader 1 and Reader 2. Interestingly, the highest AUCs were obtained with mean ADC for the restricted ROIs but with minimum ADC for the single-slice tumor ROI. This observation suggests that the most restricted area of the lesion may better reflect treatment response. However, minimum ADC may suffer from poor reproducibility, particularly in restricted ROIs.
This study has several limitations. First, the three types of ROIs could be highly correlated, especially when a lesion's ROIs were small in all three types. In addition, the single-slice restricted ROI was entirely contained within the multiple-slice restricted ROI, which might have led to similar ICC results for these two ROI types. Second, whole-tumor ROIs were not delineated, which precluded a full comparison of restricted versus tumor ROIs. Third, nearly 50% of the original cohort was not analyzable because of poor image quality or other factors preventing ROI delineation, which substantially reduced the sample size.