Retrospective Correction of ADC for Gradient Nonlinearity Errors in Multicenter Breast DWI Trials: ACRIN6698 Multiplatform Feasibility Study

The presented analysis of multisite, multiplatform clinical oncology trial data sought to enhance quantitative utility of the apparent diffusion coefficient (ADC) metric, derived from diffusion-weighted magnetic resonance imaging, by reducing technical interplatform variability owing to systematic gradient nonlinearity (GNL). This study tested the feasibility and effectiveness of a retrospective GNL correction (GNC) implementation for quantitative quality control phantom data, as well as in a representative subset of 60 subjects from the ACRIN 6698 breast cancer therapy response trial who were scanned on 6 different gradient systems. The GNL ADC correction based on a previously developed formalism was applied to trace-DWI using system-specific gradient-channel fields derived from vendor-provided spherical harmonic tables. For quantitative DWI phantom images acquired in typical breast imaging positions, the GNC improved interplatform accuracy from a median of 6% down to 0.5% and reproducibility of 11% down to 2.5%. Across studied trial subjects, GNC increased low ADC (<1 µm2/ms) tumor volume by 16% and histogram percentiles by 5%–8%, uniformly shifting percentile-dependent ADC thresholds by ∼0.06 µm2/ms. This feasibility study lays the grounds for retrospective GNC implementation in multiplatform clinical imaging trials to improve accuracy and reproducibility of ADC metrics used for breast cancer treatment response prediction.


INTRODUCTION
The American College of Radiology Imaging Network (ACRIN) 6698 multicenter breast cancer imaging trial evaluated the use of tumor apparent diffusion coefficient (ADC), measured from diffusion-weighted magnetic resonance imaging (MRI) (DWI), for prediction of therapy response (1). The conventional predictive model for response and clinical management uses change in the size of solid tumor (2). The ADC is modulated by tissue cellularity (3,4) and therefore is potentially more specific to differentiation of malignant tissue, with higher cell density, versus benign and necrotic tissue, with lower cell density, for accurate assessment of cytotoxic treatment effects leading to tumor volume change (5,6). The predictive power of the ADC-based diagnostic metric depends on the ability to measure changes in tumor biological characteristics beyond the nonbiological (technical) measurement errors (7,8). The confidence intervals for ADC metrics are determined (7) both by intrascan precision (repeatability) and interscan accuracy (reproducibility). While precision is typically assessed by within-subject repeatability (8,9), accuracy is determined with respect to known ADC values, provided by a phantom (10) or a bias-free tissue reference (11).
Substantial ADC bias, exceeding measurement precision, has been demonstrated by multiplatform studies of quantitative diffusion phantoms (12,13) for anatomic locations offset from the magnet isocenter. This bias is primarily attributed to spatially nonuniform diffusion weighting (b-value) owing to system-dependent gradient nonlinearity (GNL) (14,15). In contrast to geometric GNL distortions, routinely corrected on clinical scanners, this persisting bias causes deviation from nominal weighting (b-values) for DWI voxel signals (14,15). For horizontal-bore clinical MRI systems, GNL bias shows a characteristic "saddle" pattern with effective b-values inflating ADC measured for offsets in the right-left/anterior-posterior (RL/AP) direction and reducing ADC for superior-inferior (SI) offsets (13,14). For the same gradient system and offset from isocenter, the absolute SI bias is typically 2- to 3-fold higher than RL/AP (13). This systematic ADC bias is completely determined by the gradient platform and the tissue offset from magnet isocenter (13-15), and it remains stable over time, as demonstrated by longitudinal studies conducted on scanners with the same gradient model (13,16). Systematic differences between gradient models adversely affect interplatform reproducibility of ADC measurements. Similar to the routine correction of geometric distortions in MRI, this platform-specific spatial b-value bias can be corrected using models based on spherical harmonics (SPHs) basis functions characterizing the gradient fields (14,15,17). The gradient design-specific SPH coefficient tables are known to system engineers (11,17) and routinely used for correction of geometric MRI distortions of the same GNL origin.
Improved ADC accuracy after correction for spatial nonuniformity of diffusion weighting (DW) induced by GNL was previously demonstrated for off-center anatomy on a single-vendor platform (11,12). In the multisite clinical trial setting, patient DWI scans are typically performed on multiple MRI platforms, while a single core-lab is charged with the centralized data analysis. A retrospective multiplatform correction workflow with empiric GNL assessment was validated in a recent collaborative project of the MRI working group within the NCI Quantitative Imaging Network (QIN) (18,19). This work also showed superior performance of GNL correction (GNC) based on gradient system descriptions available from vendors (11,12,16). With the goal of improving interplatform reproducibility and accuracy of breast tumor ADC measures, the current study assessed the feasibility of centralized implementation of retrospective GNC using vendor-provided gradient system characteristics in a clinical trial setting.

METHODOLOGY
Phantom and Subject Data
For the ACRIN 6698 trial, bilateral axial trace-DWI for ice-water DWI quality control (QC) phantoms (20) and study subjects (1) were acquired at 3 T and 1.5 T field strengths with their dedicated (7-, 8-, and 16-channel) breast coils at 10 imaging centers on MRI scanners from 3 different vendors, comprising 10 different gradient configurations. The same breast coils were used equally on 1.5 T and 3 T scanners by all vendors, and the 16-channel array was used by a single vendor only. DWI was performed (1) using standardized single-shot echo-planar imaging (SS EPI) sequences with b = 0, 100, 600, 800 s/mm2. ADC maps were generated using a mono-exponential diffusion model with nominal b-values (without correction). A subset of 60 clinical trial subjects underwent test-retest scans before treatment to assess baseline within-subject precision (9) of ADC measurements. For this feasibility study, tumor ADC histograms of these subjects were compared before and after GNC to assess improvement of intraplatform accuracy and interplatform reproducibility. The studied subset included subject scans at 3 T (N = 25) and 1.5 T (N = 35) from 8 sites on 6 gradient configurations from 3 vendors. These configurations represented the range from negligible to moderate GNL for scanners accepted to the trial (20). The QC phantom ADC maps were analyzed for each of the gradient models corresponding to the in vivo scans (ie, 6 gradient configurations) to assess GNC accuracy. Given the previously demonstrated stability of system GNL characteristics (16,19), a single phantom scan was used per gradient system.
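The mono-exponential model with nominal b-values can be illustrated with a minimal sketch. It assumes a log-linear least-squares fit over all four b-values; the text does not state which fitting method or b-value subset the trial used, and the function name `fit_adc` and the synthetic signals are hypothetical.

```python
import math

def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    b_values are in s/mm2; the returned ADC is in mm2/s.
    Assumption: a straight-line fit of ln(S) versus b (not confirmed by the text).
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching (b, signal) pairs")
    y = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(y) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (yi - mean_y) for b, yi in zip(b_values, y))
    return -sxy / sxx  # slope of ln(S) vs b is -ADC

# Nominal b-values of the trial protocol (s/mm2).
B_VALUES = [0.0, 100.0, 600.0, 800.0]

# Synthetic noise-free voxel with the ice-water diffusion value, 1.1e-3 mm2/s.
ADC_ICE_WATER = 1.1e-3
signals = [1000.0 * math.exp(-b * ADC_ICE_WATER) for b in B_VALUES]

# 1 mm2/s = 1000 um2/ms, so the fitted value is reported in um2/ms.
adc_um2_per_ms = fit_adc(B_VALUES, signals) * 1000.0
```

Because the fit uses nominal b-values, any spatial deviation of the actual b-value from nominal passes directly into the fitted ADC, which is the bias that GNC targets.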

System GNL Correction
An MRI gradient system inventory was compiled based on public and vendor-specific private Digital Imaging and Communications in Medicine (DICOM) (21) header information for the quantitative DWI phantom scans (Figure 1) used for trial scanner acceptance testing (20). To be accepted per the ACRIN 6698 trial protocol, scanners had to show GNL-induced bias below a QC threshold of 10% for typical breast imaging positions (RL/AP offsets of 70-90 mm) using the ice-water DWI phantom. Gradient-channel SPH design coefficients (14) and normalization conventions were provided by vendors (16) and used for the calculation of system gradient fields and their spatial derivatives (GNL tensors L(r) (14)) on a 4- to 5-mm 3D grid. Direction-averaged corrector maps, C_ave(r), were then constructed covering the full characteristic gradient diameters (500-660 mm) using DWI gradients, u_k, along the primary magnet axes. For phantom and human subject DWI, the system correctors were 3D-spline interpolated according to the DICOM header information for each imaged volume and resolution (eg, Figure 1A). ADC correction was then performed by means of pixelwise division, ADC_GNC = ADC/C_ave, where C_ave is the direction-averaged (trace-DWI) corrector map. GNC was automated using shared p-libraries developed in Matlab R2015b (MathWorks, Natick, MA).
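The correction step can be sketched as follows. The pixelwise division is taken from the text. The form of the corrector is an assumption: for an isotropic medium, the b-value scaling along DWI direction u_k is taken here as the squared norm of L(r)·u_k, averaged over the three primary axes, following the general idea of the cited formalism (14). The tensor values and function names are hypothetical.

```python
def direction_corrector(gnl_tensor, direction):
    """Assumed b-value scaling for one DWI direction: ||L u||^2."""
    lu = [sum(gnl_tensor[i][j] * direction[j] for j in range(3)) for i in range(3)]
    return sum(component * component for component in lu)

def average_corrector(gnl_tensor):
    """Average the per-direction correctors over the three primary axes."""
    axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    return sum(direction_corrector(gnl_tensor, u) for u in axes) / 3.0

def correct_adc(adc_map, corrector_map):
    """Pixelwise division ADC_GNC = ADC / C_ave for 2D maps (lists of rows)."""
    return [
        [adc / c for adc, c in zip(adc_row, c_row)]
        for adc_row, c_row in zip(adc_map, corrector_map)
    ]

# Hypothetical GNL tensor at one off-center voxel (identity means no nonlinearity).
L_VOXEL = [
    [1.05, 0.0, 0.0],
    [0.0, 1.02, 0.0],
    [0.0, 0.0, 0.99],
]
c_ave = average_corrector(L_VOXEL)

# A voxel whose true ADC is 1.1 um2/ms would be measured as 1.1 * c_ave.
measured = [[1.1 * c_ave]]
corrected = correct_adc(measured, [[c_ave]])
```

With an identity tensor the corrector is exactly 1 and the ADC map is unchanged, which matches the expectation of no bias at the isocenter.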

ADC Histogram Analysis
For ice-water DWI phantom scans, the fractional ADC bias, (ADC − ADC_0)/ADC_0, was estimated using the known diffusion value of ice-water (0°C) ADC_0 = 1.1 µm2/ms (22). Corresponding fractional b-value bias derived from system GNL model was C_ave − 1 (Figure 1A). The phantom volumes of interest (VOIs) encompassed a uniform tube area in either the right or the left jar (Figure 1B) for an axial 4-mm-section within SI = ±15 mm, avoiding susceptibility and parallel imaging artifacts. Typical phantom VOI size was about 1.5 × 8 × 0.4 cm3 (RL × AP × SI, with a volume range of 4-6 cm3; Table 1). To reflect the platform-specific GNL impact for the clinical trial cohort, a multiplatform average phantom ADC histogram (Figure 1C, bin-size of 0.01 µm2/ms) was generated by weighting each gradient system histogram (normalized to total volume) by the number of subject scans performed on each given platform (Figure 1D).
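A minimal sketch of the two phantom quantities described above, the fractional ADC bias and the scan-count-weighted multiplatform histogram, is given here. The platform labels, bin counts, and scan counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
ADC_0 = 1.1  # known ice-water diffusion value at 0 degrees C, in um2/ms

def fractional_bias(adc, adc_ref=ADC_0):
    """Fractional ADC bias (ADC - ADC_0) / ADC_0."""
    return (adc - adc_ref) / adc_ref

def normalize(counts):
    """Normalize a histogram to unit total volume."""
    total = float(sum(counts))
    if total == 0:
        raise ValueError("empty histogram")
    return [c / total for c in counts]

def weighted_average_histogram(histograms, scan_counts):
    """Average per-platform histograms, weighted by subject scans per platform.

    histograms: dict platform -> list of bin counts (same bins for all platforms)
    scan_counts: dict platform -> number of subject scans on that platform
    """
    total_scans = float(sum(scan_counts[p] for p in histograms))
    n_bins = len(next(iter(histograms.values())))
    average = [0.0] * n_bins
    for platform, counts in histograms.items():
        weight = scan_counts[platform] / total_scans
        for i, value in enumerate(normalize(counts)):
            average[i] += weight * value
    return average

# Hypothetical example: two platforms, four ADC bins each.
histograms = {"platform_A": [0, 2, 2, 0], "platform_B": [0, 0, 1, 3]}
scan_counts = {"platform_A": 30, "platform_B": 10}
average_histogram = weighted_average_histogram(histograms, scan_counts)

bias = fractional_bias(1.166)  # a measured 1.166 um2/ms corresponds to 6% bias
```

Weighting by scan count means a platform that contributed more subjects has a proportionally larger influence on the cohort-level phantom histogram.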
For breast tumor scans, multislice whole-tumor VOIs were manually defined as previously described (1) by selecting regions with hyperintensity on high b-value DWI (b = 800 s/mm2) and relatively low ADC, while avoiding adjacent adipose and fibroglandular tissue, biopsy clip artifacts, and regions of high T2 signal (eg, seroma and necrosis). Breast tumor VOI centroid locations spanned RL: 4-12 cm, AP: 0-9 cm, SI: 0-6 cm, and volumes: 1-34 cm3 (median = 5 cm3; mean = 8 cm3). For subjects with VOI < 5 cm3 (N = 32), the larger of the test-retest VOIs was chosen to minimize histogram sampling bias, and the first of the repeat scans was used for the others (VOI > 5 cm3). Subject-/scanner-specific ADC VOI histograms were binned at 0.03 µm2/ms and normalized to total voxel count. Median tumor VOI centroids, extents, and histogram metrics were derived for subject scans pooled by gradient platforms (Table 2). The phantom ADC correction performance was quantified based on the reduction of fractional bias for histogram metrics (median and full width at half-maximum [FWHM], range = median ± FWHM) of individual scanners (Figure 1C), and improved alignment of the histograms across scanners. The effect of in vivo GNC (Figure 2) was assessed from changes in scanner-/subject-average tumor ADC histogram percentiles (cumulative fractional volumes). All image and histogram analyses were performed using in-house software in Matlab 2015b and IDL (Exelis Visual Information Solutions, Boulder, Colorado).

RESULTS
Figures 1 and 3 illustrate the excellent performance of GNC for the quantitative DWI phantom scanned on representative ACRIN 6698 gradient platforms from 3 different vendors. System-specific GNL nonuniformity induces moderate positive ADC bias that both shifts and distorts phantom ADC histograms (Figure 3A) from different gradient platforms independent of the magnetic field strength (Table 1). Narrower histograms observed for 3 T systems compared with those for 1.5 T (Figure 3, A and B; Table 1) are consistent with random noise (non-GNL) origin of residual intrasystem broadening of the phantom ADC, reduced at higher field strength. Three of the 6 gradient models show low GNL bias (median bias < 1%, FWHM = 3%-4%), versus moderate bias (median ∼5%-10%, FWHM = 3.5%-10%) for the other 3 platforms. Inclusion of low and moderate GNL systems results in a bimodal pre-GNC system-averaged histogram (Figure 3C, red) with a broad ADC bias (median = 6%, FWHM = 13%). This histogram is narrowed by GNC to a single-mode distribution with negligible bias (Figure 3C, blue; median = 0.5%, FWHM = 4.5%), and substantial ADC percentile increase (Figure 3D, dashed) peaking at the 55th percentile. GNC improves ADC accuracy for individual systems (Figure 1C and Figure 3B), and it reduces the system-averaged histogram widths (Figure 3C, D, blue; FWHM = 4.5%). The median bias error is effectively eliminated (from 6% to <1%), and systematic interplatform variability is reduced (from the median range of 11% to <3%).
The GNC effects on breast tumor ADC histograms (Figure 4) are qualitatively similar to those in the phantom, showing intrasystem narrowing and improved intersystem alignment (Figure 4A and B). Tumor histograms exhibit 7-fold broader ADC ranges than phantom histograms, reflective of the large lesion ADC heterogeneity compared with the GNL-induced biases. Tumor VOI characteristics are consistent across gradient models (Table 2). Percent decrease in tumor median ADC post GNC is consistent with platform-specific GNL bias. No detectable dependence on field strength (signal-to-noise ratio) is observed for tumor histogram width before or after GNC.
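The histogram metrics used to quantify phantom performance (median and FWHM, expressed as fractional bias) can be sketched as below. This is a simplified illustration: the FWHM is taken as the span of bins at or above half of the peak count, with no sub-bin interpolation, which may differ from the in-house analysis. The bin centers and counts are hypothetical.

```python
def histogram_median(bin_centers, counts):
    """Center of the bin where the cumulative count first reaches half the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    running = 0
    for center, count in zip(bin_centers, counts):
        running += count
        if running >= total / 2.0:
            return center

def histogram_fwhm(bin_centers, counts):
    """Width of the bin span at or above half maximum (no interpolation)."""
    half_max = max(counts) / 2.0
    above = [x for x, c in zip(bin_centers, counts) if c >= half_max]
    bin_width = bin_centers[1] - bin_centers[0]
    return above[-1] - above[0] + bin_width

ADC_0 = 1.1  # ice-water reference, um2/ms

# Hypothetical phantom histogram with 0.01 um2/ms bins.
centers = [1.08 + 0.01 * i for i in range(7)]
counts = [1, 4, 10, 20, 10, 4, 1]

median_adc = histogram_median(centers, counts)
fwhm_adc = histogram_fwhm(centers, counts)

# Express both metrics relative to the known reference value.
median_bias_percent = 100.0 * (median_adc - ADC_0) / ADC_0
fwhm_percent = 100.0 * fwhm_adc / ADC_0
```

For this toy histogram the median sits one bin above the reference and three bins lie at or above half maximum, so both metrics come out at a few percent or less, comparable in scale to the post-GNC values reported above.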
The average multisystem, multisubject histogram for breast tumors is affected both by spatial system GNL pattern (gradient characteristics and VOI offsets) and subject tumor characteristics (eg, relative fraction of solid tumor at lower ADC values). A notable ADC histogram shift after correction (Figure 4C, blue) leads to increased tumor volume estimates based on low ADC threshold (16% volume difference [dashed green] for ADC < 1 µm2/ms). GNC reduces median ADC of the average histogram from 1.06 to 1.0 µm2/ms, and FWHM from 0.6 to 0.56 µm2/ms. The ADC percentile differences (Figure 4D, dashed gray) are nearly flat (5%-8%) between the 10th and 90th percentiles, confirming a larger relative GNC effect on low ADC volumes (eg, 5% = 1/4 of the fractional volume increase for the 20th percentile). Percentile-dependent ADC threshold is uniformly shifted by correction to lower values by ∼0.06 µm2/ms (Figure 4D, blue).
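The way a downward ADC shift enlarges the low-ADC tumor volume can be illustrated with a small sketch. The voxel values and the spatially uniform corrector of 1.06 are hypothetical; in the study the corrector varies with voxel position and gradient platform, so real percentile shifts need not scale uniformly.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    fraction = position - low
    return ordered[low] * (1.0 - fraction) + ordered[high] * fraction

def low_adc_fraction(values, threshold=1.0):
    """Fraction of VOI voxels with ADC below the threshold (um2/ms)."""
    return sum(1 for v in values if v < threshold) / len(values)

# Hypothetical tumor VOI voxel ADCs before correction, in um2/ms.
adc_before = [0.80, 0.95, 1.02, 1.04, 1.10, 1.30, 1.50, 1.90]

# Hypothetical uniform corrector; the real C_ave is a spatial map.
C_AVE = 1.06
adc_after = [v / C_AVE for v in adc_before]

low_before = low_adc_fraction(adc_before)
low_after = low_adc_fraction(adc_after)

median_before = percentile(adc_before, 50)
median_after = percentile(adc_after, 50)
```

In this toy VOI, two voxels just above the 1 µm2/ms threshold fall below it after correction, so the low-ADC fraction changes even though each voxel value moves by only a few percent.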

DISCUSSION
Consistent with previous single-vendor platform observations (11,12), GNL-induced DW nonuniformity led to reduced accuracy of absolute ADC values and increased intersystem variability. Mostly positive ADC bias was observed for breast anatomy, consistent with the RL/AP GNL pattern predicted to inflate ADC values (12,13). Phantom GNC improved ADC accuracy 6-fold and interplatform reproducibility 3-fold, and adequate correction performance was confirmed across all studied vendor gradient models. Residual (non-GNL) phantom ADC errors scaled with magnet field strength (and signal-to-noise ratio). In vivo GNC for the studied subjects had a notable effect on lower histogram percentiles (16% higher volume below the 50th percentile) and shifted ADC thresholds lower by ∼0.06 µm2/ms. Relative GNL bias effects on the population average tumor ADC histogram (weighted by the number of subjects scanned on each system) were primarily determined by the range of gradient system characteristics and VOI offsets. If left uncorrected, such bias combined with repeatability error (8,9) would confound determination of reproducible thresholds for tumor volume changes (2,5,6).
Our study confirmed the feasibility of a practical implementation of the 2-step workflow for the centralized retrospective GNC (19) of the ADC biomarker in the context of a multicenter clinical trial (1). In the first step, static GNL corrector maps were precalculated once for each gradient platform based on vendor-provided gradient characteristics for the trial scanner inventory. In the second step, correctors were mapped automatically using DICOM header information onto the scan-specific geometry for each subject DWI acquisition. The relatively low anisotropy of breast tumors allowed additional GNC simplification through the use of direction-independent average correctors directly applied to subject ADC maps (15).
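The 2-step workflow can be sketched under simplifying assumptions. Step 1 precomputes a corrector on a coarse grid centered at the isocenter; step 2 evaluates it at a voxel position taken from the scan geometry. Trilinear interpolation is used here in place of the 3D-spline interpolation described in Methods, and the linear toy corrector function is hypothetical and not physically realistic. Reading voxel positions from DICOM headers is outside the scope of this sketch.

```python
def make_corrector_grid(corrector_fn, half_extent_mm, spacing_mm):
    """Step 1: sample a platform corrector on a regular isocenter-centered grid."""
    n = int(round(2 * half_extent_mm / spacing_mm)) + 1
    coords = [-half_extent_mm + i * spacing_mm for i in range(n)]
    grid = [[[corrector_fn(x, y, z) for z in coords] for y in coords] for x in coords]
    return coords, grid

def interpolate_corrector(coords, grid, point_mm):
    """Step 2: trilinear interpolation of the grid at one scanner-frame position."""
    spacing = coords[1] - coords[0]
    index, frac = [], []
    for p in point_mm:
        if p < coords[0] or p > coords[-1]:
            raise ValueError("position lies outside the corrector grid")
        t = (p - coords[0]) / spacing
        i = min(int(t), len(coords) - 2)  # keep the upper neighbor in range
        index.append(i)
        frac.append(t - i)
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = (
                    (frac[0] if dx else 1.0 - frac[0])
                    * (frac[1] if dy else 1.0 - frac[1])
                    * (frac[2] if dz else 1.0 - frac[2])
                )
                value += weight * grid[index[0] + dx][index[1] + dy][index[2] + dz]
    return value

def toy_corrector(x, y, z):
    """Hypothetical corrector, linear in position (illustration only)."""
    return 1.0 + 0.0005 * x + 0.0002 * y - 0.0003 * z

# Step 1: done once per gradient platform (5-mm grid, +/-100 mm extent here).
coords, grid = make_corrector_grid(toy_corrector, 100.0, 5.0)

# Step 2: done per scan; the voxel position (mm) would come from DICOM geometry.
voxel_position = (82.0, 13.0, -7.0)
c_voxel = interpolate_corrector(coords, grid, voxel_position)
adc_corrected = 1.15 / c_voxel  # pixelwise division of a measured ADC (um2/ms)
```

Separating the platform-level grid from the per-scan interpolation is what allows a core lab to compute each corrector once and reuse it for every subject scanned on that gradient model.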
As the studied DWI data were derived from the ACRIN 6698 trial, the maximum observed bias was limited by the 6698 QC process, which disqualified scanners with excessive GNL (20). For the studied gradient platforms and tumor VOIs, absolute RL bias was moderate (up to 15%), but it can be many-fold higher along the SI direction for other body regions (eg, abdomen, spine, whole body) scanned on the same gradient systems (11,13) with larger SI offsets. Furthermore, only 2 gradient models represented each vendor platform in this feasibility study. Because observed results apply to only the particular gradient models tested, they may not be representative of other models from the same vendor. Our proposed GNC implementation could open future clinical trials to a wider system enrollment by relaxing scanner acceptance requirements independent of target anatomy (8). For longitudinal treatment imaging points, GNC could also reduce errors caused by relative changes in tumor VOI locations (12), and it may allow for further simplification of clinical trial workflows, permitting longitudinal studies on a particular patient to be performed on different system configurations. The current feasibility study sampled a subset of 6 gradient platforms at a single (pretreatment) imaging point. To show GNC benefit for treatment response prediction, future work will extend the implemented centralized GNC workflow to all 10 gradient platforms and all imaging points of the ACRIN 6698 trial (1).
In conclusion, our study showed the feasibility of centralized retrospective ADC correction for DWI acquired as part of a multiplatform cancer imaging trial. The notable correction impact on tumor ADC histogram percentiles promises improvement of accuracy and reproducibility for diagnostic and prognostic thresholds sought for quantitative breast cancer treatment response assessment.