Prediction of Primary Tumour and Axillary Lymph Node Response to Neoadjuvant Chemo(Targeted) Therapy with Dedicated Breast [18F]FDG PET/MRI in Breast Cancer

Simple Summary
Neoadjuvant chemo(targeted) therapy (NCT) can downstage disease burden in breast cancer, allowing less invasive surgery. The ability of sequential hybrid [18F]FDG PET/MRI to predict the final pathologic primary tumour response to NCT in breast cancer was investigated. In addition, the value of sequential hybrid [18F]FDG PET/MRI in predicting axillary response was investigated separately in clinically node-positive breast cancer patients. In this study, final pathologic primary tumour and axillary lymph node response prediction with qualitative or quantitative [18F]FDG PET/MRI after NCT is not reliable. However, combining the relative decrease in [18F]FDG PET and MR imaging variables halfway through NCT improved diagnostic performance, especially in predicting the final pathologic axillary lymph node response. These findings suggest that sequential hybrid [18F]FDG PET/MRI could have complementary value in the early prediction of the final pathologic response to NCT in breast cancer.

Abstract
Background: The aim of this study was to investigate whether sequential hybrid [18F]FDG PET/MRI can predict the final pathologic response to neoadjuvant chemo(targeted) therapy (NCT) in breast cancer. Methods: Sequential [18F]FDG PET/MRI was performed before, halfway through and after NCT, followed by surgery. Qualitative response evaluation was assessed after NCT. Quantitatively, the SUVmax obtained by [18F]FDG PET and signal enhancement ratio (SER) obtained by MRI were determined sequentially on the primary tumour. For the response of axillary lymph node metastases (ALNMs), SUVmax was determined sequentially on the most [18F]FDG-avid ALN. ROC curves were generated to determine the optimal cut-off values for the absolute and percentage change in quantitative variables in predicting response. Diagnostic performance in predicting primary tumour response was assessed with AUC. Similar analyses were performed in clinically node-positive (cN+) patients for ALNM response. Results: Forty-one breast cancer patients with forty-two primary tumours and twenty-six cases of pathologically proven cN+ disease were prospectively included. Pathologic complete response (pCR) of the primary tumour occurred in 16 patients and pCR of the ALNMs in 14 cN+ patients. The AUC of the qualitative evaluation after NCT was 0.71 for primary tumours and 0.54 for ALNM responses. For primary tumour response, combining the percentage decrease in SUVmax and SER halfway through NCT achieved an AUC of 0.78. The AUC for ALNM response prediction increased to 0.92 by combining the absolute and the percentage decrease in SUVmax halfway through NCT. Conclusions: Qualitative PET/MRI after NCT can predict the final pathologic primary tumour response, but not the ALNM response. Combining quantitative variables halfway through NCT can improve the diagnostic accuracy for final pathologic ALNM response prediction.

Concerning the primary tumour, the accuracy of non-invasive imaging in determining the response to NCT has been investigated extensively. Magnetic resonance imaging (MRI) provides high sensitivity in detecting residual disease (RD), while positron emission tomography with computed tomography (PET/CT) using [18F]-fluorodeoxyglucose ([18F]FDG) has high specificity for the detection of pCR [10,11]. Hybrid [18F]FDG PET/MRI demonstrates complementary performance with the important advantage of combining quantitative [18F]FDG PET and MR imaging variables in a single examination [12].
Concerning ALNMs in cN+ breast cancer patients, the advent of less invasive axillary surgical procedures after NCT has increased the importance of accurate non-invasive axillary response assessment [13,14]. Thus far, non-invasive imaging has not been able to reliably determine axillary response after NCT in pathologically proven cN+ breast cancer patients [15]. The diagnostic performance of sequential hybrid [18F]FDG PET/MRI in axillary response prediction in pathologically proven cN+ breast cancer patients has not yet been investigated. Therefore, the primary aim of this study was to investigate whether sequential dedicated breast hybrid [18F]FDG PET/MRI could accurately predict pathologic primary tumour response to NCT in breast cancer patients. As a secondary aim, pathologic axillary response prediction in pathologically proven cN+ patients was investigated.

Materials and Methods
This prospective, single-centre study was approved by the local medical research ethics committee. Requirement for informed consent was waived, since sequential [18F]FDG PET/MRI was clinically evaluated for response during NCT.

Patients
Female patients with histopathologically confirmed primary invasive breast cancer with a primary tumour larger than 2 cm and/or ALNM confirmed by tissue sampling, who completed NCT and were scheduled to undergo breast and axillary surgery, were eligible for inclusion. Exclusion criteria were pregnancy, neoadjuvant hormone monotherapy, presence of distant metastasis at diagnosis, and contraindications for MRI. Consecutive eligible patients were offered [18F]FDG PET/MRI for response evaluation to NCT, provided that baseline imaging with MRI or [18F]FDG PET/CT had not yet been performed.

Neoadjuvant Chemo(targeted) Therapy Regimens
NCT consisted of 4 cycles of 3-weekly doses of doxorubicin and cyclophosphamide, followed by 4 cycles of 3-weekly doses of docetaxel in cases of oestrogen receptor (ER)-positive and/or human epidermal growth factor receptor 2 (HER2)-positive breast cancer, or 12 cycles of weekly doses of paclitaxel in cases of triple negative (TN) breast cancer (Table S1). In cases of HER2-positive breast cancer, targeted therapy (trastuzumab with/without pertuzumab) was added to the neoadjuvant treatment regimen.

[18F]FDG PET/MRI
Dedicated breast hybrid [18F]FDG PET/MRI was performed at baseline before NCT (PETMRI-1), halfway through NCT after the first 4 cycles (PETMRI-2), and/or after NCT (PETMRI-3) prior to surgery. All scans were acquired using a 3.0 Tesla integrated PET/MRI system (Biograph mMR; Siemens Healthineers, Erlangen, Germany), following a resting period of 45-60 min after [18F]FDG administration. Prior to an intravenous injection of 2 MBq/kg body weight of [18F]FDG, patients fasted for at least four hours and blood glucose was checked to ensure that levels were below 11 mmol/L. Images were acquired from the diaphragm to the top of the humeral head using a dedicated bilateral 16-channel breast radiofrequency coil (Rapid Biomedical, Rimpar, Germany), with patients in prone position and both arms elevated. The protocol has been described in detail previously and is provided in Table S2 [16].

Image Evaluation
All PET images were evaluated by a final-year resident in radiology and nuclear medicine (T.N.) with four years of clinical experience in PET imaging, using dedicated software (Syngo.via 6.4, Siemens-Healthcare, Erlangen, Germany). For the quantitative analysis of the primary tumour and ALNM on [18F]FDG PET, a volume of interest (VOI) was placed over the most [18F]FDG-avid component of the primary tumour in the breast or the most [18F]FDG-avid ALN, respectively. Maximum and peak standardised uptake values (SUVmax and SUVpeak, respectively) were automatically measured [17]. Additionally, an isoactivity contour was automatically drawn in the VOI using pre-set margin thresholds, and the SUVmean, metabolic tumour volume (MTV) and total lesion glycolysis (TLG) were calculated. Lastly, the nodal-to-tumour ratio (NT ratio) was calculated by dividing the SUVmax of the most [18F]FDG-avid ALN by the SUVmax of the primary tumour [18]. In cases of low [18F]FDG avidity, VOI placement was guided by the MR images. For the qualitative evaluation of primary tumour response, complete response of the primary tumour was defined as [18F]FDG uptake that was indistinguishable from the surrounding tissue [17]. For the qualitative evaluation of axillary response, axillary complete response was defined as no ALN with moderately or very intense [18F]FDG uptake [19].
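The derived PET variables can be illustrated with a minimal sketch. The NT ratio follows the definition given above; TLG is computed here as SUVmean multiplied by MTV, which is the conventional definition and is assumed rather than stated in this section. Function names and input values are hypothetical and are not study data.

```python
def total_lesion_glycolysis(suv_mean: float, mtv_ml: float) -> float:
    """TLG as SUVmean x metabolic tumour volume (conventional definition, assumed here)."""
    return suv_mean * mtv_ml


def nodal_to_tumour_ratio(suv_max_node: float, suv_max_tumour: float) -> float:
    """NT ratio: SUVmax of the most FDG-avid ALN divided by SUVmax of the primary tumour."""
    if suv_max_tumour <= 0:
        raise ValueError("primary tumour SUVmax must be positive")
    return suv_max_node / suv_max_tumour


# Hypothetical values for illustration only (not study data).
print(total_lesion_glycolysis(suv_mean=4.0, mtv_ml=2.5))            # 10.0
print(nodal_to_tumour_ratio(suv_max_node=3.0, suv_max_tumour=6.0))  # 0.5
```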
All MR images were evaluated by a dedicated breast radiologist (M.L.) with thirteen years' experience in breast imaging, using dedicated software (PACS System Sectra Workstation IDS7, version 23.1.10, Sectra Group, Linköping, Sweden). For the quantitative analysis of the primary tumour, the longest diameter (LD) was defined as the maximal diameter of an enhancing lesion measured at peak enhancement in any plane, including intervening areas of non-enhancing tissue. Additionally, a researcher (C.M.) dedicated to breast imaging determined the signal enhancement ratio (SER) and apparent diffusion coefficient (ADC) values. For SER measurements, a circular region of interest (ROI) of 5 mm in diameter was placed on the most enhancing part of the primary tumour at peak enhancement. SER was calculated using the following equation: $\mathrm{SER} = (S_1 - S_0)/(S_2 - S_0)$, where $S_0$, $S_1$ and $S_2$ represent the signal intensities on pre-contrast, early post-contrast, and late post-contrast images, respectively [20]. For the ADC measurements, a single ROI was manually drawn on the DW images at b = 1000 s/mm² on a region with hyperintensity and relatively low ADC to include the entire tumour in the axial slice where the tumour was the largest, avoiding normal breast parenchyma, fat and regions of high T2 signal (e.g., seroma and necrosis) [21]. In primary tumours without residual enhancement on T1W or hyperintensity on DW images halfway through or after NCT, ROIs were placed in the same tissue region as in the prior examination. For the qualitative evaluation of primary tumour response, complete response was defined as the absence of residual enhancing tissue. For the qualitative evaluation of axillary response, all the visible ALNs were evaluated using characteristics of suspicious ALNs, including irregular margins, inhomogeneous cortex, perifocal edema, and absence of fatty hilum or chemical shift artifact [22,23]. Axillary complete response was defined as the absence of ALNs with suspicious characteristics.
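As an illustration of the SER equation given above, the following minimal sketch computes SER from three ROI signal intensities. The function name and the intensity values are hypothetical and serve only to show the arithmetic.

```python
def signal_enhancement_ratio(s0: float, s1: float, s2: float) -> float:
    """SER = (S1 - S0) / (S2 - S0).

    s0: pre-contrast, s1: early post-contrast, s2: late post-contrast signal intensity.
    """
    denominator = s2 - s0
    if denominator == 0:
        raise ValueError("late post-contrast signal equals pre-contrast signal")
    return (s1 - s0) / denominator


# Hypothetical ROI intensities (arbitrary units), not study data.
ser = signal_enhancement_ratio(s0=100.0, s1=220.0, s2=200.0)
print(ser)  # 1.2; early enhancement exceeds late enhancement, so SER > 1
```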

Pathologic Response Reference Standard
Pre-treatment core needle biopsies of the primary tumour were used for histological subtyping and grading. Tumours were considered positive for ER or progesterone receptor (PR) if at least 10% of cells showed nuclear staining. HER2 positivity was defined as either a score of 3+ following immunohistochemical (IHC) staining or HER2 gene amplification by fluorescent in situ hybridisation. Grading was performed according to the modified Bloom-Richardson system.
Post-treatment surgical specimens of the breast and the axilla were used to evaluate the response. Histopathological measurement of residual tumour size was performed during grossing and was later correlated microscopically. Primary tumour pCR was defined as the absence of residual invasive cancer in the breast after NCT (ypT0/is). Axillary pCR was defined as the absence of tumour cells or the presence of isolated tumour cells only (≤0.2 mm or fewer than 200 cells). Residual axillary disease was defined as the presence of micrometastases (>0.2 and ≤2.0 mm) and/or macrometastases (>2.0 mm).
Histopathological analyses were performed in accordance with the Dutch national breast cancer guideline at the time of diagnosis [24].
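The size thresholds of the axillary reference standard can be expressed as a short sketch. It uses deposit size only and ignores the cell-count criterion for isolated tumour cells, which is a simplification; the function names and category labels are hypothetical.

```python
from typing import Iterable


def classify_deposit(size_mm: float) -> str:
    """Classify a nodal tumour deposit by its largest dimension in mm (size only)."""
    if size_mm <= 0:
        return "none"
    if size_mm <= 0.2:
        return "itc"  # isolated tumour cells
    if size_mm <= 2.0:
        return "micrometastasis"
    return "macrometastasis"


def axillary_pcr(deposit_sizes_mm: Iterable[float]) -> bool:
    """Axillary pCR: no deposits, or isolated tumour cells only, in all examined nodes."""
    return all(classify_deposit(s) in ("none", "itc") for s in deposit_sizes_mm)


# Hypothetical deposit sizes, not study data.
print(axillary_pcr([0.0, 0.1]))  # True
print(axillary_pcr([0.1, 2.5]))  # False
```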

Statistical Analysis
The absolute values of all quantitative [18F]FDG PET/MR imaging variables at each time point, as well as the percentage decrease halfway through and after NCT, were compared between patients with pCR and patients with RD, separately for the primary tumour and the axilla, by means of the Mann-Whitney U test. For all the significant quantitative variables, receiver operating characteristic (ROC) curves were generated to determine the cut-off value with optimal sensitivity and specificity. Diagnostic performance, expressed as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and area under the curve (AUC) with 95% confidence intervals (95% CIs), was calculated for each significant quantitative imaging variable at the optimal cut-off, both separately and combined. Lastly, the diagnostic performance of qualitative [18F]FDG PET, MRI, and [18F]FDG PET/MRI after NCT was calculated. For all analyses, the detection of residual disease via imaging or pathology analysis was considered positive and pCR via imaging or pathology analysis was considered negative. A two-sided p-value of <0.05 was considered statistically significant. R project software (version 4.2.0, R Foundation for Statistical Computing, Vienna, Austria) was used to perform the statistical analyses.
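The cut-off selection and the diagnostic performance measures can be sketched as follows. The analyses in this study were performed in R; the Python below is only an illustration. Maximising Youden's index (sensitivity + specificity − 1) is one common way to choose an "optimal" cut-off and is an assumption here, since the exact criterion is not specified. Residual disease is treated as positive, as in the analyses above, and all data values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def diagnostic_performance(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV, with residual disease (True) as positive."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_cutoff(values: Sequence[float], actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (cut-off, Youden's J), predicting residual disease when value >= cut-off."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        perf = diagnostic_performance([v >= cutoff for v in values], actual)
        j = perf["sensitivity"] + perf["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical percentage changes in SUVmax (negative = decrease), not study data.
values: List[float] = [-95, -90, -70, -60, -45, -40, -30, -20]
residual_disease = [False, False, True, False, True, True, True, True]

cutoff, j = youden_cutoff(values, residual_disease)
print(cutoff, round(j, 2))  # -45 0.8
print(diagnostic_performance([v >= cutoff for v in values], residual_disease))
```

In this toy example a smaller decrease (a less negative percentage change) predicts residual disease, which matches the direction of the group differences reported for SUVmax.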

Clinicopathologic Characteristics
Between February 2015 and July 2017, 41 breast cancer patients with 42 primary tumours and 26 cN+ axillae were included in this prospective study (Table 1). Primary tumour response evaluation halfway through and after completion of NCT was performed in 38 and 37 patients, and axillary response evaluation in 22 and 21 patients, respectively (Figure 1).

Quantitative Imaging Variables in Relation to Response
In primary tumour pCR, the percentage decrease in SUVmax (−82.6 vs. −40.7, p = 0.017) and SER (−30.1 vs. −13.0, p = 0.044) halfway through NCT was significantly higher than in primary tumour RD (Table 2, Figure 2). After NCT, in primary tumour pCR, the median LD was significantly lower (0.0 vs. 15.0, p = 0.018) and the percentage decrease in LD (−100.0 vs. −40.9, p = 0.012) and SER (−54.3 vs. −38.4, p = 0.013) was significantly higher than in primary tumour RD. No differences were found for MTV and TLG at any of the thresholds, either in the absolute values or in the percentage decrease, at any time point (Table S3).
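The percentage changes reported here are negative for a decrease relative to baseline. A minimal sketch of that calculation, with a hypothetical function name and hypothetical input values, is:

```python
def percentage_change(baseline: float, follow_up: float) -> float:
    """Relative change from baseline in percent; negative values indicate a decrease."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical SUVmax values at baseline and halfway through NCT, not study data.
print(percentage_change(baseline=10.0, follow_up=2.0))  # -80.0
```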
In axillary pCR, the median SUVmax of the most [18F]FDG-avid ALN (0.5 vs. 0.9, p = 0.030) and NT ratio (0.4 vs. 0.6, p = 0.041) halfway through NCT were significantly lower than in axillary RD. The percentage decrease in SUVmax (−88.0 vs. −59.8, p = 0.010) and NT ratio (−59.7 vs. −35.7, p = 0.018) halfway through NCT was significantly higher in axillary pCR than in axillary RD (Table 3, Figure 3). After NCT, in axillary pCR, the median primary tumour LD was significantly lower (0.0 vs. 15.0, p = 0.047) and its percentage decrease (−100.0 vs. −53.8, p = 0.026) was significantly higher than in axillary RD (Table S4).

Table footnote: Quantitative imaging variables are shown as the median and range. Response groups were compared using the Mann-Whitney U test. Symbols: *, similar cut-off values and diagnostic performance for SUVpeak, SUV30%, SUV40%, and SUV50%; †, similar cut-off value and diagnostic performance for SUVpeak. Abbreviations: LD, longest diameter; NT ratio, nodal-to-tumour ratio; pCR, pathologic complete response; RD, residual disease; SER, signal enhancement ratio; SUVmax, maximum standardised uptake value.

Figure caption (partial): Halfway through NCT, SUVmax decreased by 61% to 1.78. After completion of NCT, metabolic dissolution in the entire axillary region without any suspicious nodes on MRI was found. Based on the qualitative evaluation after NCT, this was a false-negative case. Using quantitative imaging variables halfway through NCT, this patient was correctly predicted to have residual axillary disease.

Response Prediction
The diagnostic performance for the prediction of pathologic primary tumour response is summarised in Table 3

Discussion
The aim of this study was to investigate the diagnostic accuracy of sequential [18F]FDG PET/MRI in predicting primary tumour and ALNM response to NCT. For final pathologic primary tumour response prediction, combining the decrease in SUVmax and SER of the primary tumour halfway through NCT can improve the value of [18F]FDG PET/MRI, compared to the qualitative evaluation after NCT. In addition, we found that combining [18F]FDG PET and MRI does not improve the diagnostic accuracy of qualitative primary tumour response evaluation after NCT. For final pathologic ALNM response prediction in cN+ breast cancer patients, combining the absolute SUVmax measured on the most [18F]FDG-avid ALN halfway through NCT with its relative decrease can accurately predict axillary response. Based on the findings of this study, predicting axillary response with [18F]FDG PET/MRI after NCT is inadequate and does not justify its use.
The diagnostic performance of qualitative primary tumour response evaluation with [18F]FDG PET/MRI was similar to the separate evaluation of [18F]FDG PET and MRI, as indicated by similar AUCs, but displayed improved sensitivity and NPV when combining modalities. This complementary effect can be explained by RD that is either morphologically normalised with residual metabolic activity or has morphological abnormalities without residual metabolic activity. The diagnostic performance of separate qualitative [18F]FDG PET and MRI in detecting primary tumour response after NCT in this study is slightly lower than the pooled estimates of [18F]FDG PET/CT and MRI reported in several meta-analyses [25,26]. Similar to our results, the specificity of [18F]FDG PET/CT and MRI after NCT is often low, and previous studies have reported that this inability to detect pCR could be explained by NCT-induced inflammation, sclerosis, necrosis, perilesional edema and the presence of ductal carcinoma in situ (DCIS) [27].
Increasing evidence indicates that breast cancer subtypes present differently on [18F]FDG PET and MRI, as shown by the significant differences between subtypes in qualitative and quantitative imaging variables [29,30]. These differences between subtypes are not limited to baseline, but extend to different patterns of response in non-invasive imaging, which impacts the accuracy of detecting or predicting pathological primary tumour or axillary response [31][32][33]. Consequently, the prediction of pathological primary tumour or axillary response to NCT with non-invasive imaging could benefit from subtype-specific cut-off values for quantitative imaging variables. In addition, response patterns on MRI also seem to differ between breast cancer subtypes [34]. Unfortunately, the small sample size in this preliminary study did not permit an analysis per subtype.
Our results suggest that the diagnostic performance in predicting primary tumour response can be improved with quantitative [18F]FDG PET/MR imaging variables. Similar to previous results, the percentage decrease in SUVmax halfway through NCT strongly improves sensitivity [10,26]. Interestingly, using a cut-off for the quantitative MR imaging variable LD improved specificity and PPV compared to the qualitative evaluation, possibly by correctly identifying residual enhancement caused by inflammation that is reactive to NCT or DCIS as pCR [10,27]. ADC was not predictive of response to NCT in our cohort of patients, and a recent systematic review reports high heterogeneity regarding the clinical and technical aspects of DWI for response prediction [35]. None of the volumetric [18F]FDG PET parameters in this study were found to be associated with response, possibly due to inaccurate delineation in cases of response, since a decrease in SUVmax paradoxically increases the volume when using automatic isoactivity contouring. Cho et al. [36] performed a second examination after one cycle of NCT and defined pCR as the absence of invasive cancer and DCIS in both the breast and ALNs. Additionally, Cho et al. did not find the percentage decrease in SUVmax to be predictive of response, possibly due to their small sample size. In a cohort of 14 patients, Wang et al. found the combination of percentage decrease in ADCmin after one or two cycles of NCT with either SUVmax or TLG40% to be the most predictive of response [37]. However, Wang et al. included proton magnetic resonance spectroscopy after DCE-MRI for VOI placement and defined pCR as less than 10% residual cellularity of invasive cells, indicating potential residual cancer in cases with pCR.
The qualitative assessment of axillary response after NCT with dedicated breast [18F]FDG PET/MRI in this study was poor, due to the normalisation of the majority of ALNs on imaging. Sensitivity and PPV of the separate evaluation of [18F]FDG PET are considerably worse than the pooled estimates of three primary studies in a recent meta-analysis [15]. However, two of these studies evaluated morphologic criteria for CT, and only a decrease in tumour deposit in the ALN was found to be predictive of response in a study by You et al. [38,39]. Similar to our methods, Garcia Vicente et al. evaluated [18F]FDG PET and achieved a sensitivity and PPV of 37% and 68%, respectively [40]. In their study, SUVmax measured on the most [18F]FDG-avid ALN ranged up to 13.2, which is considerably higher than the maximum SUVmax of 1.3 in our study, possibly explaining the low sensitivity. Regarding the separate evaluation of MRI in this study, the poor sensitivity and PPV cannot be entirely explained. However, Hieken et al. report metastases with diameters as large as 12 mm among their false-negative cases, with extranodal extension present in half of these patients [38].
While none of the patients with axillary RD were correctly identified by the qualitative assessment after NCT, the prediction of response halfway through NCT with quantitative [18F]FDG PET imaging variables resulted in sensitivities ranging from 83 to 100%. By combining the absolute decrease with the percentage decrease in SUVmax measured on the most [18F]FDG-avid ALN halfway through NCT, either maximum sensitivity or maximum specificity, and thus PPV and NPV, can be achieved. This is in line with the results of two previous studies in which axillary response prediction based on the percentage decrease in SUVmax was more accurate early during NCT, after one to three courses, than after NCT [41,42]. The association of primary tumour LD with axillary response can most likely be attributed to the established correlation between primary tumour and axillary response [43]. Similarly, Eun et al. found the decrease in primary tumour size to be predictive of axillary response during, as well as after, NCT [44].
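How two quantitative criteria might be combined can be sketched as follows. This is an illustration under stated assumptions and not the combination method used in this study: the absolute criterion is taken to be the SUVmax value halfway through NCT, the cut-off values are hypothetical, and the "either" and "both" rules are generic ways of trading sensitivity against specificity.

```python
def predict_residual_disease(
    suv_max_halfway: float,
    pct_change: float,
    suv_cutoff: float,
    pct_cutoff: float,
    rule: str = "either",
) -> bool:
    """Combine two criteria for residual axillary disease (positive = residual disease).

    rule="either": positive if any criterion is positive (favours sensitivity).
    rule="both":   positive only if both criteria are positive (favours specificity).
    """
    flag_abs = suv_max_halfway >= suv_cutoff  # higher residual uptake suggests RD
    flag_pct = pct_change >= pct_cutoff       # smaller decrease (less negative) suggests RD
    if rule == "either":
        return flag_abs or flag_pct
    if rule == "both":
        return flag_abs and flag_pct
    raise ValueError("rule must be 'either' or 'both'")


# Hypothetical cut-offs and patient values, not study data.
SUV_CUTOFF, PCT_CUTOFF = 0.7, -75.0
print(predict_residual_disease(0.8, -80.0, SUV_CUTOFF, PCT_CUTOFF, rule="either"))  # True
print(predict_residual_disease(0.8, -80.0, SUV_CUTOFF, PCT_CUTOFF, rule="both"))    # False
```

With the "either" rule a patient is missed only if both criteria are negative, so false negatives become rarer at the cost of more false positives; the "both" rule has the opposite effect.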
Our study has some limitations. First, the number of included patients was relatively small, especially in the cN+ subgroup. As a consequence, we report wide 95% CIs for the diagnostic performance, which should be interpreted with this limitation in mind. Second, the small sample size hindered the separate evaluation of breast cancer subtypes with regard to qualitative and quantitative response evaluation, although subtype is known to influence diagnostic performance. Third, the interobserver variability of quantitative [18F]FDG PET and MR imaging variables was not assessed in this study. However, Cho et al. have previously reported reliable reproducibility for similar variables [36]. Lastly, we included DCIS as a primary tumour pCR, which could influence the diagnostic performance.

Conclusions
The complementary value of hybrid dedicated breast [18F]FDG PET/MRI in primary tumour response detection is mainly established by combining quantitative imaging variables. For axillary response prediction in cN+ breast cancer patients, combining the absolute SUVmax of the most [18F]FDG-avid ALN halfway through NCT with its percentage decrease strongly improved the diagnostic performance of [18F]FDG PET/MRI. Based on the findings of this study, the diagnostic performance in predicting axillary response with [18F]FDG PET/MRI after NCT is insufficient and does not justify its use at this time point.