MRI Evaluation of Complete and Near-Complete Response after Neoadjuvant Therapy in Patients with Locally Advanced Rectal Cancer

Purpose To evaluate MRI performance in restaging locally advanced rectal cancers (LARC) after neoadjuvant chemoradiotherapy (nCRT) and interobserver agreement in identifying complete response (CR) and near-complete response (nCR). Methods 40 patients with CR and nCR on restaging MRI, surgery and/or endoscopy were enrolled. Two radiologists independently scored the restaging MRI and reported the presence of split scar sign (SSS) and MRI tumor regression grade (mrTRG). Diagnostic accuracy and ROC curves were calculated for single and combined sequences, with inter-reader agreement. Results Diagnostic performance was good for detecting CR and weaker for nCR. T2WI had the highest AUCs among individual sequences. There was a significant positive correlation between SSS and CR, with high Sp (89.5%/73.7%) and PPV (90%/79.2%) for both Readers. Similar accuracy rates were observed for the combination of sequences, with AUCs of 0.828–0.847 for CR and 0.690–0.762 for nCR. Interobserver agreement was strong for SSS, moderate for T2WI, weak for the combination of sequences. Conclusions Restaging MRI had good diagnostic performance in identifying CR and nCR. SSS had high Sp and PPV in diagnosing CR, with a strong level of interobserver agreement. T2WI with DWI was the optimal combination of sequences for selecting good responders.


Introduction
The main role of neoadjuvant chemoradiation therapy (nCRT) in locally advanced rectal cancers (LARC) is to downsize and downstage the primary tumor, yielding a pathologic complete response in 10% to 30% of patients [1,2]. Although radical surgery remains the standard of treatment according to actual guidelines [3,4], in the last two decades a "watch-and-wait" strategy has been increasingly offered as an option for patients with a good response to nCRT [5]. This is due to the substantial postoperative morbidity of rectal resections [6]. Even though it is still controversial, the trend of conservative treatment with observation alone is sustained by the favorable oncologic outcome in regard to recurrence and survival [1].
The goal of the "watch-and-wait" approach is to identify patients with clinical complete response after neoadjuvant therapy and enter them in a strict follow-up protocol,

Study Design and Patients
This was a prospective, observational, single-center study performed between March 2017 and December 2021 in the Oncological Institute Cluj-Napoca. Consecutive patients with histologically confirmed rectal tumors, locally advanced, treated with nCRT were enrolled. Rectal MRI was performed before and after nCRT. Patients with CR or nCR on either restaging MRI or pathological reports from surgery and/or endoscopy were included in this study.
We excluded complete or near-complete responders at endoscopy and/or surgery who did not have restaging MRI, non-oncologic deaths and patients who were not followed up at least 6 months after nCRT.
The follow-up protocol consisted of DRE, MRI and endoscopy performed every three months during the first year, every six months in the second year, and annually up to five years after inclusion-protocol after Maas et al. [13].
The study was approved by the Ethics Commission of the "Iuliu Hatieganu" University of Medicine and Pharmacy Cluj-Napoca, Nr. 336, dated 30 September 2014, and written informed consent was obtained from all patients prior to any study procedures. The study was performed in agreement with The Code of Ethics of the World Medical Association (Helsinki Declaration) for experiments involving human subjects.

MRI Protocol
MRI examinations were performed using a 1.5T Magnetom Aera scanner (Siemens Healthcare, Erlangen, Germany), with a 30-element body matrix coil (acquisition parame-ters are shown in Table 1). For all staging and restaging assessments, we used a protocol derived from the recommendations for MRI rectal cancer evaluation and reporting provided by the ESGAR guidelines [10]. Patients were examined in feet-first supine position, using the same MRI protocol: T2WI, DWI with apparent diffusion coefficient (ADC) map, T2 high-resolution (T2HR) angulated perpendicular and parallel to the rectal tumor axis, and T1C (Table 1). A microenema prior to MRI exam was used in a subset of patients. DCE included axial T1 sequences (36 phases; mean acquisition time: 5 min) with gadoliniumbased contrast agent Gadobutrol (Gadovist, Bayer) administered i.v. as a bolus, at a dose of 0.1 mmol/kg body weight at a rate of 2 mL/s, followed by a 20 mL saline flush. The timeframes of performing the MRI were at baseline (before any treatment procedure) and at restaging, with a median time between the finalization of radiotherapy and MRI reassessment of 6.5 weeks.

Image Interpretation
All MRI examinations-both the baseline and the restaging assessments-were read by two independent radiologists (Readers 1 and 2) with 9 and 8 years of experience in rectal MRI, who were blinded to each other's readings. The images were analyzed using Syngo (VB17) software, commercially available applications and OsiriX MD viewer. Both radiologists were required to independently read the restaging MRI, report the T stage, N stage, the presence of tumoral perirectal deposits (N1c), extramural vascular invasion (EMVI) and circumferential resection margin (CRM). Five grade confidence level scores modified after Maas et al. [13], were used to evaluate response to treatment on T2WI, DWI and T1C, respectively (Table 2).
We reported an MRI tumor regression grade (mrTRG) based on the combination of T2WI, DWI and T1C (Table 3), adjusted after the system developed by the Mercury study group [18][19][20]. The highest individual sequence score determined the overall score for the combination of sequences and mrTRG. For patients with mucinous tumors, we used the modified mrTRG system proposed by Park et al. [21]. Radiologists were asked to note the presence or absence of the split scar sign (SSS) on T2HR sequences, as described by Santiago et al. [22].
MRI findings were reported as CR, nCR or PR. Image interpretation with scoring was done before surgery and/or endoscopy, so the radiologists were blinded to pathological findings.

Definition of Response and Reference Standards
Tumor response was evaluated with restaging MRI and correlated with the pathological reports from surgery (total mesorectal excision-TME)-the reference standard for all operated patients. MRI was considered "true positive" for CR if the pathologist confirmed it (CR: ymrTRG1; pTRG0). MRI was "true positive" for nCR if the histopathology analysis confirmed the MRI report (nCR: ymrTRG2; pTRG1)- Table 3. Cases were counted as "true negative" if both radiologist and pathologist reported residual tumor (PR: ymrTRG2,3; pTRG2).
Endoscopy, local recurrence-free follow-up (for at least 6 months) and biopsy (when available) were the reference standard for patients managed non-operatively. Endoscopy was suggestive of CR if a flat white scar with telangiectasia was seen at the tumor site, without ulcer or nodularity. If minor mucosal abnormality or shallow ulcer/red scar were present, endoscopy was indicative of nCR [14,23]. MRI was considered "true positive" for CR if endoscopy confirmed complete response (endoscopy CR) and "true positive" for nCR if endoscopy confirmed near-complete response (endoscopy nCR). "True negative" cases were counted if MRI and endoscopy found residual tumor (endoscopy PR). "True positive" cases for CR and nCR had no involved nodes on MRI, based on the criteria from the guidelines for nodal restaging [10] (p. 1470).

Histopathology Analysis
Surgically resected specimens from TME were thoroughly sectioned, with careful examination of the tumor site, and evaluated according to the guidelines of the American Joint Committee on Cancer, using the TNM staging system. A modified Ryan scheme (Table 3) for scoring tumor regression grade (pTRG) after neoadjuvant therapy was used [24].

Statistical Analyses
Data were analyzed using MedCalc 20.026 (Ostend, Belgium: MedCalc Software Ltd.) and IBM SPSS Statistics version 26 (Armonk, NY, USA: IBM Corp.). Numerical variables were summarized using descriptive statistics: number and proportion for qualitative variables, mean and standard deviation or median (quartile 1; quartile 3) for continuous variables. Student t-test and independent samples median test (depending on its distribution) were used to compare numerical variables.
The diagnostic performance of restaging MRI in detecting CR or nCR was assessed by receiver operating characteristics (ROC) curves, area under the curve (AUC), Se, Sp, positive predictive value (PPV) and negative predictive value (NPV) for individual and combined sequences, as well as for mrTRG, separately for both Readers. AUCs were compared between MRI sequences and mrTRG in MedCalc using the methodology of DeLong et al. [25]. The accuracy rate was calculated as Se×response in the sample + Sp × (1 − response in the sample), separately for CR and nCR, for both Readers. Interobserver agreement was evaluated by Interrater agreement Cohen's Kappa coefficient. The correlation between SSS at restaging MRI and the type of response at endoscopy and/or surgery was assessed by Spearman correlation coefficients. A p-value < 0.05 was considered statistically significant.

Description of the Study Sample
During the above-mentioned period, 120 consecutive patients with locally advanced rectal cancer were assessed by pelvic MRI, before and after nCRT. After completion of nCRT, 45 patients had CR or nCR on either restaging MRI, pathological reports from surgery or endoscopy. Five patients were excluded from this analysis: two did not perform restaging MRI before surgery, one was missed due to non-oncologic death and two patients were not followed-up for at least six months in the "watch-and-wait" protocol. 40 patients fulfilled the inclusion criteria and were enrolled in this study (23 men and 17 women; mean age 58.8 years; age range 25-80 years). Three patients had a mucinous type of rectal adenocarcinoma. Low rectal tumors (located < 6 cm from the anal verge) were present in 45% of cases (Table 4).
28 patients (70%) underwent surgery, with a median time between the end of nCRT and restaging MRI of 6.5 weeks and between restaging MRI and surgery of 3.5 weeks. Among the operated patients, 12 (42.8%) had CR, 11 (39%) had nCR and 5 (17.8%) had PR, according to the pathological modified Ryan score ( Figure 1).  A patient with CR at restaging MRI, confirmed by the histopathological report from surgery, is presented in Figure 2. A patient with CR at restaging MRI, confirmed by the histopathological report from surgery, is presented in Figure 2. A patient with CR at restaging MRI, confirmed by the histopathological report from surgery, is presented in Figure 2.   0: linear high signal on the luminal side of the right lateral wall (e) also high on ADC (f), corresponding to the mucosa; (g,h) hematoxylin and eosin-stained 4-µ m cut slices through the tumor scar illustrating the intact mucosa, dense fibrosis in the submucosa, thickened muscularis propria and fibrosis in the perirectal fat. Figure 3 illustrates a patient with nCR at MRI and histopathology after total mesorectal excision. For 12 patients out of 40, the restaging MRI was correlated with endoscopy. Nine patients had CR (75%), one had nCR (8.3%) and two (16.6%) had residual tumor at endoscopy (Figure 1). Complete and near-complete responders entered the "watch-and-wait" protocol: six patients had sustained CR confirmed by DRE, MRI and endoscopy; one patient with initial CR at endoscopy had pathological nCR at six months surgery; two complete responders had tumor regrowth at six months MRI, confirmed by surgery; a patient with nCR at initial endoscopy became a sustained complete responder on the follow-up examinations.
A case with a complete response after nCRT, followed according to the "watch-andwait" protocol, is illustrated in Figure 4. protocol: six patients had sustained CR confirmed by DRE, MRI and endoscopy; one patient with initial CR at endoscopy had pathological nCR at six months surgery; two complete responders had tumor regrowth at six months MRI, confirmed by surgery; a patient with nCR at initial endoscopy became a sustained complete responder on the follow-up examinations.
A case with a complete response after nCRT, followed according to the "watch-andwait" protocol, is illustrated in Figure 4.

Tumor Assessment Results by Readers
When data were analyzed overall, irrespective of the gold standard available at the restaging timepoint after nCRT, Reader 1 correctly identified 26 cases (65%)-24 "true positive" for CR and nCR (16 patients had surgery, 8 confirmed by endoscopy) and 2 "true negative" cases. Reader 2 correctly identified 30 cases (75%)-27 "true positive" for CR and nCR (19 patients had surgery, 8 confirmed by endoscopy) and 3 "true negative" (Figure 1). N1c, EMVI and CRM were negative in all restaging MRI examinations for both Readers, confirmed for all patients who underwent surgery. Reader 1 described residual adenopathy in two cases, one of these was also found in Reader 2's report; the pathology report confirmed residual adenopathy (yN+).

Diagnostic Performance of MRI for Detecting Complete Response
Diagnostic performance was good for detecting CR. For Reader 1, the Sp was similar for all individual and combined sequences assessed (89.5%). Se, PPV, NPV, AUC and accuracy rates had the highest values for T2WI (85.7%, 90.0%, 85.0%, 0.876 and 87.5%, respectively). Similar AUCs were observed for DWI, T1C and for the combination of sequences analyzed (p-values for the difference between individual AUC were ≥0.05; Table 5; Figure 5).  For Reader 2, the highest values for Se and NPV were observed for DWI (95.2%, 92.9%). The combination of sequences-T2WI+DWI and T2WI+DWI+T1C-had the highest Sp, PPV, AUCs and accuracy rates (79.0%, 82.6%, 0.847 and 85.0%, respectively), followed by T2WI score (73.7%, 79.2%, 0.821 and 82.5%). DWI had the highest Se and NPV among the sequences assessed. No significant differences for AUCs were observed be- For Reader 2, the highest values for Se and NPV were observed for DWI (95.2%, 92.9%). The combination of sequences-T2WI+DWI and T2WI+DWI+T1C-had the highest Sp, PPV, AUCs and accuracy rates (79.0%, 82.6%, 0.847 and 85.0%, respectively), followed by T2WI score (73.7%, 79.2%, 0.821 and 82.5%). DWI had the highest Se and NPV among the sequences assessed. No significant differences for AUCs were observed between MRI sequences (p-value for the difference between individual AUC were > 0.05; Table 5, Figure 5).
The accuracy rates for mrTRG were 82.5% for Reader 1 and 85.0% for Reader 2, respectively. The Sp and PPV of SSS for detecting CR were 89.5% and 90.0% for Reader 1, and 73.7% and 79.2% for Reader 2. Accuracy rates were 87.5% and 82.5%, respectively.

Diagnostic Performance of MRI for Detecting near-Complete Response
Overall, the AUCs for both Readers were lower than those for detecting CR. For Reader 1, the highest Sp, PPV and accuracy rate were observed for T2WI (78.6%, 62.5% and 80.0%, respectively). AUC was significantly higher for T2WI than T1C, T2WI+DWI and T2WI+DWI+T1C (p-value for this comparison <0.05) (Table 6, Figure 6).   For Reader 2, DWI had the highest Sp, PPV and accuracy rate (92.9%, 77.8% and 82.5%, respectively), while Se was the lowest for this sequence. No significant differences for AUCs were observed between MRI sequences (p-values for the comparisons between individual AUCs >0.05 (Table 6, Figure 6).
The accuracy rates for mrTRG were 70.0% for Reader 1 and 80.0% for Reader 2, respectively. For Reader 2, DWI had the highest Sp, PPV and accuracy rate (92.9%, 77.8% and 82.5%, respectively), while Se was the lowest for this sequence. No significant differences for AUCs were observed between MRI sequences (p-values for the comparisons between individual AUCs >0.05 (Table 6, Figure 6).
The accuracy rates for mrTRG were 70.0% for Reader 1 and 80.0% for Reader 2, respectively.

Interobserver Agreement
Interobserver agreement was strong for SSS (k = 0.800), moderate for T2WI and weak for the combination of sequences. For identifying CR or nCR, the Kappa Cohen coefficients of agreement varied for different sequences assessed, ranging from 0.368-for DWI score, to 0.605-for T2WI score (Table 7). Table 7. Interobserver agreement for restaging MRI in LARC after nCRT.

Interobserver Agreement
Interobserver agreement was strong for SSS (k = 0.800), moderate for T2WI and weak for the combination of sequences. For identifying CR or nCR, the Kappa Cohen coefficients of agreement varied for different sequences assessed, ranging from 0.368-for DWI score, to 0.605-for T2WI score (Table 7).

Discussion
The focus of this study was to evaluate the performance of MRI sequences in diagnosing CR and nCR of LARC after nCRT, with surgery or endoscopy as the reference standard.
The morphologic high-resolution T2WI sequences in three planes-sagittal and angulated perpendicular and parallel to the rectal tumor axis-were the mainstay of the protocol in assessing response. Scoring T2WI sequences relied on the visual appreciation of the relative proportion of low signal (corresponding to fibrosis) and intermediate signal intensity (residual tumor) within the treated neoplasia.
We obtained high Se and Sp for both Readers when using T2WI score to identify CR, suggesting good diagnostic accuracy of this sequence. For detecting nCR, both Readers obtained good Sp, with good Se for Reader 1 and moderate Se for Reader 2 when using T2WI sequence. Similar to our results, recent studies from Nahas et al. (2019) and Ko et al. (2019) reported good Se (61% and 70.6%) and high Sp (89.6% and 95.3%) of T2WI for diagnosing a CR [26,27]. Nevertheless, a meta-analysis from 2020 including 17 studies showed a large heterogeneity of data, with a mild inverse relationship between Se and Sp of T2WI for diagnosing CR, the summary Se and Sp being 49% and 86%, respectively [28]. An explanation might be the various criteria used by different authors to diagnose CR on T2WI. Treatment-induced fibrosis and edema that mimic residual tumor also influenced the false negative rate, explaining the heterogeneous data reported. For the diagnosis of a nCR, a systematic review from van der Paardt et al. also reported good Sp and moderate Se, with mean values of 89.8% and 55.3% [29], similar to the ones we obtained in the current analysis.
Interestingly, in our study, we obtained a very good Sp (89.5% Reader 1/73.7% Reader 2) and PPV (90% Reader 1/79.2% Reader 2) for predicting a CR, a high Se (85.7% Reader 1/90.5% Reader 2) in not missing complete responders and a strong level of agreement between Readers in the detection of CR using the SSS, with a Kappa Cohen coefficient of 0.800 (p < 0.001). The SSS has been described by Santiago et al. [22] as a characteristic morphologic pattern of the tumor scar, seen on high resolution T2WI sequence, having high Sp and PPV for identifying a CR. A positive SSS consists of an organized layered hypointense fibrosis at the tumor bed, which includes, besides mrTRG1 endoluminal scar, a perirectal layer of fibrosis separated by an intermediate signal intensity thickened, partially fibrotic muscularis propria [22,30]. The authors found that a positive sign at the restaging MRI had a very high Sp (97%) and PPV (93/94%) for a sustained CR, with a substantial interobserver agreement (k = 0.69; p < 0.01).
The diagnostic performance for identifying a CR was also good for both Readers when using DWI, with an AUC of 0.828 for Reader 1 and 0.818 for Reader 2. Nevertheless, a minimal level of agreement between Readers was achieved for the DWI score, with variability in interpreting qualitatively the presence or absence of hyperintense foci on high b-value at the site of the primary tumor, as shown by the values of Se and Sp of the two Readers. The lack of anatomical details and susceptibility artifacts on DWI can lead to interpretation errors and variability between Readers, as other authors previously reported [31]. Adding DWI to T2WI improved the diagnostic performance of MRI for Reader 2 in identifying the CR, but not for Reader 1, and improved the agreement among the Readers. Similar to our results for Reader 2, other studies reported AUCs between 0.77 and 0.85 when using a combined T2WI and DWI score for diagnosing CR [32] and better performance of these combined sequences with an improvement in diagnostic accuracy as compared to T2WI alone [28,33]. On the other hand, a meta-analysis of 14 studies from Wu et al. showed a non-significant increase in sensitivity for the prediction of tumor response when adding DWI to T2WI [34].
Diagnostic performance was weaker when using DWI and T2WI+DWI scores for detecting nCR than those obtained for CR, with lower AUCs for both Readers. No statistically significant differences between T2WI, DWI and T2WI+DWI AUCs were observed for Reader 2. Variable data were found in previous studies on DWI performance in detecting nCR. The variability in the diagnostic performance of this sequence is mainly due to the patterns of response to nCRT-fragmentation or shrinkage-and to cellular heterogeneity of the treated tumor, being inexact in differentiating between mere fibrosis, fibrosis containing vital tumor cells and small residual tumor [35].
Adding the T1C sequence to the standard MRI protocol (T2WI and DWI) showed no improvement in diagnostic accuracy for both Readers in identifying CR and nCR. These results are in accordance with previous studies from the literature that found no added value for qualitative assessment of DCE in selecting good responders [36].
MRI TRG is an imaging counterpart of TRG systems used in histopathology. Incorporating all sequence findings and using the highest score between sequences, we created a modified mrTRG system to grade fibrotic response. We compared mrTRG with pTRG and/or endoscopy. Our modified mrTRG had a good Sp and NPV, for both Readers, in diagnosing complete and near-complete responders. AUCs varied between 0.828-0.847 for CR and 0.690-0.762 for nCR. However, a weak interobserver agreement with a Kappa Cohen coefficient of 0.544 was observed. Other studies reported similar or higher Sp (88.7-94.2%) with lower sensitivity (16.7-33%) for complete responders. Se improved (56-60%) when CR and nCR were combined in selecting good responders [37]. The agreement between mrTRG graded on T2WI and pTRG is low in other literature published data, with limited performance to detect CR (Se 74%, Sp 63%) [20].
We acknowledge several limitations of our study. First, the number of patients is relatively small, being divided into patients who underwent surgery and a smaller subgroup of watch-and-wait patients. The time intervals, end of nCRT-restaging MRI and MRI-surgery, varied across the included patients and this might have influenced the results.
The follow-up period was between 6 and 24 months for sustained complete responders so that there were complete responders with a short follow-up; this is important because local regrowth is generally reported to occur within the first 18 to 24 months [38]. We included in the study two patients with short-course radiotherapy with consolidation chemotherapy, who had no surgery and were followed according to the "watch-and-wait" protocol; pathological CR rates are known to be lower with this type of treatment, compared to long-course chemoradiation [39]. The use of a microenema prior to MRI could not be applied to all cases, and this could explain the potential pitfalls of the DWI sequence, caused by susceptibility artifacts. The MRI reports were based on the visual assessment of sequences and not on quantitative techniques, but this is similar to how a radiologist interprets cases in daily clinical practice. Our algorithm to combine T2WI, DWI and T1C data used the highest score for the overall assessment, and this might have influenced our results.

Conclusions
Restaging MRI of LARC after nCRT had good diagnostic performance in identifying complete and near-complete responders. Using the SSS, both Readers obtained high Sp and PPV in diagnosing CR, with a strong level of inter-observer agreement. Dynamic contrastenhanced MRI added to the standard protocol did not improve diagnostic accuracy. T2WI performed better than other individual sequences in diagnosing CR or nCR and the use of T2WI in three planes along with DWI was the optimal combination of sequences for selecting good responders. However, this is a single-center study including a small sample size; its results cannot be generalized and further studies with larger samples are needed to confirm our data.  Informed Consent Statement: Written informed consent was obtained from all patients prior to any study procedures. The study was performed in agreement with The Code of Ethics of the World Medical Association (Helsinki Declaration) for experiments involving human subjects.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available as written consent was not obtained from study participants for this.

Conflicts of Interest:
None in relation with the development of this manuscript.