Intra- and Interobserver Variability in Magnetic Resonance Imaging Measurements in Rectal Cancer Patients

Simple Summary
Colorectal cancer is the second most common cancer and was the second most common cause of cancer-related death in Europe in 2018. Accurate lymph node staging in primary rectal cancer is essential for the selection of the proper treatment regimen. In 2018, the European Society of Gastrointestinal and Abdominal Radiology published consensus recommendations for primary rectal cancer staging, and suggested that lymph nodes be assessed by size, morphology, and location inside or outside the mesorectal fascia. Our study aimed to assess the inter- and intraobserver variability in size, apparent diffusion coefficient measurements, and morphological characterization among inexperienced and experienced radiologists. Our data indicate that subjective variables like morphological characteristics are less reproducible than numerical variables, regardless of the level of experience of the observers.

Abstract
Colorectal cancer is the second most common cancer in Europe, and accurate lymph node staging in rectal cancer patients is essential for the selection of their treatment. MRI lymph node staging is complex, and few studies have been published regarding its reproducibility. This study assesses the inter- and intraobserver variability in lymph node size, apparent diffusion coefficient (ADC) measurements, and morphological characterization among inexperienced and experienced radiologists. Four radiologists with different levels of experience in MRI rectal cancer staging analyzed 36 MRI scans of 36 patients with rectal adenocarcinoma. Inter- and intraobserver variation was calculated using intraclass correlation coefficients for the numerical variables and Cohen's kappa statistics for the non-numerical variables. Inter- and intraobserver agreement for the length and width measurements was good to excellent, and for ADC it was fair to good. Interobserver agreement for the assessment of irregular border was moderate, heterogeneous signal was fair, round shape was fair to moderate, and extramesorectal lymph node location was moderate to almost perfect. Intraobserver agreement for the assessment of irregular border was fair to substantial, heterogeneous signal was fair to moderate, round shape was fair to moderate, and extramesorectal lymph node location was substantial to almost perfect. Our data indicate that subjective variables such as morphological characteristics are less reproducible than numerical variables, regardless of the level of experience of the observers.


Introduction
Colorectal cancer is the second most common cancer, with 500,000 new cases in Europe in 2018, and was the second most common cause of cancer-related death with 243,000 deaths in Europe in the same year [1]. Rectal cancer accounts for 27-58% of all colorectal cancer cases [2]. Accurate lymph node (LN) staging in rectal cancer patients is essential for the selection of the proper treatment regimen, because LN involvement is an independent prognostic factor predicting overall survival and local recurrence [3]. Historically, LNs were assessed using size criteria alone. Brown et al. concluded that the prediction of LN involvement in rectal cancer with magnetic resonance imaging (MRI) is improved by using morphologic characteristics instead of size criteria [4].
In contrast, Gröne et al. [5] found no improvement in the accuracy of LN staging by using morphological criteria. The European Society of Gastrointestinal and Abdominal Radiology (ESGAR) published consensus recommendations for the primary staging and restaging of rectal cancer using MRI. The assessment of size, morphology, and mesorectal/extramesorectal LN involvement forms the cornerstone of LN staging [6]. Brown et al. showed substantial interobserver agreement by dividing LN into "involved" and "noninvolved" groups with a kappa value (κ) of 0.71 among radiologists with 5 to 10 years of experience in MRI [4].
It is unclear whether the recommended assessment criteria for LN staging presented by ESGAR are reproducible among radiologists with different experience levels. Although apparent diffusion coefficient (ADC) measurement has no crucial role in primary rectal staging, recent studies suggest that the ADC measurement could improve the diagnostic accuracy of LN staging [7][8][9]. To our knowledge, no inter- and intraobserver variability study of lymph node size, ADC, and morphological characteristics has been published.
Therefore, this study aims to assess the inter- and intraobserver variability of size, morphology, mesorectal/extramesorectal LN involvement, and potentially beneficial LN characteristics (e.g., ADC measurements) among inexperienced and experienced radiologists.

Patients and MRI
A total of 155 patients with rectal cancer from the Department of Surgery, Vejle Hospital, Denmark underwent an MRI of the rectum between 1 January and 31 December 2018, and were considered for inclusion in this intra- and interobserver variation study of lymph node staging in locally advanced rectal cancer. Inclusion criteria consisted of (1) biopsy-proven rectal adenocarcinoma and (2) locally advanced disease with positive LN stage defined by primary T2-weighted magnetic resonance imaging (T2W-MRI) and diffusion-weighted magnetic resonance imaging (DWI-MRI) conducted on either a 1.5 Tesla or a 3 Tesla MRI scanner (Philips Medical Systems, Best, the Netherlands) using the same scanning protocol.
After localizer scans, fast T2-weighted (T2W) spin-echo sequences were obtained. The scans, which included 3 mm axial slices at a 90° angle to the tumor axis, were prepared by the MRI radiographer assisted by a radiologist to ensure perpendicular images. No contrast enhancement was used. DWI was performed perpendicular to the tumor using an echo-planar imaging (EPI) factor of 61. Five different b values (strength and timing of the gradients to generate DWI) were used by applying diffusion-sensitive gradients: b = 0, b = 200, b = 400, b = 600, and b = 800 s/mm². The first series was a set of image sequences formed by echo-planar spin-echo T2W imaging (b = 0). The next series applied gradients in the x, y, and z directions to form isotropic images, which were obtained by calculating diffusion vector projections of the three directions. ADC maps of the isotropic images were created automatically by the Philips Ingenia software. Patients were scanned in the supine position. Bowel cleansing was not performed, and no oral or rectal contrast media were administered. A total of 36 patients were eligible and included in the study. The remaining 119 patients were excluded due to negative lymph node stage, previous surgical intervention, benign tumor, or relapse of primary cancer.
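The ADC maps in this study were generated automatically by the scanner software, and its algorithm is not described here. As a rough illustration of the underlying idea only, the sketch below fits the standard mono-exponential model S(b) = S0 · exp(-b · ADC) by log-linear least squares to the five b values listed above. The function name `fit_adc`, the synthetic signal values, and the choice of a mono-exponential fit are assumptions made for illustration, not the vendor's implementation.

```python
import numpy as np

def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) for one voxel from signals at several b values.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC), so that
    ln S(b) is linear in b with slope -ADC.
    """
    b = np.asarray(b_values, dtype=float)
    log_s = np.log(np.asarray(signals, dtype=float))
    slope, _intercept = np.polyfit(b, log_s, 1)  # straight-line least squares
    return -slope

# b values from the scanning protocol, in s/mm^2
B_VALUES = [0, 200, 400, 600, 800]

# Synthetic voxel: S0 = 1000 and ADC = 1.0e-3 mm^2/s are made-up values
TRUE_ADC = 1.0e-3
synthetic_signals = [1000.0 * np.exp(-b * TRUE_ADC) for b in B_VALUES]

estimated_adc = fit_adc(B_VALUES, synthetic_signals)
print(f"Estimated ADC: {estimated_adc:.2e} mm^2/s")
```

On noise-free synthetic data the fit recovers the ADC that generated the signal; on real voxels, noise at high b values would bias a log-linear fit, which is one reason vendor implementations may differ.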
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (May 2020) of the University Hospital of Southern Denmark (Journal number: 20/25129) and the local Danish Data Protection Agency. Additional informed consent was not required in the present retrospective study since no additional diagnostic information was generated from the 36 primary MRI staging scans from 2018.

Lesion Selection
One to four lymph nodes per patient were selected at random by SRR (observer 4). A total of 104 lymph nodes were marked with image numbers and arrowheads and were plotted into a blinded lymph node assessment form. To minimize observer bias, the observer who selected the lymph nodes read the cases more than four weeks after selection.

Lymph Node Assessment
All observer characteristics are presented in Table 1. A three-monitor workstation setup was used with one allocated radiological information system (RIS) (Carestream Health, Inc.), with an 18.5" display (Lenovo, China) for patient selection, and two allocated picture archiving and communication systems (PACSs) (Medical Insight, Valby, Denmark) with displays (21.3" Monitor CCL358i2 from: Totoku, JVCKENWOOD Corporation, Kanagawa, Japan) for picture evaluation. Each observer assessed the lymph nodes independently and reported on a pre-printed lymph node assessment form with one patient ID and one to four lymph nodes in random order. The selected MRI images were input into a separate research file within the RIS. No additional clinical information beyond the presence of histopathologically proven rectal cancer was provided. There were three numerical variables (length and width in millimeters, and apparent diffusion coefficient (ADC) in mm²/s) and four binary non-numerical variables (round shape, irregular border, heterogeneous signal, and extramesorectal lymph node). Observers were encouraged to use electronic calipers for measurements. T2W-MRI images with 3 mm slice thickness were used to measure length and width and to assess the binary non-numerical variables. The ADC map was used for ADC measurements. A second read of the MRI images was performed three months after the first read, and the MRI images were presented in random order.

Statistical Analysis
Intra- and interobserver agreement of the numerical variables was estimated by the intraclass correlation coefficient (ICC). A two-way random-effects absolute agreement model was used to estimate the interobserver ICC and corresponding 95% confidence intervals (CIs) of the first and second reads. A similar two-way random-effects absolute agreement model was used for intraobserver agreement to estimate the ICC and corresponding 95% CI for each observer. All models were performed separately for length, width, and ADC measurements. ICC values were interpreted using the following cut-offs: below 0.50, poor; between 0.50 and 0.75, fair; between 0.75 and 0.90, good; above 0.90, excellent [10].
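To make the ICC model concrete, the following is a minimal sketch of a two-way random-effects, absolute-agreement ICC computed from the ANOVA mean squares. It assumes the single-measurement form, often written ICC(2,1); the text does not state whether single or average measures were used, and confidence intervals are omitted. The function names, the example measurements, and the handling of values that fall exactly on a cut-off are illustrative assumptions.

```python
import numpy as np

def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `ratings` is an (n_subjects, k_raters) array, e.g. one row per lymph
    node and one column per observer (or per read, for intraobserver use).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)  # between raters
    ss_error = np.sum((x - grand_mean) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def interpret_icc(icc):
    """Map an ICC to the cut-offs quoted in the text.

    Treatment of values exactly at 0.50, 0.75, or 0.90 is an assumption.
    """
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "fair"
    if icc <= 0.90:
        return "good"
    return "excellent"

# Hypothetical lengths (mm) of five lymph nodes measured by four observers
example_lengths = [
    [6.1, 6.0, 6.3, 5.9],
    [4.2, 4.5, 4.1, 4.4],
    [9.8, 9.5, 10.1, 9.7],
    [3.0, 3.2, 2.9, 3.3],
    [7.4, 7.1, 7.6, 7.2],
]
example_icc = icc_2_1(example_lengths)
print(f"ICC(2,1) = {example_icc:.2f} ({interpret_icc(example_icc)})")
```

For intraobserver agreement the same function could be applied to a two-column array holding one observer's first and second reads.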
Bland-Altman plots were produced for length, width, and ADC measurements, plotting the mean of the two reads against the difference and limits of agreement.
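The quantities behind a Bland-Altman plot can be sketched as follows. The sketch assumes the conventional limits of agreement of mean difference ± 1.96 standard deviations, which the text does not specify; the function name and the example measurements are hypothetical.

```python
import numpy as np

def bland_altman(read1, read2):
    """Return per-item means, differences, bias, and limits of agreement.

    Limits are bias +/- 1.96 * SD of the differences (sample SD, ddof=1),
    the usual convention, assumed here rather than taken from the study.
    """
    r1 = np.asarray(read1, dtype=float)
    r2 = np.asarray(read2, dtype=float)
    means = (r1 + r2) / 2.0       # x-axis of the plot
    diffs = r1 - r2               # y-axis of the plot
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits

# Hypothetical first-read and second-read widths (mm) for four lymph nodes
first_read = [5.0, 6.0, 7.0, 8.0]
second_read = [4.0, 6.0, 8.0, 8.0]

means, diffs, bias, (lower, upper) = bland_altman(first_read, second_read)
print(f"bias = {bias:.2f} mm, limits of agreement = [{lower:.2f}, {upper:.2f}] mm")
```

Plotting `means` against `diffs`, with horizontal lines at `bias`, `lower`, and `upper`, gives the figure type described above.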
Intraobserver agreement of the non-numerical variables from the first and second reads was calculated using Cohen's kappa (κ). The observer with the highest kappa values (observer 3; irregular border: 0.69; heterogeneous signal: 0.53; round shape: 0.56; extramesorectal lymph node: 0.95) was used as the "gold standard" when estimating pairwise interobserver agreement of the non-numerical variables at the first and second reads. All calculations were performed separately for irregular border, heterogeneous signal, round shape, and extramesorectal lymph node.
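As an illustration of the kappa calculation for one binary variable, the sketch below compares two sets of ratings, for example one observer's first and second reads, or one observer against the reference observer. The verbal labels follow the widely used Landis and Koch bands, which match the vocabulary used in this article but are an assumption here because the bands are not stated in the text. Function names and example ratings are hypothetical.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    labels = set(ratings_a) | set(ratings_b)
    # Observed proportion of agreement
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    p_expected = sum(
        (list(ratings_a).count(lab) / n) * (list(ratings_b).count(lab) / n)
        for lab in labels
    )
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is used
    return (p_observed - p_expected) / (1.0 - p_expected)

def interpret_kappa(kappa):
    """Landis and Koch style labels (assumed, not stated in the text)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical "irregular border" ratings (1 = present, 0 = absent)
read_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
read_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]

kappa = cohens_kappa(read_1, read_2)
print(f"kappa = {kappa:.2f}")
```

Under these assumed bands, the observer 3 values quoted above (0.69, 0.53, 0.56, and 0.95) would read as substantial, moderate, moderate, and almost perfect.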

Numerical Variables
Interobserver agreement of the length measurements for the first read was excellent, and it was good for the second read, with ICCs of 0.94 for both reads. The intraobserver agreement was excellent, with ICCs of 0.94-0.98. For width measurements, the interobserver agreement of the first read was excellent, with an ICC of 0.93, and good for the second read (0.89). The intraobserver agreement was excellent for observers 2 and 3, with ICCs of 0.95-0.96, and good for observers 1 and 4, with ICCs of 0.89-0.92. For ADC measurements, the interobserver ICCs for the first and second reads were fair, ranging from 0.73 to 0.79. The intraobserver agreement was good for observers 1, 2, and 3, with ICCs of 0.84-0.89; it was fair for observer 4, with an ICC of 0.79 (Table 2). Intraobserver agreement on the numerical variables showed narrow limits of agreement with the mean towards zero in the Bland-Altman plots (Figure 1).
Table 3 shows kappa values (κ) for inter- and intraobserver variability in assessing irregular border, heterogeneous signal, round shape, and extramesorectal LN location. Figures 2-4 show MRI images of characteristic LNs. Interobserver agreement of irregular border assessment was fair to substantial, the heterogeneous signal assessment was slight to moderate, and the round shape assessment was slight to substantial. Extramesorectal LN location assessments were fair to almost perfect. Intraobserver agreement of irregular border assessment ranged from fair to almost perfect, heterogeneous signal assessment was fair to substantial, round shape assessment was fair to substantial, and agreement in lateral/extramesorectal LN location ranged from moderate to almost perfect.
Table footnote: LNs, lymph nodes; * statistical analysis was performed on complete datasets. The number of LNs varies because of the failure to report on the LN assessment form.

Discussion
The current study shows that LN size and ADC measurements with MRI were highly reproducible, regardless of the level of observer experience, in patients with positive LNs. On the other hand, the morphological assessment of LNs with MRI showed lower reproducibility, except for extramesorectal LN location, which showed high observer agreement.
The preoperative identification of patients with LN-negative disease and a good prognosis is important. It helps to select the patients who are likely to have a better outcome with surgical intervention alone [3]. Workshop training of radiologists could possibly improve LN assessment by avoiding the classification of small, oval, homogeneous nodes with a regular border as LN-positive disease, thereby reducing preoperative overtreatment.
MRI of the pelvic region is the standard for rectal cancer staging. ESGAR's consensus recommendations on the assessment of LN involvement in primary rectal cancer staging by MRI fundamentally depend on size, morphology, and location characteristics [6,12].
Kono et al. [13] published a node-for-node comparative study of specimen and histology and found rectal cancer metastases in LNs as small as 1 mm in diameter. They concluded that size criteria alone cannot accurately predict LN involvement in rectal cancer.
Previous studies have shown mixed results regarding the benefits of using morphological criteria in addition to size. Kim et al. [14] performed a single-observer study with one experienced radiologist. They concluded that the presence of individual criteria such as a spiculated or indistinct border, or a heterogeneous signal of the LNs, in addition to LN size could be helpful to predict LN involvement. Brown et al. [4] performed an interobserver study with two experienced radiologists who had at least five years of experience in the MRI staging of rectal cancer. They showed that the predictive value of LN size criteria alone was poor because of substantial overlap between benign and malignant LNs. However, diagnostic accuracy improved through the use of morphological criteria such as the assessment of border and signal intensity. Furthermore, they showed good reproducibility by categorizing LNs into "involved" and "non-involved" groups, with LNs regarded as positive if either an irregular border or a mixed-signal intensity was present. Gröne et al. [5] conducted a retrospective single-observer study in which one radiologist with over 20 years of experience in the MRI staging of rectal cancer analyzed previously obtained MRI images of patients with histopathologically verified rectal cancer. They found no significant improvement in the diagnostic accuracy between size criteria alone and a combination of size and morphological criteria.
All inter- and intraobserver agreements within numerical variables were reasonably high, with good to excellent agreement for length and width measurements and fair to good agreement for ADC; all CIs were overlapping, indicating no significant difference in measurements obtained by inexperienced and experienced observers. Our high reproducibility of ADC is in accordance with the findings of Kwee et al. [9]. However, no unambiguous results have been published showing whether ADC measurements can be used for LN staging in rectal cancer patients. Heijnen et al. [8] showed that DWI-MRI and ADC measurements could improve LN visualization in primary rectal cancer staging, but the diagnostic accuracy was not improved. Nevertheless, interobserver agreement was excellent for ADC measurements. Lambregts et al. [7] found that ADC measurements on MRI scans in patients with rectal cancer after chemoradiation may improve LN characterization. However, this can be difficult in small LNs, as 28% of malignant LNs in rectal cancer are less than 3 mm in size [15].
Our data regarding LN location inside or outside the mesorectal fascia show reasonably high reproducibility among all observers. Kim et al. [16] showed extramesorectal LN involvement to be a major factor for locoregional recurrence, and our data indicate good intra- and interobserver agreement for the presence of extramesorectal LNs. This is important since the lateral extramesorectal LNs are not removed by standard total mesorectal excision (TME), and even neoadjuvant chemoradiotherapy with TME is not sufficient to prevent lateral local recurrence in enlarged nodes [17,18]. Some consider lateral nodal disease to represent metastatic disease that is not amenable to treatment, and persistently enlarged lateral nodes after chemoradiotherapy indicate a high risk of local recurrence [19]. Lateral lymph node dissection may improve locoregional control in patients with low rectal cancer and abnormal lateral LNs, but larger studies are warranted [20].
MRI LN staging in rectal cancer patients remains complex, and more research in the field is needed to improve the diagnostic accuracy. A recent study by Ding et al. suggested that artificial intelligence (AI) might add diagnostic accuracy in the evaluation of metastatic LNs in patients with rectal cancer [21]. Other methods are also being tested, and there is a need for further improvement in the LN staging of rectal cancer [22][23][24][25][26][27][28][29][30][31].
Colorectal cancers with deficient mismatch repair (dMMR), in which errors arising during DNA replication are not corrected, display a greater inflammatory response and local infiltration of lymphocytes in tumor and peritumoral tissue. The distinct clinicopathological features of dMMR colorectal cancer affect the accuracy of preoperative N staging [32,33]. Exactly how this affects the interobserver variation remains unclear.
A limitation of our study is the absence of an absolute reference for the morphological characteristics of the LNs. All MRI scans assessed were of patients with histopathologically verified rectal adenocarcinoma. Since no histopathological node-by-node LN stage was correlated with our findings, pairwise kappa values were calculated for interobserver agreement using the most consistent observer (observer 3). This approach seemed satisfactory for the purpose of interobserver variation for morphological criteria in LN assessment. However, it did not provide any clinical value for the interpretation of whether an LN is benign or malignant. The prospective node-by-node approach used by Brown et al. [4] would have been preferable.
Although our patient population of 36 patients was relatively small and the LNs were not correlated directly with histopathology, we found that the assessment of 104 LNs contributed valuable data regarding the reproducibility of the traditionally used LN assessment characteristics of length, width, and morphology, as well as newer, potentially beneficial factors like ADC among radiologists of different experience levels.
All observers assessed the LNs on similar PACS stations to minimize variances in image quality. Since the use of electronic magnification was neither recommended nor prohibited at the beginning of this study, no data regarding its use were recorded. Electronic magnification might have affected the measurement accuracy, predominantly for the small lymph nodes.

Conclusions
Our data indicate that MRI numerical LN variables are reproducible regardless of the level of experience of the observers, whereas subjective variables like morphological characteristics are less reproducible.
Prospective node-by-node validation studies are warranted to investigate whether quantitative ADC measurements can improve LN staging by imaging in rectal cancer.

Informed Consent Statement: Additional informed consent was not required in the present retrospective study since no additional diagnostic information was generated from the 36 primary MRI staging scans from 2018.

Data Availability Statement: Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.