Quantitative MRI of Pancreatic Cystic Lesions: A New Diagnostic Approach

The commonly used magnetic resonance imaging (MRI) criteria can be insufficient for discriminating mucinous from non-mucinous pancreatic cystic lesions (PCLs). The histological differences in PCLs’ fluid composition may be reflected in MRI images, but cannot be assessed by visual evaluation alone. We investigated whether additional quantitative MRI parameters, such as signal intensity measurements (SIMs) and radiomics texture analysis (TA), can aid the differentiation between mucinous and non-mucinous PCLs. Fifty-nine PCLs (mucinous, n = 24; non-mucinous, n = 35) were retrospectively included. The SIMs were performed by two radiologists on T2-weighted and diffusion-weighted images (T2WI and DWI) and apparent diffusion coefficient (ADC) maps. A total of 550 radiomic features were extracted from the T2WI and ADC maps of every lesion. The SIMs and TA features were compared between entities using univariate, receiver-operating characteristic, and multivariate analysis. The SIM analysis showed no statistically significant differences between the two groups (p = 0.69, 0.21–0.43, and 0.98 for T2, DWI, and ADC, respectively). Mucinous and non-mucinous PCLs were successfully discriminated by both T2-based (83.2–100% sensitivity and 69.3–96.2% specificity) and ADC-based (40–85% sensitivity and 60–96.67% specificity) radiomic features. SIMs cannot reliably discriminate between PCLs. Radiomics has the potential to augment the common MRI diagnosis of PCLs by providing quantitative and reproducible imaging features, but validation by further studies is required.

Diffusion-weighted imaging (DWI) is an MRI technique that reflects the Brownian motion of water in tissues. The degree of this motion can be assessed qualitatively and quantitatively by the apparent diffusion coefficient (ADC) maps [31], and the technique is widely available in clinical practice. The role of ADC measurements in differentiating between mucinous and serous PCLs has been investigated before, with variable and often contradictory results [32][33][34][35][36]. Mucinous tumors would be expected to display lower ADC values due to their dense mucin content, but very often, a higher degree of restriction was observed in serous-containing lesions [32][33][34][35][36][37]. Similarly, the same density difference would be expected to produce alterations in the T2 signal intensities (SIs) of PCLs, but, to the best of our knowledge, the role of quantitative T2 SIs in differentiating between mucinous and serous PCLs has never been investigated before.
We hypothesized that mucinous pancreatic cysts would have distinct MRI signal characteristics compared to non-mucinous pancreatic cysts (due to their very different content), but that these differences cannot be observed and classified by the human eye. Therefore, we proposed two methods of quantitative signal evaluation: signal intensity measurements (SIMs) of PCLs on ADC maps and T2 and DWI sequences, and radiomics/texture analysis of the PCLs based on T2WI and ADC maps.

Study Group
This Health Insurance Portability and Accountability Act-compliant, single-institution, retrospective pilot study was approved by the institutional review board, and informed consent was waived due to the retrospective nature of this research. We aimed to include patients with PCLs who underwent MRI examinations in our institution from August 2017 to April 2019, with the possibility of future histopathological confirmation when biopsy/surgery was suggested, and/or subsequent imaging examinations every 6 months for at least 12 months in cases of typically benign lesions that were not biopsied or operated on. The patient selection was performed by one researcher (CC, a radiologist specialized in abdominal MRI) who was aware of both the MRI images and the patients' clinical data. The researcher was not involved in the subsequent quantitative image analysis workflow, to reduce potential bias.
Firstly, a keyword search (using the terms "pancreatic + cyst/cystic/cystic lesions", alternatively and in combination) was conducted in the imaging database of our institution to identify MRI examinations corresponding to PCLs. The search resulted in 181 image reports. Each report was analyzed, and the researcher excluded the ones that did not refer to PCLs (n = 28) or in which the PCL report mentioned a lesion diameter of less than 10 mm (n = 23). Secondly, the medical records of the patients with the remaining 130 examinations were retrieved from the archive of our healthcare institution and investigated for disease-related data. Patients who were transferred to another institution for follow-up or treatment (n = 4) were excluded. Patients without a final diagnosis (established by clinical/imaging data or histopathological analysis) were eliminated (n = 49). When patients had multiple MRI examinations, the "reference" examination was considered the one performed before the surgery/fine needle aspiration (FNA) (for pathologically confirmed lesions), and the most recent study for patients who underwent imaging and clinical follow-up. The duplicates were then eliminated (n = 9). Afterward, the MRI examinations were re-evaluated. When multiple lesions were detected in the same patient, the researcher only marked the ones that had a final diagnosis. In addition, all lesions were re-measured and the ones that did not meet the size criterion were excluded from further evaluation (n = 2). All examinations in which the T2-weighted sequences (T2WI), the DWI, or the ADC maps were affected by artifacts were excluded from further investigation (n = 12). Finally, 59 lesions from 54 patients were included in the study group.

Image Acquisition
All MRI examinations were performed on the same machine (General Electric Optima 360MR Advance system, Waukesha, WI, USA; 1.5 Tesla). Dedicated array coils were used to cover the abdominal and pelvic regions. The imaging protocol varied because the examinations were selected from a range of 3 years, but each examination included a T2 single-shot fast spin-echo (T2 SS-FSE) and DWI sequences, which were the only sequences used for this study. Axial DWI acquisitions were synchronized with respiratory movements and computed for the same three b-values (50, 400, and 800 s/mm²). The DWI sequences were acquired using the same slice interval, thickness, and location as the ones used for the standard axial sequences. The DWI parameters were: repetition time (TR), 10,000 ms; echo time (TE), 64 ms; slice thickness, 6 mm; interval, 1 mm; and acquisition matrix, 128 × 128. The ADC and exponential apparent diffusion coefficient (eADC) functional maps were automatically obtained on the scanner computer, with echo-planar imaging (EPI) correction and the following parameters: confidence level, 0.9; lower threshold, 20; and kernel size, 2. The acquisition parameters for the T2 SS-FSE sequence were: TR, 1100 ms; TE, 95 ms; slice thickness, 4 mm; slice spacing, 1 mm; and acquisition matrix, 256 × 152.
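For readers unfamiliar with how an ADC map relates to the acquired b-value images, the following is a minimal sketch of the standard mono-exponential diffusion model, S(b) = S0 · exp(−b · ADC), fitted per voxel by log-linear least squares. It is an illustration only: the scanner's actual algorithm (including its thresholds and kernel settings) is not described in the text, and the function names `fit_adc` and `eadc` as well as the simulated signal values are assumptions made for this example.

```python
import math

def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) as minus the least-squares slope of ln(S) versus b (s/mm^2)."""
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching (b, S) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be equal")
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    return -sxy / sxx

def eadc(adc, b):
    """Exponential ADC: the signal attenuation factor exp(-b * ADC)."""
    return math.exp(-b * adc)

# Hypothetical fluid-like voxel (values invented for illustration only).
true_adc = 3.0e-3                      # mm^2/s
b_values = [50, 400, 800]              # s/mm^2, as listed in the protocol
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]
estimate = fit_adc(b_values, signals)  # recovers roughly 3.0e-3 on noise-free data
```

With noise-free simulated signals the fit returns the ADC used to generate them; on real data, low b-values additionally carry perfusion effects that this simple model does not separate.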

Image Interpretation
Each examination was reviewed by the same radiologist (CC). When different types of pancreatic cystic lesions were observed within the same organ, the images were cross-referenced with the pathological and ultrasonography results, and other medical data, to ensure the selection of lesions that were previously documented. Selected lesions were marked. The researcher chose the one slice on the T2-weighted sequence where the lesion's fluid content was best displayed. The chosen T2-weighted slice was synchronized with the DWI sequences and the ADC maps, and one image from each of these sequences was retrieved, anonymized, and used for subsequent analysis.

Signal Intensity Measurements
The previously selected images were imported on a dedicated workstation (General Electric, Advantage workstation, 4.7 edition, Boston, MA, USA). The quantitative SIMs were performed by two observers (IO and AL). Both researchers were blinded to patients' clinical history, laboratory data, and histopathology characteristics. The signal intensity values on the T2WI, the three DWI sequences (b = 50, 400, and 1000 s/mm²), and the ADC maps were measured by placing an elliptical region of interest (ROI). The observers carefully placed the ROI within the cystic lesions, avoiding the cystic walls, intra-lesional debris, and solid components. Moreover, the ROIs were drawn to avoid vascular motion and abdominal wall artifacts, as well as visible vascular and biliary structures. The area of every ROI was set to at least 0.2 cm², and the ROI was placed approximately in the same location on each sequence, using synchronized slices. Each researcher performed one set of measurements. The values were averaged and used for subsequent statistical analysis. A univariate analysis test (the Mann-Whitney U test) was conducted to compare average SIMs between the groups. A schematic reproduction of the ROI positioning process is depicted in Figure 1. Further, the coefficient of variation (COV) from duplicate measurements was calculated using the logarithmic method to determine the reproducibility of the measurements conducted by the two radiologists, thus estimating the within-run imprecision [38]. Inter-rater agreement (kappa and intraclass correlation coefficient, ICC) was also determined from the same data.
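The group comparison described above can be sketched as follows. This is a minimal, dependency-free illustration of the Mann-Whitney U statistic (with average ranks for ties) applied to observer-averaged measurements; it does not compute the p-value, which the study obtained from statistical software. The function names and all numeric values are hypothetical.

```python
def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs where x exceeds y (ties count 0.5)."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x

def average_observers(obs_a, obs_b):
    """Per-lesion mean of the two observers' measurements."""
    return [(a + b) / 2 for a, b in zip(obs_a, obs_b)]

# Hypothetical ADC-like values for two small groups (not study data).
group_1 = average_observers([1.0, 2.4, 3.0], [1.4, 2.6, 3.2])   # -> about 1.2, 2.5, 3.1
group_2 = [2.0, 3.5, 4.0, 4.2]
u_1, u_2 = mann_whitney_u(group_1, group_2)
```

The smaller of the two U values is conventionally compared against the reference distribution to obtain the p-value.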

Radiomics Workflow
The radiomics approach consisted of five steps: image pre-processing, lesion segmentation, feature extraction, feature selection, and prediction.

Image Pre-Processing and Segmentation
The same T2WI and ADC maps used for the SIMs were imported into a dedicated software for texture analysis, QMaZda (MaZda, Institute of Electronics, Technical University of Lodz, Lodz, Poland). Subsequently, the gray levels of each imported image were normalized based on the mean and three standard deviations of gray-level intensities to reduce the contrast and brightness variations (which could affect the true image textures). The image segmentation process was performed by a second researcher (RAL) who was blinded to the outcomes of the patients and was not involved in the SIM process. On T2WIs, the researcher incorporated each lesion into a two-dimensional ROI. The first step of the ROI definition process was performed semi-automatically. The researcher placed a seed inside each cyst and the software automatically delineated the structure of interest, based on gradient and geometry coordinates. In the second step, if an incomplete overlap between the ROI and the structure's contours was detected, the ROI was manually adjusted. Afterward, the defined ROI was transferred to the ADC maps. Manual adjustments were performed when an incomplete overlap was observed (Figure 2).
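The gray-level normalization step can be illustrated with a short sketch. It assumes the common "mean ± 3 standard deviations" scheme: intensities are clipped to the interval [μ − 3σ, μ + 3σ] and rescaled to a fixed number of gray levels. This is an illustration of the idea, not the texture software's own code; the function name `normalize_3sigma`, the 8-bit output depth, and the sample intensities are assumptions.

```python
import statistics

def normalize_3sigma(pixels, bits=8):
    """Clip intensities to mean +/- 3 SD and rescale them to 0 .. 2**bits - 1."""
    mu = statistics.mean(pixels)
    sigma = statistics.pstdev(pixels)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    levels = 2 ** bits - 1
    if high == low:                      # flat region: no contrast to rescale
        return [0 for _ in pixels]
    normalized = []
    for p in pixels:
        clipped = min(max(p, low), high)  # outliers are pulled to the window edge
        normalized.append(round((clipped - low) / (high - low) * levels))
    return normalized

# Hypothetical ROI intensities with one bright outlier (e.g., an artifact).
roi = [10] * 20 + [1000]
scaled = normalize_3sigma(roi)
```

Because the window is defined relative to each image's own mean and spread, two images of the same texture acquired with different brightness or contrast settings map to similar normalized values.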

Feature Extraction
After ROI delineation, the software automatically extracted the texture features (texture parameters) using preset computation methods (Table 1). From each lesion, a total of 550 parameters were extracted (275 parameters from T2WI and 275 parameters from ADC maps) and used for subsequent analysis.

Feature Selection
The feature selection workflow was identical for the parameters extracted from the T2WIs and the ones computed from the ADC maps. Firstly, two predefined reduction techniques were applied to highlight the parameters with the highest discriminatory potential: the probability of classification error and average correlation coefficients (POE + ACC) and Fisher coefficients (F, the ratio of between-class to within-class variance), each of them providing a set of 10 texture features. The Fisher algorithm selected features with maximized differences between the two groups, while the POE + ACC algorithm introduced features with high discriminatory potential and the least correlation with features that were already selected. Afterward, the absolute values of the highlighted parameters were compared between mucinous and non-mucinous pancreatic cysts (MNPCs and nMNPCs, respectively) by performing a univariate analysis test (Mann-Whitney U). The statistical significance level was set at a p-value below 0.05. To evaluate the reproducibility and stability of the selected texture feature sets, 24 patients were randomly selected for a double-blinded comparison and the same radiologist redefined each ROI, approximately two weeks after the initial process. Features with an intraclass correlation coefficient (ICC) lower than 0.85 were excluded from further analysis.
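As an illustration of the Fisher criterion mentioned above, the sketch below ranks features by a simplified two-class ratio of between-class to within-class variance, F = (μ₁ − μ₂)² / (σ₁² + σ₂²). This is a common textbook simplification and may differ in weighting from the formula implemented in the texture software; the POE + ACC criterion is not reproduced here. The function names, the feature names' values, and the toy data are hypothetical.

```python
import statistics

def fisher_coefficient(class_a, class_b):
    """Two-class Fisher ratio: squared mean difference over summed class variances."""
    mean_a, mean_b = statistics.mean(class_a), statistics.mean(class_b)
    within = statistics.pvariance(class_a) + statistics.pvariance(class_b)
    if within == 0:
        return float("inf") if mean_a != mean_b else 0.0
    return (mean_a - mean_b) ** 2 / within

def rank_features(table_a, table_b, top=10):
    """Return the names of the `top` features with the largest Fisher ratio."""
    scores = {name: fisher_coefficient(table_a[name], table_b[name])
              for name in table_a}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical feature values per class (not study data).
mucinous = {"feature_x": [1, 2, 3], "feature_y": [5, 6, 7]}
non_mucinous = {"feature_x": [7, 8, 9], "feature_y": [5, 7, 6]}
ranking = rank_features(mucinous, non_mucinous, top=2)
```

In this toy example `feature_x` separates the classes well and ranks first, whereas `feature_y` has identical class means and scores zero.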

Class Prediction
We investigated which of the remaining parameters could function as independent predictors for mucinous lesions (on T2WI and ADC maps, respectively). In this regard, a multiple regression analysis (using the "enter" input model) was conducted, with the computation of the variance inflation factor (VIF) and the coefficient of determination (R-squared). The "enter" input model included all variables that showed a p-value of below 0.05 and removed all variables that showed a p-value of more than 0.01. Features that showed a VIF of greater than 10⁴ were removed from further analysis, since a high VIF indicates multicollinearity. The predicted values were saved and subsequently used in a receiver-operating characteristics (ROC) analysis to assess the diagnostic power of the entire prediction model. The ROC analysis was also used to determine the diagnostic power of texture features that were associated with mucinous-containing cysts, along with the calculation of the area under the curve (AUC), sensitivity, and specificity, with 95% confidence intervals (CIs). The ROC curves were calculated using the DeLong et al. method, and the binomial exact confidence intervals for the AUCs were reported. Optimal cut-off values were chosen using a common optimization step that maximized the Youden index for predicting patients with malignancies. Sensitivity (Se) and specificity (Sp) were computed from the same data, without further adjustments. The same texture workflow was conducted for both T2- and ADC-based features. Statistical analysis was performed using commercially available dedicated software, MedCalc v14.8.1 (MedCalc Software, Mariakerke, Belgium) and SPSS Statistics for Windows, version 18.0 (SPSS Inc., Chicago, IL, USA). The radiomics workflow diagram is displayed in Figure 3.
Based on the features extracted from ADC maps, the two reduction methods selected 16 unique parameters. Four parameters were highlighted by both Fisher and POE + ACC (CN3D6Contrast, CN2D6Contrast, RZD6GLevNonU, Perc90). Six parameters showed no statistically significant result upon univariate analysis (CN2D6Contrast, p = 0.53; CV2D6DifVarnc, p = 1.03; ATeta1, p = 0.073; CH3D6SumVarnc, p = 0.96; CN2S6Entropy, p = 0.64; CV5S6SumEntrp, p = 0.81). Four parameters (Perc01, ICC = 0.72; RNS6Fraction, ICC = 0.77; RNS6ShrtREmp, ICC = 0.41) were excluded from further processing due to the low ICC values. The ADC-based multivariate analysis showed a coefficient of determination of 0.78, an adjusted R² of 0.61, and a multiple correlation coefficient of 0.84. Two parameters were excluded from the analysis due to high collinearity (as shown by a VIF > 10⁴; CH1S6SumOfSqs and CV3S6Correlat). Of the remaining four parameters, none was demonstrated to be independently associated with the diagnosis of mucinous cysts. The T2-based and ADC map-based multivariate analysis results are displayed in Table 4. The diagnostic ability of the selected features to identify mucinous lesions is displayed in Table 5 and Figure 4. Texture maps based on feature distribution in T2WIs are shown in Figure 5.
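The cut-off selection described above can be sketched in a few lines. The example computes sensitivity and specificity at every candidate threshold, picks the threshold that maximizes the Youden index (J = Se + Sp − 1), and estimates the AUC as the probability that a positive case scores higher than a negative one. Confidence intervals and the DeLong variance estimate are not reproduced. The function names, scores, and labels are hypothetical and the positive-if-greater-or-equal rule is an assumption.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for the rule 'score >= threshold is positive'."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def youden_cutoff(scores, labels):
    """Threshold with the largest Youden index J = Se + Sp - 1 (first one on ties)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted values and true classes (1 = positive class).
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
area = auc(scores, labels)
```

Because the cut-off is optimized on the same data used to report Se and Sp, the resulting values are optimistic unless confirmed on an independent sample.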

Discussion
Our results show that T2WI-, DWI-, and ADC map-based signal intensity measurements were unsuccessful in differentiating mucinous from non-mucinous pancreatic cystic lesions. The limited utility of ADC values in differentiating between the two PCL entities was also concluded in most of the previously published studies (Table 6). In our study, the SIMs on DWI images had the highest variation (COV) among the observers. Although they had the lowest variation coefficient (6.9%; 95% CI, 5.46-8.37%), the ADC values also held the lowest inter-rater agreement coefficient (k < 0.001). We were only able to identify one previous study [33] that quantified the SI on different b-value sequences as a differentiation tool for PCLs. In the study conducted by Mottola et al. [33], the DWI measurements were presented as cyst-to-pancreas SI ratios, which showed that mucinous tumors had a lower DWI cyst-to-pancreas SI ratio at both b = 750 s/mm² (1.448 and 2.216, p = 0.013) and b = 1000 s/mm² (1.094 and 1.941, p = 0.015). A summary of the main findings of previous studies that investigated the role of ADC values in differentiating mucinous from non-mucinous PCLs is displayed in Table 6. Besides the variable and sometimes contradictory results, it is important to acknowledge that each of the previously published studies used different MRI protocols, therefore acquiring DWI images at different b-values. The b-value is a factor that mirrors the timing and strength of the gradients utilized to generate these sequences [39]. The DWI images are computed by applying diffusion-sensitizing gradients at various strengths [40], and the b-value is directly linked to the diffusion effects [39]. Our ADC measurements show that non-mucinous lesions had lower values than mucinous lesions, which may seem counterintuitive at first glance.
However, a closer look at the MNPCs' mucin characteristics may partially explain the variable SIMs obtained in this group and, thus, the non-significant statistical analysis results. Firstly, most MNPC aspirates are hypocellular (a feature that is expected to increase the ADC values) [2]. Secondly, the MNPCs' pathological spectrum includes a wide variety of benign and malignant abnormalities, including non-neoplastic hyperplasia, adenoma, adenoma with severe atypia, and adenocarcinoma, each with its own microscopic features [41]. Moreover, the pancreatic epithelium produces different types of mucins in different stages of hyperplasia and malignant progression: benign lesions tend to produce sulfated mucin, whereas malignant lesions tend to have neutral mucin or sialomucin [42]. The different types of mucin can also be expected to impact the SIMs. Boraschi et al. [37], who followed a similar design and also included multiple types of PCLs (IPMNs, pseudocysts, SCAs, and mucinous cystadenomas), observed analogous dynamics of the ADC values. The authors [37] attributed these results to the multiseptate and multiloculate morphology of SCAs (which includes fluid and solid components) and to their content, which includes glycogen-rich cells and proteinaceous fluid. In addition, the same study [37] demonstrated that the lowest ADC was associated with inflammatory lesions, which was attributed to the heterogeneity of the content found in pancreatitis-related collections, which can range from serous fluid to collections containing hemorrhagic or necrotic debris. It is important to consider that, in biologic tissue, the ADC values are influenced not only by the molecular diffusion of water, but also by the microcirculation of blood in the capillary network (perfusion characteristics).
The perfusion effects typically influence the ADC coefficients when the map is computed from sequences with low b-values, which often increase registered ADC coefficients [35]. This perfusion effect may have influenced SCAs' measured DWI and ADC values, both in the current and previous studies [35,36], because SCAs are typically hyper-vascular (hyperperfused tumors) [43]. It also needs to be acknowledged that cystic lesions can often be contaminated by bleeding, infection, or debris, which could further influence the measured SIs [44,45].
Our radiomics processing of MRI images showed that, based on T2WI, five features were independent predictors for MNPCs. The "Contrast" parameter (CH3D6Contrast) is a measure of intensity or gray-level variations between the reference pixel and its neighbor. In the visual perception of the real world, contrast is determined by the difference in the color and brightness of the object and other objects within the same field of view [46,47]. The contrast parameter increases when there is a large amount of variation within an ROI [45]. This parameter demonstrated higher values for MNPCs than for nMNPCs. Inverse Difference Moment (IVD) measures the local homogeneity of an image. It was previously demonstrated that the IVD's weight value is the inverse of the Contrast weight [46,48]. This observation was validated by our findings, with the IVD parameter demonstrating higher values for non-mucinous lesions. All three of the selected Gray-Level Non-Uniformity (GLN) parameters displayed higher values for non-mucinous lesions. This metric increases when gray-level outliers dominate the histogram [49,50]. The GLN measures the variability of gray-level intensity values in the image, with a lower value indicating more homogeneity in intensity values. The interpretation of the T2-based TA results may seem contradictory, with two independent parameters indicating a more heterogeneous content for the MNPCs (CH3D6Contrast and CH5D6InvDfMom), while the three GLN parameters showed a more heterogeneous content for the nMNPCs. However, both results can be true at the same time, considering that the first two features were calculated through the co-occurrence matrix and the GLN features were calculated through the run-length matrix, each of them having its own computational method (i.e., looking at the same image through "different perspectives").
The ADC-based radiomics features showed an overall lower sensitivity and higher specificity for the diagnosis of mucinous lesions (Table 5). However, none of them was able to independently predict the nature of the cysts' content. Again, the Contrast parameter demonstrated higher values for MNPCs, as in the T2-based analysis. Wavelet energy quantifies the distribution of energy along the frequency axis over scale and orientation. Energy measures the local uniformity within an image. When the gray levels of an image are distributed in a constant or periodic form, energy becomes high [51]. Our analysis showed higher values for MNPCs than for nMNPCs. Again, the GLN feature showed higher values for non-mucinous lesions. Short- and long-run emphasis reflects the distribution of short or long homogeneous runs in an image. High values of the long-run emphasis indicate coarse surfaces [52], and in our study, they were observed for non-mucinous lesions. Apparently, the parameters followed the same dynamics, although they were independently computed from T2WI and ADC maps. However, the same "contradictory" results were observed when interpreting the absolute values recorded by the ADC-derived textures. Therefore, no direct assumptions regarding which histological feature influences which parameter can be made. Further studies are required to directly link the mucinous nature of the fluid with the values of one or more specific texture parameters.
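To make the "different perspectives" point concrete, the following sketch computes Contrast and Inverse Difference Moment from a gray-level co-occurrence matrix, and Gray-Level Non-Uniformity from horizontal run lengths, using the standard textbook definitions. It is not the texture software's implementation and omits its direction, distance, and normalization variants; the function names and the two tiny test images are invented for illustration.

```python
def glcm(image, dx=1, dy=0):
    """Symmetric, normalized co-occurrence probabilities for pixel offset (dx, dy)."""
    counts = {}
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] = counts.get((a, b), 0) + 1
                counts[(b, a)] = counts.get((b, a), 0) + 1   # count both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def contrast(p):
    """Sum of (i - j)^2 * p(i, j): large when neighbors differ strongly."""
    return sum((i - j) ** 2 * v for (i, j), v in p.items())

def inverse_difference_moment(p):
    """Sum of p(i, j) / (1 + (i - j)^2): large when neighbors are similar."""
    return sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())

def horizontal_runs(image):
    """List of (gray level, run length) for consecutive equal pixels in each row."""
    runs = []
    for row in image:
        start = 0
        for c in range(1, len(row) + 1):
            if c == len(row) or row[c] != row[start]:
                runs.append((row[start], c - start))
                start = c
    return runs

def gray_level_nonuniformity(runs):
    """Sum over gray levels of (number of runs at that level)^2, divided by total runs."""
    per_level = {}
    for level, _ in runs:
        per_level[level] = per_level.get(level, 0) + 1
    return sum(n * n for n in per_level.values()) / len(runs)

# Two hypothetical 4 x 4 images with identical histograms but different texture.
striped = [[0, 3, 0, 3] for _ in range(4)]   # fine alternating pattern
blocks = [[0, 0, 3, 3] for _ in range(4)]    # coarse two-block pattern
```

Both images contain the same gray levels in the same proportions, yet the striped image yields higher Contrast, lower Inverse Difference Moment, and a different run-length profile than the block image, which is why co-occurrence and run-length features need not agree on which lesion is "more heterogeneous".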
We were able to identify several previously published studies that investigated the role of radiomics in PCLs' diagnosis (Table 7). The two main directions that these studies investigated were tumor grading [19][20][21][22][23][24] and ductal adenocarcinoma survival [25][26][27][28][29]. A study conducted by Chen et al. [30] had a similar premise, investigating the CT-derived texture features' ability to differentiate pancreatic serous cystadenomas from pancreatic mucinous cystadenomas. In this regard, the authors [30] built a combined model made of radiomic and conventional radiologic features, which was able to differentiate the two entities with 87.5-90% sensitivity and 82.4-84.6% specificity. Although this model outperformed the diagnostic ability of the classic radiologic features alone (75% sensitivity, 82.4% specificity), the authors [30] did not evaluate the diagnostic capabilities of the radiomics features alone. Interestingly, all of the abovementioned radiomic studies only involved CT examinations [19][20][21][22][23][24][25][26][27][28][29][30]. For unknown reasons, the role of MRI-derived radiomic features in differentiating PCLs has never been investigated, but considering that this technique could provide more heterogeneous images than CT, it is possible that more texture information could be extracted. The technical aspects should not be neglected. The decision to use multiple ROIs rather than incorporating collections into a larger volume of interest (VOI) may have influenced our findings. ADC fluctuations between slices in abdominal MRI were observed by Miquel et al. [53], who showed that these variations are much less likely to influence three-dimensional (3D) VOIs because any differences between (and within) slices are likely to be averaged over the large VOI. In- and inter-slice averaging does not occur with 2D ROIs, but this method has the advantage of higher reproducibility coefficients compared with the 3D analysis [40,53].
We agree that using VOIs would have offered a more "comprehensive" description of diffusion within the collections, while also accounting for within- and between-slice variations. Our approach, although possibly considered less accurate, is closer to the current use of ADC measurements in clinical practice, and is more straightforward and less time-consuming than VOI segmentation. By only selecting examinations that were performed on the same machine and processed on the same workstation, we were able to counteract inter-scanner variability in ADC measurements as well as the effect of different post-processing software on ADC values [54]. This becomes particularly important when considering that previous research found up to 4% variability when using different MRI machines and up to 8% variability when processing ADC with different types of software [55]. Almost all of the previously published CT-based radiomic studies involving pancreatic lesions used manually delineated ROIs [19][20][21][22][23][24][27][28][29]. One study used a VOI [25] and another used an automatic method for ROI delineation [26]. The single-slice ROI texture analysis that was used in the current study can be regarded as controversial [56]. Moreover, some of the included PCLs were small, and the top and bottom slices were often affected by artifacts. It has also not been clearly demonstrated that multi-slice/volumetric analysis adds significant value to radiomic analysis, as previous TA studies [57] that compared the two ROI definition techniques concluded that there is no significant difference between the two in selected applications. The spatial resolution, strength of the magnetic field, signal-to-noise ratio, and other acquisition parameters can impact the TA results [58]. We were able to counteract these influences by only choosing examinations that were conducted on the same machine under the same acquisition protocol.
From both medical and management perspectives, it is a priority to extract as much information as possible from the standard MRI sequences, rather than administering contrast or acquiring supplementary sequences [45]. In this regard, TA could become an important tool that could increase the confidence in the MRI diagnosis of PCLs, if further validated by larger prospective studies.
Our study had several limitations. First, due to its retrospective design, it could have had selection bias. It remains debatable whether the inclusion of both pathologically confirmed and unconfirmed lesions could be regarded as a pitfall. However, not all PCLs required surgery, biopsy, or surgery per primam, and we only chose the lesions that underwent follow-up and had a clear imaging and clinical diagnosis. Similarly, the diagnosis was often assumed by imaging and/or follow-up without pathologic confirmation in most of the previously published studies with the same goal [13,33,35]. Moreover, being a retrospective study may have introduced verification bias regarding the patients' clinical follow-up, which mainly depends on the status of the institution and referral hospital. The study population was also rather small, which was due to the strict inclusion criteria (especially the size criterion, which was necessary to provide enough surface for ROI placement and the difficulty of visualizing these lesions on ADC maps) and also due to the status of our institution. In particular, the SCA population was relatively small, a limitation also encountered in most similar previous studies [33,35,37]. The fact that one researcher (CC) was aware of the final diagnosis could also be considered a limitation. However, because at the time of the MRI examinations, some of the patients may have presented with multiple pancreatic lesions, this approach was necessary in order to only select documented lesions. After this step, that particular researcher was not involved in the processes of image segmentation, statistical analysis, or reporting the results. Splitting the cohort into training and validation (testing) groups is an important stage in the radiomics classification process. 
Approximately 70% of the acquired dataset is typically utilized for training, with the remaining samples being used to assess the features' classification performance [59], which we were unable to perform due to the small cohort. Therefore, further studies are required to confirm our findings. In addition, our choice to use MaZda [60] as the imaging processing software may be viewed as outdated. Although various texture applications have been developed, only a few can provide built-in approaches for feature reduction and vector classification within an intuitive interface that may be utilized by non-image processing experts, such as medical physicians.
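The training/validation split mentioned above can be sketched as follows, to show what a 70/30 partition would look like for a cohort of this size. The sketch stratifies by class so that both subsets keep the original class proportions; it is an illustration only, since no such split was performed in the study. The function name, the integer-percentage rounding rule, and the random seed are assumptions.

```python
import random

def stratified_split(labels, train_percent=70, seed=0):
    """Split sample indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = len(indices) * train_percent // 100   # integer arithmetic, rounds down
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Cohort sizes taken from the text: 24 mucinous (1) and 35 non-mucinous (0) lesions.
labels = [1] * 24 + [0] * 35
train_idx, test_idx = stratified_split(labels)
mucinous_in_test = sum(labels[i] for i in test_idx)
```

With these cohort sizes the test subset would contain 19 lesions, only 8 of them mucinous, which illustrates why a held-out validation set was considered impractical here.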

Conclusions
The MRI-based SIMs were unable to provide statistically significant results when comparing mucinous to non-mucinous PCLs. Radiomics features have the potential to augment the PCLs' differential diagnosis, but future studies are required to investigate the exact histological substrate that influences PCLs' textures.