CT-Based Radiomics Analysis to Predict Malignancy in Patients with Intraductal Papillary Mucinous Neoplasm (IPMN) of the Pancreas

Simple Summary
The management of intraductal papillary mucinous neoplasms (IPMN) of the pancreas remains controversial because, despite the latest international recommendations, the rate of unnecessary surgery for low-grade dysplasia (LGD) remains relatively high. The aim of our retrospective study was to assess the performance of radiomic analysis on CT in differentiating benign from malignant IPMN. In a training cohort (296 patients) and an external validation cohort (112 patients), we confirmed that a total of 85 radiomics features provided valuable additional and independent information for discriminating benign from malignant tumors, with an area under the ROC curve (AUC) of 0.84 in the training cohort and 0.71 in the external validation cohort, and with higher performance when the clinical variables leading to the indication for surgery were added. We have also demonstrated the capabilities of radiomics models for distinguishing LGD versus high-grade dysplasia (HGD) versus invasive carcinoma, LGD versus HGD, and HGD versus invasive carcinoma.

Abstract
To assess the performance of CT-based radiomics analysis in differentiating benign from malignant intraductal papillary mucinous neoplasms of the pancreas (IPMN), preoperative scans of 408 resected patients with IPMN were retrospectively analyzed. IPMNs were classified as benign (low-grade dysplasia, n = 181) or malignant (high-grade dysplasia, n = 128, and invasive, n = 99). Clinicobiological data were reported. Patients were divided into a training cohort (TC) of 296 patients and an external validation cohort (EVC) of 112 patients. After semi-automatic tumor segmentation, PyRadiomics was used to extract radiomics features. A multivariate model was developed using a logistic regression approach. In the training cohort, 85/107 radiomics features were significantly different between patients with benign and malignant IPMNs. Unsupervised clustering analysis revealed four distinct clusters of patients with similar radiomics feature patterns, with malignancy as the most significant association. The multivariate model differentiated benign from malignant tumors in the TC with an area under the ROC curve (AUC) of 0.84, sensitivity (Se) of 0.82, and specificity (Spe) of 0.74, and in the EVC with an AUC of 0.71, Se of 0.69, and Spe of 0.57. This large study confirms the high diagnostic performance of preoperative CT-based radiomics analysis to differentiate benign from malignant IPMNs.


Introduction
Intraductal papillary mucinous neoplasms (IPMN) of the pancreas are mucin-producing tumors that originate from the pancreatic ducts. Their incidence keeps growing due to the improvement and widespread use of cross-sectional imaging coupled with the population's increasing age. They account for nearly half of the pancreatic cysts discovered in up to 2.6% of computed tomography (CT) scans and 20% of magnetic resonance imaging (MRI) studies each year [1,2]. Most of the time, they are discovered incidentally during imaging performed for conditions unrelated to the pancreas.
IPMNs constitute a clinically challenging entity since they exhibit a potential risk of invasive transformation and are a precursor of concomitant pancreatic ductal adenocarcinoma (PDAC) in 4-11% of cases [3][4][5][6]. IPMNs can be classified into three types according to the ductal involvement (MD (main duct)-IPMN, BD (branch-duct)-IPMN, and mixed type) as well as into four pathologic subtypes (gastric, intestinal, pancreatobiliary, and oncocytic), each displaying distinct biological behavior [7]. Moreover, the spectrum of architectural and histological dysplasia is a key determinant of the risk of malignant transformation. The WHO 2019 classification distinguishes low-grade dysplasia (LGD), high-grade dysplasia (HGD), and invasive intraductal papillary mucinous carcinoma (IPMC) [8]. An aggressive surgical resection approach was initially recommended for the prevention or early treatment of pancreatic cancer. However, this has led to overtreatment, as highlighted in a systematic review of 37 case series showing that the rate of malignancy development during follow-up is low, with only 2.8% of patients developing invasive neoplasia [9]. In addition, pancreatectomy is responsible for one of the highest rates of morbidity (40-50%) and mortality (2-4%) in abdominal surgery, with postoperative complications that mainly include endocrine and exocrine pancreatic insufficiencies as well as the development of steatosis [10][11][12]. Therefore, the current management of IPMN requires a balance between the risk of potential malignant transformation and the risk of pancreatic resection. International guidelines have illustrated this principle with the emergence, over the last few years, of a more conservative approach in patients at low risk of developing invasive carcinoma or HGD [13][14][15].
Obvious "high-risk stigmata" (HRS), namely jaundice, the presence of an enhancing mural nodule (≥5 mm) or a solid component on CT/MRI, or a main pancreatic duct (MPD) measuring ≥10 mm, are highly predictive of malignancy, and IPMNs presenting with them should be resected in surgically fit patients [13,15].
"Worrisome features" (WF) include a cyst diameter ≥30 mm, an enhancing mural nodule <5 mm, thickened enhancing cyst walls, an MPD size of 5-9 mm, an abrupt change in the MPD caliber with distal pancreatic atrophy, lymphadenopathy, an elevated serum level of carbohydrate antigen (CA) 19-9, and a rate of cyst growth >5 mm/2 years. These are also associated with an increased risk of high-grade

Surgical Indications
Since January 2005, all clinical, radiological, and biological data of patients undergoing a pancreatectomy in our institution have been collected in our archive. We extracted all the features responsible for surgical indications according to the criteria retained in the ICG and the recent European recommendations [13,15], as well as other causes that led to surgical resection.

CECT Protocols
In our institution, contrast-enhanced abdominal CT exams were performed on 64-section multidetector CT scanners (222 patients on a LightSpeed VCT (GE Healthcare, Waukesha, WI, USA) and 62 patients on a Discovery CT750 HD LightSpeed (GE Healthcare, Waukesha, WI, USA)) using the parameters detailed in Table 1. Contrast-enhanced acquisitions were obtained following intravenous administration, into the antecubital vein, of 2 mL/kg of non-ionic contrast medium at 350 mg iodine/mL, using an automatic power injector at a rate of 3 mL/s through an 18- or 20-gauge intravenous catheter.
CECT examinations performed in other institutions were acquired in more than fifty different centers using various CT units. The acquisition parameters matched our inclusion criteria.

Histopathological Data
All cases were handled by expert pancreatic pathologists (A.C., J.C.) with, for most cases, a frozen-section examination of the pancreatic margin to assess whether an extension of the surgery was required. Specimens were then submitted to a standardized macroscopic protocol. Before formalin fixation, the main pancreatic duct was inked, and the specimen was sliced perpendicularly to the duodenum for Whipple resections or to the main duct for left resections. This allowed us to precisely map the localization of the cysts and assess the macroscopic main duct involvement. After fixation, the slices were all photographed, and all the cysts were fully sampled for microscopic examination. IPMN lesions were reviewed for this study to classify them according to the digestive WHO 2019 classification (low-grade vs. high-grade dysplasia) [33]. Microscopic IPMN phenotypes and lymph node metastasis status were also collected.

Segmentation
Image segmentation was performed by one reader with five years of abdominal imaging experience. Before segmenting the cohort's tumors, consensus segmentation was performed on 15 exams, including challenging cases, with a senior radiologist who had more than 20 years of experience in IPMN diagnosis. The reader was blinded to the pathologic analysis but not to the type of surgery. Using a web-based medical segmentation tool (MedSeg [34]), the volume of interest (VOI) was delineated in a semi-automatic manner, covering the tumor's volume as represented in Figure 2. The semi-automatic segmentation consisted of a smart brush with a variable radius and adjustable minimum and maximum intensity thresholds, which prevented voxels with an intensity outside these user-defined thresholds from being segmented and thus reduced the number of necessary corrections compared with a non-thresholded brush segmentation. If multiple cysts were present with no clear evidence of worrisome features, the most conspicuous or largest cyst was selected in the pancreas portion corresponding to the type of surgery (duodenopancreatectomy, left pancreatectomy, enucleation, others). The reader was given no specific directions regarding display settings such as thresholding and window level. The reader visually inspected the semi-automated segmentation and made freehand corrections when appropriate to reproduce the tumor's shape as closely as possible and avoid including adjacent fat and vessels. The pancreatic phase was favored for tumor segmentation. When it was not available, a portal venous phase was used for image segmentation and analysis [35][36][37][38].
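The thresholded brush described above can be illustrated with a minimal sketch. It is not the MedSeg implementation, whose internals are not described here; the function name, the 2D toy slice, and the threshold values are all hypothetical and serve only to show how an intensity window restricts what a circular brush paints.

```python
import math

def threshold_brush(image, center, radius, min_intensity, max_intensity):
    """Return the set of (row, col) pixels painted by one brush stamp.

    A pixel is painted only if it lies within `radius` of `center`
    AND its intensity falls inside [min_intensity, max_intensity].
    """
    row0, col0 = center
    painted = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside_brush = math.hypot(r - row0, c - col0) <= radius
            inside_window = min_intensity <= value <= max_intensity
            if inside_brush and inside_window:
                painted.add((r, c))
    return painted

# Hypothetical 5x5 slice: uniform cyst-like values (50 HU) and one bright voxel (200 HU).
toy_slice = [[50] * 5 for _ in range(5)]
toy_slice[2][3] = 200
mask = threshold_brush(toy_slice, center=(2, 2), radius=1,
                       min_intensity=0, max_intensity=80)
```

In this toy case the bright voxel lies under the brush but is left out of the mask because it exceeds the upper threshold, which is the behavior that reduces the need for manual corrections.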

Radiomics Feature Extraction Methodology
PyRadiomics [39] version 2.2.0 [40] was used to extract 107 radiomics features from the CT examinations and corresponding tumor segmentation masks. Shape, first-order, Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), Neighboring Gray Tone Difference Matrix (NGTDM), and Gray Level Dependence Matrix (GLDM) features were extracted. Due to the quantitative nature of CT images [41,42], a fixed bin size was optimized so that the intensity range of the lesions would span 30 to 130 bins across patients. An optimal bin size of 3 Hounsfield units was selected to discretize image intensities as part of the extraction of non-shape features. Due to the heterogeneity of the through-plane resolution within the overall dataset, a 2D feature extraction scheme was used in which only the in-plane resolution was standardized to 1.00 mm × 1.00 mm across images using B-Spline interpolation. Resegmentation using intensity outlier filtering was applied: the mean and standard deviation of each volume of interest were determined, and voxels were discarded if they lay outside the range mean ± 3 standard deviations.
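The two intensity preprocessing steps described above (outlier filtering at mean ± 3 standard deviations, then discretization with a fixed bin width of 3 HU) can be sketched in plain Python. This is an illustration, not the PyRadiomics code: the function names and toy values are hypothetical, the use of the population standard deviation is an assumption, and the binning rule floor(x / w) - floor(min / w) + 1 is the common fixed-bin-width convention assumed here.

```python
import math
import statistics

def resegment_outliers(hu_values, n_sigma=3.0):
    """Keep only voxels within mean +/- n_sigma standard deviations."""
    mean = statistics.fmean(hu_values)
    sd = statistics.pstdev(hu_values)  # population SD: an assumption of this sketch
    low, high = mean - n_sigma * sd, mean + n_sigma * sd
    return [v for v in hu_values if low <= v <= high]

def discretize_fixed_width(hu_values, bin_width=3.0):
    """Map intensities to 1-based bin indices using a fixed bin width."""
    offset = math.floor(min(hu_values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in hu_values]

# Hypothetical VOI: 50 voxels between 40 and 49 HU plus one extreme voxel.
voi = [40 + (i % 10) for i in range(50)] + [900]
kept = resegment_outliers(voi)
bins = discretize_fixed_width(kept)
```

With a fixed bin width, the number of bins follows the intensity range of each lesion, which is why the bin size has to be tuned to keep that number within a target range across patients.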
Univariate analysis was conducted to assess each radiomic feature's power to discriminate between benign and malignant IPMN and between different degrees of dysplasia in the training cohort. The Shapiro-Wilk test was used to assess the normality of the distribution of radiomics features in each group. When the distribution in both groups was normal, the t-test was applied to assess whether the groups differed significantly in their radiomic feature values. When the feature values of one or both groups were not normally distributed, the Mann-Whitney U test was used to assess group differences. The statistical significance level, α, was set at 0.05, and the Holm method (uniformly more powerful than the Bonferroni method) was used to control Type I errors arising from the multiple comparisons. The classification performance of features was assessed using the area under the Receiver Operating Characteristic (ROC) curve, and the Youden index was used to obtain the optimal cut-off of separation between groups and the corresponding accuracy, sensitivity (Se), and specificity (Spe) on the training cohort. The optimal cut-off for each feature, determined using the training cohort, was then applied to the external validation cohort, and accuracy, Se, and Spe were determined.
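As an illustration of two steps in this univariate workflow, the sketch below implements the Holm step-down adjustment of p-values and a Youden-index cut-off search. It is a minimal sketch with hypothetical function names and toy data, and it assumes that higher feature values indicate malignancy; the authors' statistical software is not specified in the text.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

def youden_cutoff(scores, labels):
    """Return (cutoff, J, Se, Spe) maximizing J = Se + Spe - 1.

    labels: 1 = malignant, 0 = benign; a case is called malignant
    when its score is >= cutoff (assumption of this sketch).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (None, -1.0, 0.0, 0.0)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        se, spe = tp / n_pos, tn / n_neg
        j = se + spe - 1
        if j > best[1]:
            best = (cutoff, j, se, spe)
    return best

# Hypothetical feature values: four benign cases followed by four malignant cases.
toy_scores = [1, 2, 3, 4, 3.5, 3.8, 6, 7]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, j_index, se, spe = youden_cutoff(toy_scores, toy_labels)
```

The cut-off found on the training data would then be applied unchanged to the validation data, as described in the text.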

Unsupervised Clustering
Unsupervised clustering was used to investigate possible associations between groups of radiomics features with similar characteristics and clinical, biological, or radiological variables leading to surgical resection. The Euclidean distance was used as the clustering distance metric across features and patients, and the agglomerative hierarchical clustering method chosen was the average linkage. The chi-square test was used to test associations between patient clusters and clinical parameters.
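Average-linkage agglomerative clustering with Euclidean distance can be sketched as follows. This is a naive illustration on hypothetical toy data that clusters patients only, whereas the study clustered across both features and patients; whether features were standardized beforehand is not stated, so no scaling is applied here.

```python
import math
from itertools import combinations

def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the mean Euclidean
    distance over all pairs of their members (average linkage).
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical patients described by two radiomic features each.
toy_patients = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_clusters = average_linkage(toy_patients, n_clusters=2)
```

The resulting cluster memberships can then be cross-tabulated against a clinical variable and tested with a chi-square test, as in the text.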

Multivariate Modeling
The model development and tuning consisted of several steps. Initially, the radiomics features with zero or near-zero variance on the training dataset were identified and removed. Subsequently, highly correlated features were removed using a correlation coefficient threshold of 0.95. The remaining features were used to develop and tune the models.
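The two filtering steps above can be sketched as a simple greedy procedure. The near-zero-variance criterion and the rule for choosing which member of a correlated pair to drop are not stated in the text, so the variance threshold, the first-come ordering, and the feature names below are assumptions made only for illustration.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def filter_features(features, min_variance=1e-8, max_abs_corr=0.95):
    """Drop (near-)constant features, then greedily drop correlated ones.

    A feature is kept only if its absolute correlation with every
    previously kept feature is below max_abs_corr.
    """
    informative = {name: vals for name, vals in features.items()
                   if statistics.pvariance(vals) > min_variance}
    kept = []
    for name, vals in informative.items():
        if all(abs(pearson_r(vals, informative[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical training matrix: feature name -> values across five patients.
toy_features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.1],   # almost perfectly correlated with "a"
    "c": [5, 1, 4, 2, 3],
    "d": [7, 7, 7, 7, 7],      # zero variance
}
selected = filter_features(toy_features)
```

Both filters must be fitted on the training data only, so that the validation cohort plays no part in deciding which features are retained.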
Two types of models to classify IPMNs were designed. The first used radiomics features only (referred to as the radiomics model). In the second, radiomics features were combined with the surgical indication variables known to guide the patient to surgery (referred to as the radiomics + surgical indication variables model). These models were developed and assessed for several scenarios: benign versus malignant IPMNs and between different degrees of dysplasia. For the development of these models, logistic regressions with Least Absolute Shrinkage and Selection Operator (LASSO) regularization, which is used to perform high-dimensional regression analysis by penalizing model complexity and selecting the most informative features, were tuned using the training dataset. A ten-fold cross-validation tuning procedure with the area under the Receiver Operating Characteristic (ROC) Curve (AUC) as the optimization metric was used to determine the optimal regularization parameter lambda. Additionally, the accuracy, Se, Spe, Positive Predictive Value (PPV), Negative Predictive Value (NPV), and Matthews Correlation Coefficient (MCC) were calculated. The overview of the model development is illustrated in Figure 3. The different scenarios were also assessed on the subgroup of branch duct IPMN patients alone. Due to the greater class imbalance in this subgroup, the MCC was used as the optimization metric, and the models were developed using a Naïve Bayes classifier.
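For reference, a LASSO-regularized logistic regression is commonly written as the following penalized likelihood problem. The notation is an assumption introduced here for illustration (n training patients, feature vector x_i, label y_i in {0, 1} with 1 denoting the malignant class, intercept β_0, coefficients β); the exact parameterization used by the authors' software is not stated.

```latex
\hat{\beta}_{\lambda}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      -\frac{1}{n}\sum_{i=1}^{n}
        \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
               - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

Larger values of λ drive more coefficients exactly to zero, which is how the penalty performs feature selection; λ itself is the parameter chosen by cross-validation.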
The trained models were then applied to the external validation cohort to predict the corresponding outcomes, and AUC, accuracy, Se, Spe, PPV, NPV, and MCC were calculated. The 95% Confidence Interval (CI) values for each performance metric, except the AUC, were obtained using the Gaussian (normal) approximation to the binomial distribution of a proportion. For the AUC, the CI was obtained using the DeLong method.
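The performance metrics and the normal-approximation interval mentioned above can be computed from a confusion matrix as sketched below. The counts are hypothetical and the function names are illustrative; the interval shown is the standard Wald form p ± z·sqrt(p(1 − p)/n), which is assumed to correspond to the Gaussian approximation described in the text.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, Se, Spe, PPV, NPV, and MCC from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }

def wald_ci(proportion, n, z=1.96):
    """95% CI of a proportion via the normal approximation to the binomial."""
    half_width = z * math.sqrt(proportion * (1 - proportion) / n)
    return max(0.0, proportion - half_width), min(1.0, proportion + half_width)

# Hypothetical validation results: 50 malignant and 50 benign cases.
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
se_low, se_high = wald_ci(metrics["sensitivity"], n=40 + 10)
```

Note that the denominator n differs by metric: sensitivity uses the number of malignant cases, specificity the number of benign cases, and accuracy the whole cohort.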

Study Population
From January 2005 to December 2018, 408 patients with pathologically diagnosed IPMN were eligible for this retrospective study. The training cohort consisted of 152 males and 144 females (range 30-82 years, mean 63.5 years), and the validation cohort included 58 males and 54 females (range 36-84 years, mean 63.2 years). There were no statistically significant differences in category, age, gender, clinical features (jaundice, acute pancreatitis), laboratory tests, type of surgery, anatomical classification, grade of dysplasia, phenotype classification, or lymphadenopathy on specimens between the training and validation cohorts (p > 0.05). The differences in WF/HRS (p = 0.008), available CECT phases (p = 2.0 × 10⁻⁶), and days between CECT and surgical resection (p = 6.0 × 10⁻⁸) were statistically significant between the training cohort and the validation cohort. Patient characteristics are reported in more detail in Table 2. A separate sub-group analysis was performed for patients with BD-IPMN. Further details are available in Table S1.

Univariate Analysis
The univariate analysis performed on the training cohort showed a total of 85 (79%) radiomics features with significant differences between patients with benign and malignant IPMNs. A subset of significant features with an AUC > 0.725 is shown in Table 3 (the full list is provided in the Supplementary Material Table S2). The optimal cut-off was used to assess the accuracy, sensitivity, and specificity on the external validation cohort.
Table 3. Subset of radiomics features in the training cohort and the external validation cohort presenting statistically significant differences between benign and malignant groups. The adjusted p-value, AUC, optimal cut-off, accuracy, sensitivity, and specificity are provided for the training cohort.

Unsupervised Clustering
Unsupervised clustering enabled the identification of four distinct clusters of patients with similar radiomic feature patterns. The comparison between these clusters and the corresponding malignancy status, degrees of dysplasia, and clinical variables providing an indication for surgery is presented in Figure 4. Statistically significant associations with the obtained clusters were found for malignancy status (p = 0.002), jaundice (p = 0.039), elevation of CA 19-9 (p = 0.038), WF/HRS (p = 0.024), acute pancreatitis (p = 0.015), and familial history of pancreatic cancer (p = 0.040). The degree of dysplasia (p = 1.000) did not show a significant association with the obtained patient clusters.
[...] Table 4 and represented in the Figure 5A,B. [...] Table S5.

Multivariate Analysis
All the ROC curves for the training and external validation cohorts for the differentiation between benign and malignant, low-grade and high-grade, and high-grade and invasive IPMN in the whole population are provided in Figure 5.
The models' coefficients for the different scenarios are shown in Figure S1.

Subgroup of BD-IPMNs
Additionally, models for the subgroup of patients with branch duct (BD) IPMNs were developed, and corresponding performances were calculated. The performance of the radiomics and radiomics with surgical indication variable models developed for the discrimination between benign and malignant BD-IPMNs is reported in Table 5.
Table 5. Cross-validation and external validation performance of models using only radiomics features and using radiomics features + clinical variables to differentiate between benign and malignant pancreatic BD-IPMN patients. Column groups: Radiomics Only; Radiomics + Surgical Indication Variables.
Similarly to the whole population analysis, we developed models in the BD-IPMNs, using only radiomics features and radiomics features + surgical indication variables to discriminate between LGD and HGD, between HGD and invasive, as well as LGD vs. HGD vs. invasive (Tables S6-S8).

Discussion
In this study, we have shown in a large cohort of IPMN patients that radiomics enables the distinction of the different IPMN grades, particularly benign (low-grade dysplasia) from malignant (high-grade dysplasia and invasive carcinoma) ones. As this distinction is very challenging with current clinicobiological and radiological data, we think radiomics could contribute to better patient management.
Eighty-five radiomics features showed the statistical power to discriminate between benign and malignant IPMNs, and the top ten features were first- or second-order radiomics features with AUCs above 0.725. A common strategy to explore whether useful patterns, otherwise called "machine learning signal", are present in the data is to perform unsupervised clustering analysis. In our study, it revealed four distinct clusters of patients with similar radiomic feature patterns. Among the significant associations, malignancy was the most significant, providing evidence that the computed radiomic phenotypes differ significantly between benign and malignant IPMNs. The next step is multivariate modeling, which requires identifying the subset of features, otherwise called a radiomics signature, that can predict the final outcome with high performance and robustness.
In our study, the multivariate model using only radiomic features had a high AUC in the training and in the external validation cohorts (0.84 and 0.71, respectively). This is the first large radiomics study using a highly diverse, multicenter external validation cohort. Preliminary radiomic studies in IPMN [26,27,32] (including 38, 51, and 53 patients) have shown similar diagnostic performance for differentiating benign from malignant IPMN, with AUC values ranging between 0.77 and 0.87.
These diagnostic performances challenge the classical criteria used to differentiate benign from malignant IPMN, which rely either on a combination of clinicobiological and radiological criteria, the most common being worrisome features and high-risk stigmata, or on imaging findings alone. In a meta-analysis comprising 21 studies with 3723 patients with cystic lesions of the pancreas (mostly IPMN), the 2012 Fukuoka and the AGA guidelines had AUCs of 0.78 and 0.79, respectively, for the prediction of malignancy [51]. Comparison of the 2012 and 2017 Fukuoka guidelines even showed a decrease in specificity, translating into futile pancreatic resections and stressing the need to improve IPMN classification [16]. Using CT or MRI features only, the presence of an enhancing mural nodule, main pancreatic duct size, abrupt main pancreatic duct caliber change, and lymphadenopathy were more common in malignant than in benign IPMNs, resulting in AUCs of 0.83 and 0.86 with CT and MRI, respectively [52]. Besides recommendations, nomograms have been tested and validated in Eastern and Western patients, with an AUC for combined cohorts in the range of previous works (0.776) [53].
Because BD-IPMN are less prone to malignancy, and only 18% of resected BD-IPMN were malignant at pathology, the distinction between benign and malignant is even more important in BD-IPMN than in other types [54], justifying dedicated radiomics analyses in this subgroup. AUCs were below the results obtained in the whole population (0.73 and 0.55 in the training and in the external validation cohort, respectively). Yet, they are consistent with those reported by preliminary radiomics studies [30,31]. Similarly to IPMNs as a whole, the diagnostic performance of imaging criteria has been shown to be low in a meta-analysis focusing on BD-IPMN [55]. The best imaging feature was the presence of a mural nodule, with an AUC of 0.786, a pooled sensitivity of 59%, and a pooled specificity of 83% [55].
Whatever the approach (clinical, radiological, or quantitative), sensitivities and specificities do not reach 90% and therefore do not adequately stratify and identify patients at risk of having high-grade dysplasia or invasive cancer.
In our study, we have also evaluated the ability of combined qualitative (surgical indications) and quantitative data to discriminate benign from malignant IPMN. We obtained diagnostic performances similar to the radiomics-only approach (AUC of 0.83 and 0.75 in the training and in the external validation cohort, respectively). A lack of major improvement was also observed in preliminary series and could be explained by the fact that surgical indication variables have highly selected the patient population that should be resected [30]. Only one study reported a significant increase in AUC by combining CT radiomics features with a matched plasma-based miRNA genomic classifier in 38 patients [27].
We also developed similar models for the discrimination of patients' subgroups: low-grade vs. high-grade dysplasia, high-grade dysplasia vs. invasive carcinoma, and low-grade vs. high-grade dysplasia vs. invasive carcinoma. AUCs ranged from 0.80 to 0.92 in the training cohort and from 0.73 to 0.91 in the external validation cohort. Interestingly, the highest diagnostic performance was observed in distinguishing high-grade dysplasia from invasive pancreatic IPMNs. This may have an important clinical impact, as increasing data have shown the benefits of neoadjuvant chemotherapy in pancreatic adenocarcinoma, explaining the shift from upfront surgery to multidisciplinary treatment of even resectable pancreatic cancer [56].
Due to the relatively high number of radiomic features, there is an increased likelihood of overfitting. In the case of the IPMN patients, Logistic Regression models were regularized using LASSO to ensure that not all features were used in the models, as this method penalizes the model complexity and selects the most informative features. As for the Naïve Bayes models used for BD-IPMNs, no regularization was utilized, which could explain the difference in performance between the training and external testing cohorts.
In contrast to the preliminary radiomics studies in IPMN, ours comprised a large training cohort and an external validation cohort. Interestingly, our training cohort was composed of patients examined at our institution with a highly standardized acquisition protocol and few CT units, while the external validation cohort consisted of patients who had their CT examinations performed outside our institution with various CT machines and protocols (including timing of the contrast-enhanced phases). Overall, the general performance of the algorithms was not altered in the external validation cohort, supporting the generalization of the study results to the target population [57].
MRI is the modality of choice in IPMN as it is far more sensitive than CT in detecting cystic lesions of the pancreas, probably due to its higher contrast resolution, and in identifying communication between a pancreatic cystic neoplasm and the pancreatic duct system [13]. However, we have chosen to evaluate CT radiomics in patients with IPMN. Notably, pancreatic CT and pancreatic MRI/MRCP have a similar accuracy for the characterization of pancreatic cystic neoplasms [58,59]. In IPMNs, the diagnostic performances of contrast-enhanced CT and MRI for the prediction of malignancy are similar, without demonstrated significant differences [60,61]. Moreover, an MRI-based analysis would have been much more difficult because MRI is less standardized than CT, with a large variety in the multiple parameters related to scanner properties, acquisition settings, and image processing that could hamper radiomics analysis. Nevertheless, further studies should also consider MR-based radiomics.
Another interesting topic would be to analyze CT- or MR-based radiomics in patients who had serial imaging examinations in order to evaluate the potential of radiomics in assessing changes that would not be captured by morphological criteria.
Our study has limitations. First, this is a retrospective single-center study, which could reduce the generalization of the results. Yet, our patient population exceeds 400 patients, and we have set up an external validation cohort reflecting daily practice. As the method of reference is the pathologic examination of the resected pancreas, we also wanted to benefit from our expert pathologists. Second, the segmentation was semi-automatic and performed by one radiologist. Segmentation is indeed a critical step of the radiomics process because data are extracted from the segmented volumes. Fully automatic segmentation methods were not used in this study because they are very challenging to apply in poorly defined IPMNs. Whereas manual segmentation can suffer from inter-observer variability, it has been shown that semi-automated approaches are fast and reduce inter-observer variability [62]. Moreover, the first 15 IPMN cases were segmented by the radiologist and reviewed by an expert radiologist in pancreatic diseases.

Conclusions
In conclusion, our large study shows high diagnostic performance of CT radiomics in differentiating benign from malignant IPMNs as well as in subgroup classification. Results obtained from the training and external validation cohorts were similar, enabling generalization of the proposed models. There is still room for improvement, and further large studies should perform head-to-head comparisons of the performance of radiomics-based and imaging-based approaches, as well as of their combination.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-6694/12/11/3089/s1, Table S1: BD-IPMN patient characteristics for the training and the external validation cohorts, Table S2: Complementary subset of radiomic features in the training cohort and the external validation cohort presenting statistically significant differences between benign and malignant groups, Table S3: Cross-validation and external validation performance of models using only radiomic features and using radiomic features + clinical variables to differentiate between low-grade and high-grade pancreatic IPMN patients, Table S4: Cross-validation and external validation performance of models using only radiomic features and using radiomic features + clinical variables for the differentiation between patients with low-grade, high-grade, and invasive pancreatic IPMNs, Table S5: Cross-validation and external validation performance of models using only radiomic features and using radiomic features + clinical variables to differentiate between patients with high-grade and invasive pancreatic IPMNs, Table S6: Cross-validation and external validation performance of models using only radiomic features and using radiomic features + clinical variables to differentiate between low-grade and high-grade pancreatic BD-IPMN patients, Table S7: Cross-validation and external validation performance of models using only radiomic features and using radiomic features + clinical variables to differentiate between patients with low-grade, high-grade, and invasive pancreatic IPMNs, Table S8: Cross-validation and external validation performance of models using only radiomic features and using radiomic features + clinical variables to differentiate between patients with high-grade and invasive pancreatic BD-IPMNs, Figure S1: Coefficients of selected variables for (A) radiomics and (B) radiomics + surgical indications models for the differentiation between benign and malignant IPMNs. (C) and (D) provide the coefficients of selected variables for radiomics and radiomics + surgical indications models for the differentiation of low-grade and high-grade dysplasia IPMNs, while (E) and (F) show the coefficients of radiomics and radiomics + surgical indications models for differentiation between high-grade and invasive IPMNs.