Deep Learning-Based Differentiation between Mucinous Cystic Neoplasm and Serous Cystic Neoplasm in the Pancreas Using Endoscopic Ultrasonography

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.

The diagnosis of MCN and SCN is challenging because of the lack of criteria for differentiation [1,2,6,10,12,14,15]. Nagashio et al. [6] utilized fluid markers such as the carcinoembryonic antigens (CEA), carbohydrate antigens (CA 19-9 and CA 125), amylase, as well as cytology to classify different pancreatic lesions and to differentiate between malignant and benign lesions. The study showed that the combination of fluid markers is useful for differentiation between mucinous and non-mucinous lesions with an accuracy of 77.8%. Gaddam et al. [7] also tried to find the optimal cutoff value for the CEA level to diagnose between mucinous and non-mucinous cystic lesions, and the result from the study could reach an AUC around 0.77. Pérez et al. [5] investigated the impact of molecular analysis for PCN classification by analyzing the combination of morphological, cytological, and biochemical analysis to differentiate between mucinous and non-mucinous PCNs. Their study showed that molecular analysis was effective for the differentiation of mucinous cysts from non-mucinous cysts. Wang Y. et al. [9] classified MCN and SCN by investigating the alterations of glycopatterns in cystic fluids and demonstrated that the variation in glycopatterns and glycoproteins are associated with the differences between MCN and SCN. Biomarkers can be good indicators in the diagnosis but have limitations if used alone.
Various imaging modalities such as endoscopic ultrasound (EUS), magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) are used for visual diagnosis of MCN and SCN [1,[3][4][5]7,10,12]. Wang G. et al. [13] proposed a method to differentiate PCNs by combining the observations from PET/CT with multiparametric analysis such as the maximum and mean of standardized uptake values, metabolic tumor volume (MTV), and total lesion glycolysis (TLG). Their study achieved an AUC up to 0.810. Diagnosis using imaging modalities mainly relies on visual observations and biopsies. Among the imaging modalities, EUS has the advantage of capturing internal structures [1,4,12,16] and can be combined with a biopsy by using fine-needle aspiration (FNA) [3,10]. Köker et al. [17] performed the EUS-FNA to investigate the effectiveness of CEA level for differentiating low risk IPMN from low risk MCN. The study achieved an AUC around 0.930 for differentiating between low risk IPMN and low risk MCN. Zhang et al. [1] suggested certain criteria based on the visual features of pancreatic cysts in EUS images such as honeycomb cysts, latticed cysts, and round shaped cysts featuring nodules and calcification. In addition to the visual information, the carcinoembryonic antigen (CEA) level analyzed from cyst fluid was utilized. These criteria helped achieve 82.93% accuracy. Kubo et al. [2] also reported features to diagnose PCN in the EUS images, which were microcystic areas, mural nodules, the number of cystic formations, locularity, and the cystic component. Despite providing pathological information for the diagnosis of cysts, EUS with FNA has some safety concerns as it can lead to severe outcomes for non-mucinous cases, such as peritoneal dissemination or gastric seeding due to the leakage of cyst fluid [18,19].
Using deep learning techniques such as convolutional neural networks (CNNs), complicated features that are unrecognizable to human vision can be extracted. Thus, deep learning technology is widely used in classification tasks in the medical imaging field [14][15][16][20][21][22]. Kurita et al. [15] investigated the feasibility of neural network models for differentiating between malignant and benign pancreatic lesions by utilizing the carcinoembryonic antigen (CEA), cytology, and other clinical information, and showed an accuracy of 92.9%. Kuwahara et al. [20] used the CNN model to diagnose benign and malignant cases of IPMN by utilizing EUS images of PCN lesions, and achieved 86.2% accuracy.
The recent studies have proven that the deep learning techniques can be applied to diagnosis of pancreatic disease. Thus far, to the best of our knowledge, no study has been conducted on the classification of MCN and SCN from EUS images using CNN models. In this study, we aim to implement CNN models to differentiate between MCN and SCN from EUS images of the pancreas. The performance of the CNN model for this classification is thoroughly evaluated. scanning system (SSD 5500, 5 and 10; Aloka, Tokyo, Japan) with linear (GF-UCT 240; Olympus Optical corp. Ltd., Tokyo, Japan) and radial (GF-UE 260; Olympus Optical corp. Ltd., Tokyo, Japan) echoendoscopes was used. The examination was conducted by endosonographers with at least five years of experience. Endosonographers at two separate sites independently acquired and completed the analysis of the EUS images. The clinical data were analyzed for image findings and pathological diagnosis of the patients. To validate the diagnosis of unclear cases, FNA or histology analysis through surgery was performed and 44% of the patients experienced this additional level of diagnosis. This study was reviewed and approved by the Institutional Review Boards of Inha University Hospital (2020-05-002) and of Asan Medical Center (2020-1290-0001).

Materials and Methods
The EUS data were acquired from 47 MCN and 31 SCN patients at the 1st hospital and 13 MCN and 18 SCN patients at the 2nd hospital. The characteristics of the patients and lesions for this study are summarized in Table 1. Figure 1 shows a sample of histology images for MCN ( Figure 1A) and SCN ( Figure 1B) cases. All procedures were performed between December 2010 and December 2020 at the endoscopic center of Inha University Hospital and Asan Medical Center. An ultrasound scanning system (SSD 5500, 5 and 10; Aloka, Tokyo, Japan) with linear (GF-UCT 240; Olympus Optical corp. Ltd., Tokyo, Japan) and radial (GF-UE 260; Olympus Optical corp. Ltd., Tokyo, Japan) echoendoscopes was used. The examination was conducted by endosonographers with at least five years of experience. Endosonographers at two separate sites independently acquired and completed the analysis of the EUS images. The clinical data were analyzed for image findings and pathological diagnosis of the patients. To validate the diagnosis of unclear cases, FNA or histology analysis through surgery was performed and 44% of the patients experienced this additional level of diagnosis. This study was reviewed and approved by the Institutional Review Boards of Inha University Hospital (2020-05-002) and of Asan Medical Center (2020-1290-0001).
The EUS data were acquired from 47 MCN and 31 SCN patients at the 1 st hospital and 13 MCN and 18 SCN patients at the 2nd hospital. The characteristics of the patients and lesions for this study are summarized in Table 1. Figure 1 shows a sample of histology images for MCN ( Figure 1A) and SCN ( Figure 1B) cases.   Figure 2 illustrates the data preparation and preprocessing procedure for this study. From each patient's data (out of 109 patients), 1-4 EUS images were extracted. For MCN and SCN cases, 130 and 81 images, respectively, were selected. The region surrounding the cysts in the EUS image was selected as the region of interest (ROI). Unnecessary regions including the metadata information and the circular hollow region (radial EUS probe) or semicircular hollow region (linear EUS probe) were excluded from the ROI. For the test data of the CNN model, two types of ROI selections were made to check whether the performance of the network model was affected by the ROI selection. The first type of  Figure 2 illustrates the data preparation and preprocessing procedure for this study. From each patient's data (out of 109 patients), 1-4 EUS images were extracted. For MCN and SCN cases, 130 and 81 images, respectively, were selected. The region surrounding the cysts in the EUS image was selected as the region of interest (ROI). Unnecessary regions including the metadata information and the circular hollow region (radial EUS probe) or semicircular hollow region (linear EUS probe) were excluded from the ROI. For the test data of the CNN model, two types of ROI selections were made to check whether the performance of the network model was affected by the ROI selection. The first type of ROI was selected in an aspect ratio of 1:1 (i.e., a square); this type of ROI selection is denoted as single-ROI. Another type of ROI selection involved varying the size and aspect ratio of four different ROIs; this type of ROI selection is denoted by multi-ROI. In order to validate the proposed model without bias, we performed the hold out validation three times. From each group, 10 different patients were selected for each class (i.e., MCN and SCN) while the rest of the data were used as training data, as shown in Figure 3.

Data Preparation and Preprocessing
ROI was selected in an aspect ratio of 1:1 (i.e., a square); this type of ROI selection is denoted as single-ROI. Another type of ROI selection involved varying the size and aspect ratio of four different ROIs; this type of ROI selection is denoted by multi-ROI. In order to validate the proposed model without bias, we performed the hold out validation three times. From each group, 10 different patients were selected for each class (i.e., MCN and SCN) while the rest of the data were used as training data, as shown in Figure 3.  . Three groups of holdout validation. Ten patients were selected from each class for testing and the rest were used to provide training data.

Data Augmentation for Training Data
Data augmentation was used to increase the amount of training data and further help the generalization of the network decision. The augmentation techniques used in this study are flip, rotation, translation, zoom in, zoom out, adding Gaussian noise, and perspective transformation [23]. Perspective transformation was utilized to simulate the deformation of the cyst in the EUS image. ROI was selected in an aspect ratio of 1:1 (i.e., a square); this type of ROI selection is denoted as single-ROI. Another type of ROI selection involved varying the size and aspect ratio of four different ROIs; this type of ROI selection is denoted by multi-ROI. In order to validate the proposed model without bias, we performed the hold out validation three times. From each group, 10 different patients were selected for each class (i.e., MCN and SCN) while the rest of the data were used as training data, as shown in Figure 3.  . Three groups of holdout validation. Ten patients were selected from each class for testing and the rest were used to provide training data.

Data Augmentation for Training Data
Data augmentation was used to increase the amount of training data and further help the generalization of the network decision. The augmentation techniques used in this study are flip, rotation, translation, zoom in, zoom out, adding Gaussian noise, and perspective transformation [23]. Perspective transformation was utilized to simulate the deformation of the cyst in the EUS image. . Three groups of holdout validation. Ten patients were selected from each class for testing and the rest were used to provide training data.

Data Augmentation for Training Data
Data augmentation was used to increase the amount of training data and further help the generalization of the network decision. The augmentation techniques used in this study are flip, rotation, translation, zoom in, zoom out, adding Gaussian noise, and perspective transformation [23]. Perspective transformation was utilized to simulate the deformation of the cyst in the EUS image.
Data augmentation for the training data is illustrated in Figure 4. Examples of the augmentation process are shown in Figure 5. The augmentation process starts from a grayscale EUS image and is followed by rotation and random perspective transformation. Then, ROIs in arbitrary rectangular shapes were selected manually as shown in Figure 5B. Selected ROIs were flipped after Gaussian noise was added. Finally, each ROI was resized to act as the input of the CNN model (i.e., 244 × 244 in this study). Intensities were normalized in the range of (0, 1). The examples of the augmented ROIs are shown in Figure 5C. Given that the EUS image was in grayscale, the grayscale images were repeated for the three-color channels of the input for the CNN model.
Data augmentation for the training data is illustrated in Figure 4. Examples of the augmentation process are shown in Figure 5. The augmentation process starts from a grayscale EUS image and is followed by rotation and random perspective transformation. Then, ROIs in arbitrary rectangular shapes were selected manually as shown in Figure 5B. Selected ROIs were flipped after Gaussian noise was added. Finally, each ROI was resized to act as the input of the CNN model (i.e., 244 × 244 in this study). Intensities were normalized in the range of (0, 1). The examples of the augmented ROIs are shown in Figure  5C. Given that the EUS image was in grayscale, the grayscale images were repeated for the three-color channels of the input for the CNN model.  Data augmentation for the training data is illustrated in Figure 4. Examples of the augmentation process are shown in Figure 5. The augmentation process starts from a grayscale EUS image and is followed by rotation and random perspective transformation. Then, ROIs in arbitrary rectangular shapes were selected manually as shown in Figure 5B. Selected ROIs were flipped after Gaussian noise was added. Finally, each ROI was resized to act as the input of the CNN model (i.e., 244 × 244 in this study). Intensities were normalized in the range of (0, 1). The examples of the augmented ROIs are shown in Figure  5C. Given that the EUS image was in grayscale, the grayscale images were repeated for the three-color channels of the input for the CNN model.  The CNN model used in this study was Resnet-50 [24], which consists of four main blocks ( Figure 6). Each block contains several convolutional operations as shown in Figure 6. Resnet introduced residual learning by using a skip connection technique to overcome the vanishing and exploding gradient problem during the training of the deep layer.
Owing to the limited training data size, the transfer learning technique was employed instead of training from scratch [21,22,[25][26][27]. The idea of transfer learning is to reuse the trained weights learned from a similar task. There are several fine-tuning strategies in transfer learning, including determining whether to freeze or to train the selected layers [27]. The level of fine-tuning in transfer learning can be decided based on the similarity between the pre-training task and the target task and also the characteristics of the data to be utilized. In this study, the weights pre-trained on the ImageNet data, which consists of approximately 15 million labeled high-resolution natural images, were utilized. As the characteristics of EUS data differ from those of ImageNet data, two fine-tuning approaches with transfer learning were employed as shown in Figure 6. First, the fully connected layer (2048 × 2) was trained, followed by a softmax layer ( Figure 6A). This approach is denoted by Resnet-FC, which utilizes most of the features from the pre-trained weight. Another fine-tuning approach was implemented by freezing the first three stages in the feature selection part while the remaining stages were trained ( Figure 6B), which is denoted by ResNet-Conv+FC. This keeps low level features from the pre-trained weight but extracts high level features and classifiers from the EUS data. Both ResNet-FC and ResNet-Conv+FC were trained using the hyperparameters listed in Table 2. They were trained and tested using an Intel ® Core™ i7-9700 CPU @ 3.00 GHz processor, a CUDA-enabled Nvidia GTX 1660 Ti graphical processing unit (GPU), and Python 3.8 and PyTorch 1.7.1. Table 2. Hyperparameters for the training.

Number of Epochs 50
Batch size 32 Learning rate 1 × 10 −5 Optimizer Adam (weight decay 1 × 10 −4 ) Loss function Cross entropy blocks ( Figure 6). Each block contains several convolutional operations as shown in Figure  6. Resnet introduced residual learning by using a skip connection technique to overcome the vanishing and exploding gradient problem during the training of the deep layer.
Owing to the limited training data size, the transfer learning technique was employed instead of training from scratch [21,22,[25][26][27]. The idea of transfer learning is to reuse the trained weights learned from a similar task. There are several fine-tuning strategies in transfer learning, including determining whether to freeze or to train the selected layers [27]. The level of fine-tuning in transfer learning can be decided based on the similarity between the pre-training task and the target task and also the characteristics of the data to be utilized. In this study, the weights pre-trained on the ImageNet data, which consists of approximately 15 million labeled high-resolution natural images, were utilized. As the characteristics of EUS data differ from those of ImageNet data, two finetuning approaches with transfer learning were employed as shown in Figure 6. First, the fully connected layer (2048 × 2) was trained, followed by a softmax layer ( Figure 6A). This approach is denoted by Resnet-FC, which utilizes most of the features from the pretrained weight. Another fine-tuning approach was implemented by freezing the first three stages in the feature selection part while the remaining stages were trained ( Figure 6B), which is denoted by ResNet-Conv+FC. This keeps low level features from the pre-trained weight but extracts high level features and classifiers from the EUS data. Both ResNet-FC and ResNet-Conv+FC were trained using the hyperparameters listed in Table 2. They were trained and tested using an Intel ® Core™ i7-9700 CPU @ 3.00GHz processor, a CUDA-enabled Nvidia GTX 1660 Ti graphical processing unit (GPU), and Python 3.8 and PyTorch 1.7.1.

Grad-CAM
Although a deep learning network can be trained to classify the EUS images by differentiating MCN and SCN, it is hard to clarify how the decision was made. In this study, gradient-weighted class activation mapping (Grad-CAM) [22,28] was used to generate a heatmap visualizing the region or feature that played an important role in making decisions from any k-th feature in the network. The Grad-CAM map L c Grad−CAM can be obtained as follows: first, compute α c k , i.e., the neuron importance weights, by giving an input image to the network to obtain the score y c of a specific class C, which is either MCN or SCN in this study, as specified in Equation (1). The gradient of all other classes is set to 0, whereas the target class C is set to 1. The gradient is then calculated via the backpropagation that flows to the k-th-feature map A k over i (width) and j (height) while Z is the number of pixels in A k . Finally, we can obtain the L c Grad−CAM by a weighted combination of A k and α c k followed by a ReLU activation function as shown in Equation (2).
In this study, Grad-CAM from the last convolution layer was calculated by Equations (1) and (2).

Statistical Analyses
An Anderson-Darling test and a paired t-test were used to compare the accuracy, sensitivity, and specificity between each group of holdout validation, and to compare the metrics between the two fine-tuning approaches as classifiers, respectively. Statistical differences between AUC values were defined using two-sided DeLong tests, with p-value < 0.05 as the statistically significant level. Figure 7 shows the learning curves during training for both fine-tuning approaches. For the first 50 epochs of the training, the mean training losses of the three groups for the ResNet-FC ( Figure 7A) was 0.57 while that of the ResNet-Conv+FC ( Figure 7B) was 0.32. For Res-Net-FC, the learning curve reached a stable point after 150 epochs but the loss was still higher than 0.55. Thus, the training learning curve shows that the network can be trained successfully with ResNet-Conv+FC.

Evaluation of the Network Model
Metrics such as accuracy, sensitivity, specificity, and AUC score taken from the receiver operating characteristic (ROC) were used to evaluate the performance of the network. Figure 8 illustrates the ROC curves of both fine-tuning approaches (solid line: Res-Net-Conv+FC and dotted line: ResNet-FC) for single-ROI ( Figure 8A) and multi-ROI (Figure 8B). The Anderson-Darling test confirmed that the calculated accuracy, sensitivity,

Evaluation of the Network Model
Metrics such as accuracy, sensitivity, specificity, and AUC score taken from the receiver operating characteristic (ROC) were used to evaluate the performance of the network. Figure 8 illustrates the ROC curves of both fine-tuning approaches (solid line: ResNet-Conv+FC and dotted line: ResNet-FC) for single-ROI ( Figure 8A) and multi-ROI ( Figure 8B). The Anderson-Darling test confirmed that the calculated accuracy, sensitivity, and specificity show normal distribution. Table 3 presents the performance of the two finetuning approaches tested on single-ROI and multi-ROI data. Overall, ResNet-Conv+FC achieved higher performance in terms of accuracy (by 20.47%, p = 0.0036), sensitivity (by 20.79%, p = 0.0045), specificity (by 21.06%, p = 0.0046), and AUC score (by 22.73%) compared to ResNet-FC. The result shows that the classification performance of ResNet-Conv+FC is significantly different (p < 0.05) and superior compared to that of ResNet-FC.

Evaluation of the Network Model
Metrics such as accuracy, sensitivity, specificity, and AUC score taken from the receiver operating characteristic (ROC) were used to evaluate the performance of the network. Figure 8 illustrates the ROC curves of both fine-tuning approaches (solid line: Res-Net-Conv+FC and dotted line: ResNet-FC) for single-ROI ( Figure 8A) and multi-ROI (Figure 8B). The Anderson-Darling test confirmed that the calculated accuracy, sensitivity, and specificity show normal distribution. Table 3 presents the performance of the two fine-tuning approaches tested on single-ROI and multi-ROI data. Overall, ResNet-Conv+FC achieved higher performance in terms of accuracy (by 20.47%, p = 0.0036), sensitivity (by 20.79%, p = 0.0045), specificity (by 21.06%, p = 0.0046), and AUC score (by 22.73%) compared to ResNet-FC. The result shows that the classification performance of ResNet-Conv+FC is significantly different (p < 0.05) and superior compared to that of Res-Net-FC.   Figure 9 shows the Grad-CAM mapping obtained from ResNet-Conv+FC. Figure 9A-C are MCN cases (probabilities of the predicted class were 100%, 87.5%, and 59.8%, respectively) while Figure 9D-F are SCN cases (probabilities of the predicted class were 100%, 100%, and 50.4%, respectively). The Grad-CAM mapping emphasizes the regions affecting the decision the most from the network. Colors from blue (low) to red (high) in Grad-CAM illustrate the degree of decision contribution from various regions. It confirms that the network model learned the critical features for the classification. Although Figure 9C,F is correctly predicted, the prediction probabilities were 59.8% and 50.4% for MCN and SCN, respectively. Despite Grad-CAM mapping for Figure 9C,F showing high activation near the cyst region, the shapes of the cyst are hard to differentiate; this affected the decision. Although the network was trained without information about the position of the cysts, it was observed that the decision from the network model was made based on the features learned from the cystic region.

Gradient Visualization (Grad-CAM)
affecting the decision the most from the network. Colors from blue (low) to red (high) in Grad-CAM illustrate the degree of decision contribution from various regions. It confirms that the network model learned the critical features for the classification. Although Figure  9C,F is correctly predicted, the prediction probabilities were 59.8% and 50.4% for MCN and SCN, respectively. Despite Grad-CAM mapping for Figure 9C,F showing high activation near the cyst region, the shapes of the cyst are hard to differentiate; this affected the decision. Although the network was trained without information about the position of the cysts, it was observed that the decision from the network model was made based on the features learned from the cystic region.

Discussion
In this study, we implemented a CNN model to differentiate between MCN and SCN using EUS images. Overall accuracy was up to 82.76% and 80.00% from single-ROIs and multi-ROIs, respectively. The classification errors from the implemented network model mostly occurred in the confusing cases. It is known that the diagnosis of the macrocystic SCN using EUS images is difficult even for experienced sonographers [29]. In spite of the fact that the proposed model is not yet able achieve a level of histological diagnosis, the performance of the current study reached the level of experienced endosonographers. By utilizing the features in the images from the CNN model, the network model showed comparable performance to that of Zhang et al. [1]. When their criteria with visual features and additional information including the measured CEA level achieved the accuracy of 82.93%.

Discussion
In this study, we implemented a CNN model to differentiate between MCN and SCN using EUS images. Overall accuracy was up to 82.76% and 80.00% from single-ROIs and multi-ROIs, respectively. The classification errors from the implemented network model mostly occurred in the confusing cases. It is known that the diagnosis of the macrocystic SCN using EUS images is difficult even for experienced sonographers [29]. In spite of the fact that the proposed model is not yet able achieve a level of histological diagnosis, the performance of the current study reached the level of experienced endosonographers. By utilizing the features in the images from the CNN model, the network model showed comparable performance to that of Zhang et al. [1]. When their criteria with visual features and additional information including the measured CEA level achieved the accuracy of 82.93%.
There are several approaches to fine-tuning to train the network model [27], but two fine-tuning approaches (ResNet-FC and ResNet-Conv+FC) were shown in this study. Although it is not included in this paper, the full network was trained using initial pretrained weights and was also compared with various layers selected for training. However, they all showed worse performance than the two fine-tuning approaches selected in this study. Among the two selected fine-tuning approaches (Table 3), the network achieved higher diagnosis performance by training selected top layers (ResNet-Conv+FC) compared to the performance when the full pre-trained weights (Resnet-FC) were utilized.
Owing to the small amount of available data and varying number of samples for each patient, holdout validation was employed to choose training and test data instead of k-fold cross validation. In order to evaluate the reliability of the network, the multi-ROI cases were employed. Varying the size and region of the ROI causes a slight decrease in the overall accuracy. Considering that the EUS images were taken from two different hospitals using various kinds of EUS imaging settings and clinical configurations (e.g., radial and linear echoendoscopes), however, the results in this study shows that our approach can differentiate between MCN and SCN without dependency on the variations of EUS imaging conditions. Diagnosing various kinds of PCN remains a challenge owing to the lack of datasets and exact criteria for differentiation. Typically, unilocular or multilocular cysts with variable septations on an EUS image can be defined as MCN [30]. When a cyst has peripheral calcifications or mucin content on EUS images, it is also categorized as MCN [30,31]. When a cyst shows micro-cystic (<3 mm) honeycomb structures, it is classified as typical SCN [31].
In addition, a cyst with a central scar can also be diagnosed as SCN [30,31]. The IPMN case is especially complicated as it needs to consider surrounding tissues and has a high chance of developing multiple cysts, which requires several EUS images to make the decision rather than just a single image. Thus, further investigation is necessary to improve the current network model to work with multiple input images to differentiate various kinds of PCN cases. Owing to the scanning condition of the endoscopy, an EUS can have varying distances from the surface of the imaging target and each imaging frame encounters huge variations. Sequential frames in the EUS video can be utilized to overcome this issue. Only two kinds of PCNs were diagnosed in this study because of the limitations of the dataset, but the results in this study indicate the possibility of expanding the CNN model for the differentiation of PCNs from EUS images. Although the performance of the network was limited by the small amount of data in this study, the network will be further improved to differentiate various kinds of PCN cases.

Conclusions
The implemented network provides reliable performance in differentiating between MCN and SCN in terms of accuracy, sensitivity, and AUC score with verification from Grad-cam mapping. In the future, we will further improve the network to differentiate various kinds of PCN cases and also difficult cases using increased amounts of EUS images and video data for the network training.

Institutional Review Board Statement:
This study was conducted in accordance with the guidelines of the Declaration of Helsinki, and was approved by the Institutional Review Boards of Inha University Hospital (2020-05-002, approval date: 18 May 2020) and of Asan Medical Center (2020-1290-0001, approval date: 19 November 2020).