AI-Based Prediction of Post-ERCP Pancreatitis: A Comparative Study Using Tabular, Image, and Multimodal Data
Abstract
1. Introduction
Literature Review
- Create a multimodal artificial intelligence framework that integrates 2D images of the ampulla with clinical data for PEP prediction.
- Analyze the predictive contributions of the imaging and clinical data, both independently and in combination.
- Provide clinically relevant insights that improve procedural safety and the quality of patient care.
2. Methodology
2.1. System Architecture
2.2. Classification of Tabular Data Using XGBoost
Training and Evaluation
2.3. Image-Based Classification Using EfficientNet-B0, DenseNet-201, and ResNet-50
2.3.1. Data Augmentation and Class Balancing
2.3.2. Model Architectures
2.3.3. Training and Evaluation
2.4. Multimodal Fusion Using Multimodal Contrastive Learning (MMCL) [9]
2.4.1. Self-Supervised Pretraining Phase
- Image Encoder: A ResNet50 backbone pretrained on ImageNet was used to extract image features. The final fully connected layer was removed, producing 2048-dimensional embeddings from endoscopic images resized to 128 × 128 pixels.
- Tabular Encoder: Structured clinical variables (28 features) were processed using a two-layer MLP-based encoder with batch normalization and ReLU activation, producing 2048-dimensional embeddings.
- Projection Heads: Separate SimCLR projection heads were applied to both modalities, each consisting of a linear layer, ReLU activation, and a final linear layer mapping embeddings to a 128-dimensional latent space.
- Contrastive Learning Objective: A contrastive loss was used to maximize similarity between matched image–tabular pairs from the same patient while minimizing similarity between unmatched pairs.
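The contrastive objective above can be illustrated with a minimal sketch. It assumes a symmetric InfoNCE-style (CLIP-like) formulation over a batch of projected embeddings, where row i of each modality belongs to the same patient. The exact loss form, the temperature of 0.1, and the toy 3-dimensional vectors are illustrative assumptions, not settings reported here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(image_emb, tabular_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over matched image-tabular pairs.

    The temperature is an assumed value chosen for illustration only.
    """
    img = [l2_normalize(v) for v in image_emb]
    tab = [l2_normalize(v) for v in tabular_emb]
    # sim[i][j]: scaled cosine similarity between image i and tabular record j
    sim = [[sum(a * b for a, b in zip(u, w)) / temperature for w in tab]
           for u in img]
    sim_t = [list(col) for col in zip(*sim)]  # tabular-to-image direction
    return 0.5 * (cross_entropy_rows(sim) + cross_entropy_rows(sim_t))

# Toy projections for three patients (hypothetical values).
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # pairs deliberately mismatched

loss_aligned = contrastive_loss(images, aligned)
loss_shuffled = contrastive_loss(images, shuffled)
```

In this toy case the loss is lower when each image is paired with its own tabular record than when the pairs are shuffled, which is the behaviour the pretraining phase relies on.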
2.4.2. Supervised Fine-Tuning
- Feature Fusion: Image and tabular embeddings (2048 dimensions each) were concatenated to form a 4096-dimensional fused feature vector.
- Classification Head: The fused representation was passed through a fully connected layer for binary classification.
- Loss Function: Cross-entropy loss was used for optimization.
- Optimization: Training was performed using the Adam optimizer with learning-rate scheduling.
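The fusion and classification steps listed above can be sketched in plain Python to make the dimensions concrete. The embeddings and head weights below are random placeholders rather than trained values, the helper names are hypothetical, and the Adam update and learning-rate schedule are omitted.

```python
import math
import random

def concat_fusion(image_emb, tabular_emb):
    """Concatenate the two modality embeddings into one fused vector."""
    return list(image_emb) + list(tabular_emb)

def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Convert logits to class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability of the true class, via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

rng = random.Random(0)
EMB_DIM = 2048  # per-modality embedding size stated in the text

image_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]    # placeholder values
tabular_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]  # placeholder values
fused = concat_fusion(image_emb, tabular_emb)            # 4096 values

# Untrained head with two outputs: index 0 = non-PEP, index 1 = PEP.
weights = [[rng.gauss(0, 0.01) for _ in range(len(fused))] for _ in range(2)]
bias = [0.0, 0.0]

logits = linear(fused, weights, bias)
probs = softmax(logits)
loss = cross_entropy(logits, label=1)  # loss if this patient developed PEP
```

A framework implementation would replace these helpers with tensor operations; the sketch only shows how two 2048-dimensional embeddings become a 4096-dimensional input to a two-class head trained with cross-entropy.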
2.4.3. Multimodal Training and Evaluation
2.5. Outcome Labeling Strategy
3. Dataset Description
3.1. Time Period
3.2. Inclusion and Exclusion Criteria
3.2.1. Inclusion Criteria
- Patients undergoing ERCP at SIAG with available ampulla images.
- Availability of clinical and procedural data for structured (tabular) analysis.
- Patients with complete follow-up records confirming or ruling out PEP.
3.2.2. Exclusion Criteria
- Patients with poor-quality or missing ampulla images.
- Patients with incomplete or missing clinical data.
- Patients with a history of pancreatic disease that may confound the diagnosis of PEP.
- Patients with a history of ERCP.
3.3. Data Composition and Statistics
3.3.1. Tabular Data
- Easy,
- Needed Extra Technique,
- Took More Than 5 min,
- Failed.
3.3.2. Image Data
3.3.3. Multimodal Data
3.4. Data Preprocessing
3.4.1. Tabular Data
3.4.2. Image Data
3.4.3. Multimodal Data
- Three types of input files were saved, each divided into training, validation, and test sets.
- The first file type contains the labels in .pt format.
- The second file type contains all tabular features.
- The third file type contains the image paths in .pt format.
4. Results
4.1. Evaluation Metrics
- Accuracy: the proportion of all instances that are classified correctly,
- Precision: the proportion of predicted positive cases that are truly positive,
- Recall (Sensitivity): the proportion of actual positive cases that the model identifies,
- F1 Score: the harmonic mean of precision and recall,
- Confusion Matrix: a table of predicted versus actual classes that summarizes per-class performance.
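As a worked illustration of these definitions, the sketch below computes the metrics from confusion-matrix counts. The counts are those of the multimodal test set described in Section 4.4 (14 of 15 non-PEP cases and 1 of 5 PEP cases classified correctly); the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the class treated as positive."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# PEP (Class 1) as the positive class: 1 detected, 4 missed, 1 false alarm.
pep = classification_metrics(tp=1, fp=1, fn=4, tn=14)

# Non-PEP (Class 0) as the positive class: the roles of the counts swap.
non_pep = classification_metrics(tp=14, fp=4, fn=1, tn=1)

for name, m in (("Class 1", pep), ("Class 0", non_pep)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

Rounded to two decimals, these values agree with the per-class figures reported for the multimodal model.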
4.2. Result on Tabular Data
- Train–test split, with 80% of the data used for training and 20% for testing.
- 5-fold cross-validation to ensure more reliable evaluation.
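A minimal sketch of how the folds for such an evaluation might be built is given below. Stratifying by class is an assumption made here because PEP cases are rare; the text does not state whether the splits were stratified. The class counts (419 non-PEP, 27 PEP) are the supports reported for the cross-validation results, and the function name is hypothetical.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign every sample index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin across folds
    return folds

labels = [0] * 419 + [1] * 27  # 0 = non-PEP, 1 = PEP
folds = stratified_folds(labels, k=5)

splits = []
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    splits.append((train_idx, test_idx))
    # A model would be fitted on train_idx and scored on test_idx here.

positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Holding out a single fold gives roughly an 80/20 split, while rotating through all five folds gives the cross-validated estimate; each fold keeps 5 or 6 of the 27 PEP cases.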
4.2.1. Train and Test Split
- High Cannulation Time values are associated with a higher likelihood of PEP.
- Moderate to high age also contributes to a ‘Yes’ prediction.
- Easy cannulation is linked with ‘Yes’ predictions.
- Low bilirubin levels are associated with ‘Yes’ predictions.
- Features listed near the bottom of the y-axis appear to have minimal impact on the prediction.
4.2.2. Cross Validation
4.3. Results on Image Data
4.3.1. Training and Testing Loss and Accuracy with Data Augmentation
4.3.2. Training and Testing Loss and Accuracy with Data Balancing
4.3.3. Summary of Evaluation Results
- The recall and F1 scores of EfficientNet-B0 improved significantly for Class 1 in the balanced setting for PEP prediction.
- With augmentation, ResNet50 had the highest accuracy overall at 0.88, but had poor predictions for Class 1.
- The recall and F1 scores of DenseNet201 were poor for Class 1 and, therefore, in this study, this model was the worst model for identifying PEP cases.
- Despite applying balancing and augmentation strategies, several image-based models continued to show low sensitivity for Class 1, likely due to the limited number of positive PEP cases and the exploratory nature of the dataset.
4.4. Results of Multimodal Fusion
Observations
- The multimodal model showed potential in identifying non-PEP cases (Class 0), with a recall of 0.93: it correctly classified 14 of 15 non-PEP cases.
- In contrast, positive PEP cases (Class 1) were more difficult to identify: the model detected only 1 of the 5 PEP cases, giving a Class 1 recall of 0.20.
- This pattern reflects high specificity (correctly classifying non-PEP cases) but low sensitivity (missing most true PEP cases).
- The low precision and F1 score for Class 1 suggest that the model needs a larger number of minority-class cases to learn that class adequately.
5. Discussion
5.1. Comparison of Proposed Approaches
- Tabular XGBoost models achieved the highest overall accuracy and the strongest F1 scores across Class 0 and Class 1 combined (macro F1 of 0.82 on the train–test split).
- The image-based models (EfficientNet-B0, ResNet50, DenseNet201) detected Class 1 (PEP) poorly unless the data were balanced, and DenseNet201 detected no Class 1 cases in the augmentation setting.
- Balancing the dataset increased the recall and F1 for Class 1, especially for EfficientNet-B0 (F1 = 0.77).
- The multimodal model has shown potential for integrating image data and tabular data, but additional tuning is necessary to increase the recall for Class 1 (PEP) cases.
5.2. Comparison with Prior Work (Based on Tabular AUC)
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ERCP | Endoscopic Retrograde Cholangiopancreatography |
| PEP | Post ERCP Pancreatitis |
| CNN | Convolutional Neural Network |
| SHAP | SHapley Additive exPlanations |
| AUC | Area Under the Curve |
| SimCLR | Simple Framework for Contrastive Learning of Visual Representations |
| NT-Xent | Normalized Temperature-Scaled Cross-Entropy Loss |
| NPV | Negative Predictive Value |
| GBM | Gradient Boosting Machine |
| XGBoost | Extreme Gradient Boosting |
| MLP | Multilayer Perceptron |
| CLIP | Contrastive Language-Image Pre-training |
| SIAG | Sindh Institute of Advanced Endoscopy and Gastroenterology |
| MMCL | Multimodal Contrastive Learning |
| ROCs | Receiver Operating Characteristics |
References
- De, T.; Du, G.; Yin, H.; Wang, H.; Wang, W.; Ma, T.; Ma, J.; Wang, H.; Wang, Q. Development and validation of a practical prediction model for post-ERCP pancreatitis using machine learning. Front. Surg. 2025, 12, 1628956.
- Cao, X.X.; Sun, M. Prediction model for the occurrence of acute pancreatitis after endoscopic retrograde cholangiopancreatography based on multidimensional indicators. World J. Gastrointest. Surg. 2025, 17, 111003.
- Testoni, P.A.; Mariani, A.; Aabakken, L.; Arvanitakis, M.; Bories, E.; Costamagna, G.; Devière, J.; Dinis-Ribeiro, M.; Dumonceau, J.-M.; Giovannini, M.; et al. Papillary cannulation and sphincterotomy techniques at ERCP: European Society of Gastrointestinal Endoscopy (ESGE) Clinical Guideline. Endoscopy 2016, 48, 657–683.
- Archibugi, L.; Ciarfaglia, G.; Cárdenas-Jaén, K.; Poropat, G.; Korpela, T.; Maisonneuve, P.; Aparicio, J.R.; Casellas, J.A.; Arcidiacono, P.G.; Mariani, A.; et al. Machine learning for the prediction of post-ERCP pancreatitis risk: A proof-of-concept study. Dig. Liver Dis. 2023, 55, 387–393.
- Takahashi, H.; Ohno, E.; Furukawa, T.; Yamao, K.; Ishikawa, T.; Mizutani, Y.; Iida, T.; Shiratori, Y.; Oyama, S.; Koyama, J.; et al. Artificial intelligence in a prediction model for post-ERCP pancreatitis. Dig. Endosc. Off. J. Jpn. Gastroenterol. Endosc. Soc. 2024, 36, 463–472.
- Chen, K.; Lin, H.; Zhang, F.; Chen, Z.; Ying, H.; Cao, L.; Fang, J.; Zhu, D.; Liang, K. Duodenal papilla radiomics-based prediction model for post-ERCP pancreatitis using machine learning: A retrospective multicohort study. Gastrointest. Endosc. 2024, 100, 691–702.
- Kim, T.; Kim, J.; Choi, H.S.; Lee, H.S.; Kim, E.S.; Lee, J.M.; Lee, H.S.; Chun, H.J.; Han, S.Y.; Keum, B.; et al. Artificial intelligence-assisted analysis of endoscopic retrograde cholangiopancreatography image for identifying ampulla and difficulty of selective cannulation. Sci. Rep. 2021, 11, 8381.
- Haraldsson, E.; Lundell, L.; Swahn, F.; Enochsson, L.; Löhr, J.M.; Arnelo, U. Scandinavian Association for Digestive Endoscopy (SADE) Study Group of Endoscopic Retrograde Cholangio-Pancreaticography. Endoscopic classification of the papilla of Vater: Results of an inter- and intraobserver agreement study. United Eur. Gastroenterol. J. 2017, 5, 504–510.
- Hager, P.; Menten, M.J.; Rueckert, D. Best of both worlds: Multimodal contrastive learning with tabular and imaging data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2023; pp. 23924–23935.
- Wolf, T.N.; Pölsterl, S.; Wachinger, C. Alzheimer’s Disease Neuroimaging Initiative. DAFT: A universal module to interweave tabular data and 3D images in CNNs. NeuroImage 2022, 260, 119505.
- Vajrangi, S.; Bidargaddi, A.P.; Yeli, V.; Selar, S.; Hiremath, R. Harnessing Effectiveness of ResNet-50 and EfficientNet for Few-Shot Learning. Int. J. Comput. Commun. Inform. 2023, 5, 46–55.
- Zhang, X.; Yue, P.; Zhang, J.; Yang, M.; Chen, J.; Zhang, B.; Luo, W.; Wang, M.; Da, Z.; Lin, Y.; et al. A novel machine learning model and a public online prediction platform for prediction of post-ERCP-cholecystitis (PEC). EClinicalMedicine 2022, 48, 101431.
- Brenner, T.; Kuo, A.; Weiland, C.J.; Kamal, A.; Elmunzer, B.J.; Luo, H.; Buxbaum, J.; Gardner, T.B.; Mok, S.S.; Fogel, E.S.; et al. Development and validation of a machine learning–based, point-of-care risk calculator for post-ERCP pancreatitis and prophylaxis selection. Gastrointest. Endosc. 2025, 101, 129–138.
- Jin, H.; Sun, X.; Fu, C.; Fan, C.; Chen, J.; Zhang, Z.; Yang, Y.; Fan, X.; He, Y.; Yin, S.; et al. Machine learning-based prediction model for post-ERCP cholangitis in patients with malignant biliary obstruction: A retrospective multicenter study. Surg. Endosc. 2025, 39, 5107–5126.
- Sabrie, N.; Minhas, G.; Vaska, M.; Meng, Z.W.; Brenner, D.R.; Forbes, N. Performance of clinical risk prediction models for post-ERCP pancreatitis: A systematic review. Pancreas 2025, 54, e588–e595.
- Andrzej, J. Impact of the shape of the ampulla of Vater and biliary tree pathology on access technique and post-ERCP complications. Res. Sq. 2024.
- Takiyama, H.; Ozawa, T.; Uraoka, T.; Tamaki, T.; Okada, H. Automatic anatomical classification of esophagogastroduodenoscopy images using deep convolutional neural networks. Sci. Rep. 2018, 8, 7497.
- Hirasawa, T.; Aoyama, K.; Tanimoto, T.; Ishihara, S.; Shichijo, S.; Ozawa, T.; Ohnishi, T.; Fujishiro, M.; Matsuo, K.; Fujisaki, J.; et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 2018, 21, 653–660.
- Su, C.; Wang, W. Concrete cracks detection using convolutional neural network based on transfer learning. Math. Probl. Eng. 2020, 2020, 7240129.
- Al-Humaidan, N.A.; Prince, M. A classification of arab ethnicity based on face image using deep learning approach. IEEE Access 2021, 9, 50755–50766.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv 2015, arXiv:1508.06576.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 2980–2988.
- Khan, M.A.; Rahman, A.; Saba, T.; Rehman, A.; AksamIftikhar, M.; Sharif, M. Automatic detection of tympanic membrane and middle ear infection from oto-endoscopic images via convolutional neural networks. Neural Netw. 2020, 126, 384–394.
- Wang, G.; Zuluaga, M.A.; Li, W.; Pratt, R.; Patel, P.A.; Aertsen, M.; Doel, T.; David, A.L.; Deprest, J.; Ourselin, S.; et al. Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Trans. Med. Imaging 2018, 37, 1562–1573.
- Nie, D.; Zhang, H.; Adeli, E.; Liu, L.; Shen, D. 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016; Springer: Cham, Switzerland, 2016; pp. 212–220.
- Fu, H.; Xu, Y.; Lin, S.; Wong, D.W.K.; Liu, J. DeepVessel: Retinal vessel segmentation via deep learning and conditional random field. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016; Springer: Cham, Switzerland, 2016; pp. 132–139.
- Bakr, M.; Abdel-Gaber, S.; Nasr, M.; Hazman, M. DenseNet based model for plant diseases diagnosis. Eur. J. Electr. Eng. Comput. Sci. 2022, 6, 1–9.
- Chen, K.; Wang, L.; Wang, X.; Yang, L.; Zhang, X.; Lin, Y.; Cao, L. Machine learning-derived predictive model for post-ERCP pancreatitis in patients with common bile duct stones: A retrospective multicenter study. Surg. Endosc. 2025, 39, 8171–8183.
- Cahyadi, O.; Tehami, N.; de-Madaria, E.; Siau, K. Post-ERCP pancreatitis: Prevention, diagnosis and management. Medicina 2022, 58, 1261.
- Arabpour, E.; Sadeghi, A.; Rastegar, R.; Mohammadi, P.; Safavi-Naini, S.A.; Moghadam, P.K.; Zali, M.R. Predicting Post-ERCP Pancreatitis Using Machine Learning: Risk Stratification and Feature Importance Analysis. J. Hepato-Biliary-Pancreat. Sci. 2026; Epub ahead of print.
- Choi, Y.; Choi, M.; Kim, M.; Ha, J.; Kim, S.; Choo, J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 8789–8797.
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2019; pp. 4401–4410.

















| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.97 | 1.00 | 0.98 | 84 |
| 1 | 1.00 | 0.50 | 0.67 | 6 |
| accuracy | | | 0.97 | 90 |
| macro avg | 0.98 | 0.75 | 0.82 | 90 |
| weighted avg | 0.97 | 0.97 | 0.96 | 90 |
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.97 | 0.97 | 0.97 | 419 |
| 1 | 0.52 | 0.48 | 0.50 | 27 |
| accuracy | | | 0.94 | 446 |
| macro avg | 0.74 | 0.73 | 0.73 | 446 |
| weighted avg | 0.94 | 0.94 | 0.94 | 446 |
| Approach | Model | Acc | Prec (C0) | Prec (C1) | Rec (C0) | Rec (C1) | F1 (C0) | F1 (C1) |
|---|---|---|---|---|---|---|---|---|
| Image Augmentation | EfficientNet-B0 | 0.83 | 0.94 | 0.10 | 0.88 | 0.20 | 0.91 | 0.13 |
| Image Balancing | EfficientNet-B0 | 0.70 | 1.00 | 0.62 | 0.40 | 1.00 | 0.57 | 0.77 |
| Image Augmentation | ResNet50 | 0.88 | 0.96 | 0.25 | 0.92 | 0.40 | 0.94 | 0.31 |
| Image Balancing | ResNet50 | 0.50 | 0.50 | 0.50 | 0.40 | 0.60 | 0.44 | 0.55 |
| Image Augmentation | DenseNet201 | 0.82 | 0.93 | 0.00 | 0.88 | 0.00 | 0.90 | 0.00 |
| Image Balancing | DenseNet201 | 0.50 | 0.00 | 0.50 | 0.00 | 1.00 | 0.00 | 0.67 |
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Class 0 | 0.78 | 0.93 | 0.85 | 15 |
| Class 1 | 0.50 | 0.20 | 0.29 | 5 |
| Accuracy | | | 0.75 | 20 |
| Macro avg | 0.64 | 0.57 | 0.57 | 20 |
| Weighted avg | 0.71 | 0.75 | 0.71 | 20 |
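The macro and weighted averages in the multimodal report above follow directly from the per-class rows and their supports. The short sketch below reproduces that arithmetic from the rounded per-class values (the function names are hypothetical), so small rounding differences from the reported averages are expected.

```python
def macro_average(values):
    """Unweighted mean across classes: every class counts equally."""
    return sum(values) / len(values)

def weighted_average(values, supports):
    """Mean across classes weighted by the number of samples in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)

supports = [15, 5]        # Class 0 and Class 1 test cases
precision = [0.78, 0.50]
recall = [0.93, 0.20]
f1 = [0.85, 0.29]

macro_recall = macro_average(recall)
weighted_recall = weighted_average(recall, supports)
weighted_precision = weighted_average(precision, supports)
macro_f1 = macro_average(f1)

print(f"macro recall={macro_recall:.3f}, weighted recall={weighted_recall:.4f}")
print(f"weighted precision={weighted_precision:.2f}, macro F1={macro_f1:.2f}")
```

Because Class 0 has three times the support of Class 1, the weighted averages sit close to the Class 0 values, whereas the macro averages expose the weak Class 1 performance.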
| Approach | Model | Acc | Prec (C0) | Prec (C1) | Rec (C0) | Rec (C1) | F1 (C0) | F1 (C1) |
|---|---|---|---|---|---|---|---|---|
| Tabular | XGBoost | 0.97 | 0.97 | 1.00 | 1.00 | 0.50 | 0.98 | 0.67 |
| Tabular CV | XGBoost | 0.94 | 0.97 | 0.52 | 0.97 | 0.48 | 0.97 | 0.50 |
| Image Augmentation | EfficientNet-B0 | 0.83 | 0.94 | 0.10 | 0.88 | 0.20 | 0.91 | 0.13 |
| Image Balancing | EfficientNet-B0 | 0.70 | 1.00 | 0.62 | 0.40 | 1.00 | 0.57 | 0.77 |
| Image Augmentation | ResNet50 | 0.88 | 0.96 | 0.25 | 0.92 | 0.40 | 0.94 | 0.31 |
| Image Balancing | ResNet50 | 0.50 | 0.50 | 0.50 | 0.40 | 0.60 | 0.44 | 0.55 |
| Image Augmentation | DenseNet201 | 0.82 | 0.93 | 0.00 | 0.88 | 0.00 | 0.90 | 0.00 |
| Image Balancing | DenseNet201 | 0.50 | 0.00 | 0.50 | 0.00 | 1.00 | 0.00 | 0.67 |
| Multimodal | ResNet50 + Multilayer Perceptron | 0.75 | 0.78 | 0.50 | 0.93 | 0.20 | 0.85 | 0.29 |
| Study | Year | Dataset Size | Model | Data Type | Recall (CV) | AUC (CV) | AUC (Test) |
|---|---|---|---|---|---|---|---|
| Proposed (This Study) | 2026 | 446 patients (Tabular) | XGBoost | Clinical Tabular | 0.48 | 0.786 | 0.95 |
| Prior Study [30] | 2026 | 1190 patients | CatBoost | Clinical Tabular | 0.74 | 0.688 | - |
| Prior Study [13] | 2025 | 7389 patients | GBM | Clinical Tabular | 0.65 | 0.70 | 0.74 |
| Prior Study [4] | 2023 | 1150 patients | Gradient Boosting | Clinical Tabular | - | 0.70 | 0.671 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jamil, A.; Nazir, W.; Altaf, A.; Niaz, S.K. AI-Based Prediction of Post-ERCP Pancreatitis: A Comparative Study Using Tabular, Image, and Multimodal Data. Diagnostics 2026, 16, 1824. https://doi.org/10.3390/diagnostics16121824

