Explainable Deep Learning for Endometriosis Classification in Laparoscopic Images
Abstract
1. Introduction
1.1. Overview of Endometriosis
1.2. Research Gaps
1.3. Aims and Contributions of the Study
2. Literature Review
2.1. Machine Learning-Based Studies on Endometriosis
2.2. Application of Explainability Methods in Medical Imaging
3. Materials and Methods
3.1. GLENDA Dataset
3.2. Data Splitting
3.3. Data Preprocessing
- Resizing: All images were resized to a uniform input resolution using bicubic interpolation [27]. Standardizing spatial dimensions is necessary for compatibility across CNN- and Transformer-based architectures and to ensure that anatomical detail is preserved consistently.
- Random Horizontal Flip (p = 0.5): Introduced to simulate left–right anatomical variability and enhance spatial invariance.
- Random Rotation (±10°): Used to reduce sensitivity to minor angular deviations resulting from surgical camera motion.
- Color Jitter (brightness/contrast = 0.1): Applied to mimic lighting variations across different laparoscopic systems, improving the model’s robustness to illumination changes [28].
- Normalization: After conversion to tensors, image intensities were normalized channel-wise to the range [−1, 1] using a mean and standard deviation of 0.5. This step accelerates convergence and stabilizes gradient updates during training.
3.4. Model Architectures
3.5. Model Training and Optimization
- Standard Cross-Entropy (CE): The baseline loss for classification tasks [38].
- Weighted Cross-Entropy: Applies inverse-frequency class weights, computed per fold, to upweight the minority (pathological) class [39].
- Focal Loss (γ = 2): Downweights easy majority examples and focuses training on hard or misclassified samples [39].
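A minimal PyTorch sketch of the two imbalance-aware variants, assuming integer class labels; the per-fold weight computation and the focal formulation follow the descriptions above rather than a released implementation.

```python
import torch
import torch.nn.functional as F


def inverse_frequency_weights(labels: torch.Tensor,
                              num_classes: int = 2) -> torch.Tensor:
    """Per-class weights proportional to inverse class frequency,
    computed on the training labels of the current fold. Pass the
    result as `weight=` to `F.cross_entropy` for weighted CE."""
    counts = torch.bincount(labels, minlength=num_classes).float()
    return counts.sum() / (num_classes * counts)


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """Focal loss: scales cross-entropy by (1 - p_t)^gamma so that
    easy, well-classified examples contribute less to the gradient."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - p_t) ** gamma * ce).mean()
```

With γ = 2 and a confident correct prediction (p_t near 1), the modulating factor (1 − p_t)² shrinks the loss by several orders of magnitude, which is what shifts training effort toward hard samples.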
3.6. Model Explainability
4. Results
4.1. Comparison of Model Performance
4.2. Effect of Loss Function
4.3. Effect of Oversampling Abnormal Images
4.4. Effect of Normal-to-Abnormal Sampling Ratio
4.5. Explainability Analysis
5. Discussion
5.1. Model Performance and Comparison
5.2. Effectiveness of Imbalance Mitigation Strategies
5.3. Explainability and Interpretability
6. Future Work
- Dataset expansion and diversity: The GLENDA dataset provides a valuable foundation but remains limited in scale and class balance. Expanding data collection to include a larger and more heterogeneous cohort from multiple centers, covering varied anatomical regions, imaging settings, and disease severities, would help improve model robustness and generalization [48]. Future datasets should also aim to reduce potential sources of bias arising from differences in laparoscopic equipment and the absence of demographic metadata, thereby enhancing fairness and representativeness in model evaluation.
- Incorporating spatial supervision: The current models were trained with image-level labels only. Introducing weakly or semi-supervised learning techniques, such as pseudo-mask generation or region-based attention constraints, could guide the model to focus more precisely on lesion-relevant areas while maintaining classification performance [49].
- Refining explainability evaluation: While this study quantitatively assessed Grad-CAM outputs using IoU, Dice, and Recall, these metrics remain limited proxies for clinical interpretability. Future work should involve systematic evaluation with domain experts to determine whether highlighted regions correspond to diagnostically meaningful cues and to develop more clinically grounded explainability metrics [44].
- External validation and clinical integration: The current framework has not yet been validated on independent datasets. Cross-institutional testing and real-world deployment studies are essential to ensure reliability under diverse imaging conditions and to evaluate how such models could integrate into surgical decision-support systems. In practice, explainable AI tools should complement, rather than replace, clinician judgment in the diagnostic process.
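As an illustration of the overlap metrics discussed above, the following NumPy sketch scores a binarized Grad-CAM heatmap against a ground-truth lesion mask; the 0.5 binarization threshold is an illustrative assumption, not the study's reported setting.

```python
import numpy as np


def explanation_overlap(cam: np.ndarray, mask: np.ndarray,
                        threshold: float = 0.5):
    """IoU, Dice, and Recall between a binarized Grad-CAM heatmap and a
    ground-truth lesion mask. `cam` holds activations scaled to [0, 1];
    `mask` is a binary lesion annotation of the same shape."""
    pred = cam >= threshold
    gt = mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # overlap pixels
    union = np.logical_or(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    iou = tp / union if union else 1.0
    dice = 2 * tp / denom if denom else 1.0
    recall = tp / gt.sum() if gt.sum() else 1.0
    return float(iou), float(dice), float(recall)
```

Recall is the most lenient of the three here: a heatmap that covers the entire lesion plus much surrounding tissue scores perfect Recall but low IoU and Dice, which is one reason no single metric is a sufficient proxy for clinical interpretability.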
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| CNN | Convolutional Neural Network |
| ViT | Vision Transformer |
| DL | Deep Learning |
| IoU | Intersection over Union |
| SHAP | SHapley Additive exPlanations |
| LIME | Local Interpretable Model-Agnostic Explanations |
| Grad-CAM | Gradient-weighted Class Activation Mapping |
| GLENDA | Gynecologic Laparoscopy Endometriosis Dataset |
| SDTA | Split Depth-wise Transpose Attention |
References
- World Health Organization. Endometriosis. WHO Fact Sheets. 2023. Available online: https://www.who.int/news-room/fact-sheets/detail/endometriosis (accessed on 1 June 2025).
- Zondervan, K.T.; Becker, C.M.; Missmer, S.A. Endometriosis. N. Engl. J. Med. 2020, 382, 1244–1256.
- Giudice, L.C. Clinical Practice. Endometriosis. N. Engl. J. Med. 2010, 362, 2389–2398.
- Johnson, N.P.; Hummelshoj, L.; World Endometriosis Society Montpellier Consortium; Abrao, M.S.; Adamson, G.D.; Allaire, C.; Amelung, V.; Andersson, E.; Becker, C.; Birna Árdal, K.B.; et al. Consensus on Current Management of Endometriosis. Hum. Reprod. 2013, 28, 1552–1568.
- Chapron, C.; Marcellin, L.; Borghese, B.; Santulli, P. Rethinking Mechanisms, Diagnosis and Management of Endometriosis. Nat. Rev. Endocrinol. 2019, 15, 666–682.
- Agarwal, S.K.; Chapron, C.; Giudice, L.C.; Laufer, M.R.; Leyland, N.; Missmer, S.A.; Singh, S.S.; Taylor, H.S. Clinical Diagnosis of Endometriosis: A Call to Action. Am. J. Obstet. Gynecol. 2019, 220, 354.e1–354.e12.
- Missmer, S.A.; Tu, F.F.; As-Sanie, S.; Chapron, C.; Soliman, A.M.; Chiuve, S.; Eichner, S.; Flores-Caldera, I.; Horne, A.W.; Kimball, A.B.; et al. Impact of endometriosis on life-course potential: A narrative review. Int. J. Gen. Med. 2021, 14, 9–25.
- Veth, V.B.; Keukens, A.; Reijs, A.; Bongers, M.Y.; Mijatovic, V.; Coppus, S.F.; Maas, J.W. Recurrence after surgery for endometrioma: A systematic review and meta-analyses. Fertil. Steril. 2024, 122, 1079–1093.
- Dunselman, G.A.J.; Vermeulen, N.; Becker, C.; Calhaz-Jorge, C.; D’Hooghe, T.; De Bie, B.; Heikinheimo, O.; Horne, A.W.; Kiesel, L.; Nap, A.; et al. ESHRE Guideline: Management of Women with Endometriosis. Hum. Reprod. 2014, 29, 400–412.
- Chen, H.; Gomez, C.; Huang, C.-M.; Unberath, M. Explainable medical imaging AI needs human-centered design: Guidelines and evidence from a systematic review. npj Digit. Med. 2022, 5, 156.
- Salahuddin, Z.; Woodruff, H.C.; Chatterjee, A.; Lambin, P. Transparency of deep neural networks for medical image analysis: A review of interpretability methods. Comput. Biol. Med. 2022, 140, 105111.
- Nifora, C.; Chasapi, L.; Chasapi, M.K.; Koutsojannis, C. Deep Learning Improves Accuracy of Laparoscopic Imaging Classification for Endometriosis Diagnosis. J. Clin. Med. Surg. 2023, 4, 1137–1145.
- Visalaxi, S.; Muthu, T.S. Automated Prediction of Endometriosis Using Deep Learning. Int. J. Nonlinear Anal. Appl. 2021, 12, 2403–2416.
- Batić, D.; Holm, F.; Özsoy, E.; Czempiel, T.; Navab, N. EndoViT: Pretraining vision transformers on a large collection of endoscopic images. Int. J. Comput. Assist. Radiol. Surg. 2024, 19, 1085–1091.
- Guerriero, S.; Pascual, M.; Ajossa, S.; Neri, M.; Musa, E.; Graupera, B.; Rodriguez, I.; Alcazar, J.L. Artificial intelligence (AI) in the detection of rectosigmoid deep endometriosis. Eur. J. Obstet. Gynecol. Reprod. Biol. 2021, 261, 29–33.
- Bendifallah, S.; Puchar, A.; Suisse, S.; Delbos, L.; Poilblanc, M.; Descamps, P.; Golfier, F.; Touboul, C.; Dabi, Y.; Daraï, E. Machine learning algorithms as new screening approach for patients with endometriosis. Sci. Rep. 2022, 12, 639.
- Akter, S.; Xu, D.; Nagel, S.C.; Bromfield, J.J.; Pelch, K.E.; Wilshire, G.B.; Joshi, T. GenomeForest: An ensemble machine learning classifier for endometriosis. AMIA Jt. Summits Transl. Sci. Proc. 2020, 2020, 33–42.
- Zhang, H.; Zhang, H.; Yang, H.; Shuid, A.N.; Sandai, D.; Chen, X. Machine learning-based integrated identification of predictive combined diagnostic biomarkers for endometriosis. Front. Genet. 2023, 14, 1290036.
- Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 1135–1144.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Bhandari, M.; Yogarajah, P.; Kavitha, M.S.; Condell, J. Exploring the capabilities of a lightweight CNN model in accurately identifying renal abnormalities: Cysts, stones, and tumors, using LIME and SHAP. Appl. Sci. 2023, 13, 3125.
- Aldughayfiq, B.; Ashfaq, F.; Jhanjhi, N.Z.; Humayun, M. Explainable AI for retinoblastoma diagnosis: Interpreting deep learning models with LIME and SHAP. Diagnostics 2023, 13, 1932.
- Moujahid, H.; Cherradi, B.; Al-Sarem, M.; Bahatti, L.; Eljialy, A.B.A.M.Y.; Alsaeedi, A.; Saeed, F. Combining CNN and Grad-CAM for COVID-19 disease prediction and visual explanation. Intell. Autom. Soft Comput. 2021, 32, 1235–1249.
- Dovletov, G.; Pham, D.D.; Lörcks, S.; Pauli, J.; Gratz, M.; Quick, H.H. Grad-CAM guided U-Net for MRI-based pseudo-CT synthesis. In Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Glasgow, UK, 11–15 July 2022; pp. 2071–2075.
- Leibetseder, A.; Kletz, S.; Schoeffmann, K.; Keckstein, S.; Keckstein, J. GLENDA: Gynecologic Laparoscopy Endometriosis Dataset. In MultiMedia Modeling: 26th International Conference, MMM 2020, Daejeon, South Korea, 5–8 January 2020, Proceedings, Part II; Ro, Y.M., Cheng, W.-H., Kim, J., Chu, W.-T., Cui, P., Choi, J.-W., Hu, M.-C., De Neve, W., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 11962, pp. 439–450.
- Gonzalez, R.C.; Woods, R.E. Digital Image Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2003.
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60.
- Wightman, R. PyTorch Image Models (timm). 2019. Available online: https://github.com/huggingface/pytorch-image-models (accessed on 1 June 2025).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Maaz, M.; Khan, S.; Khan, F.S.; Van Gool, L. EdgeNeXt: Efficiently amalgamated CNN-transformer architecture for mobile vision applications. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 2, 3320–3328.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Abraham, N.; Khan, N.M. A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 683–687.
- Wang, S.; Liu, W.; Wu, J.; Cao, L.; Meng, Q.; Kennedy, P.J. Training deep neural networks on imbalanced data sets. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 4368–4374.
- Park, S.H.; Han, K.; Jang, H.Y.; Park, J.E.; Lee, J.-G.; Kim, D.W.; Choi, J. Methods for clinical evaluation of artificial intelligence algorithms for medical diagnosis. Radiology 2023, 306, 20–31.
- Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and Explainability of Artificial Intelligence in Medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312.
- Leibetseder, A.; Schoeffmann, K.; Keckstein, J.; Keckstein, S. Endometriosis detection and localization in laparoscopic gynecology. Multimed. Tools Appl. 2022, 81, 6191–6215.
- Netter, A.; Noorzadeh, S.; Duchateau, F.; Abrao, H.; Canis, M.; Bartoli, A.; Bourdel, N.; Desternes, J.; Peyras, J.; Pouly, J.L.; et al. Initial results in the automatic visual recognition of endometriosis lesions by artificial intelligence during laparoscopy: A proof-of-concept study. J. Minim. Invasive Gynecol. 2025.
- Montavon, G.; Samek, W.; Müller, K.-R. Methods for Interpreting and Understanding Deep Neural Networks. Digit. Signal Process. 2018, 73, 1–15.
- Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van Ginneken, B.; Sánchez, C.I. A Survey on Deep Learning in Medical Image Analysis. Med. Image Anal. 2017, 42, 60–88.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.

| Metric | EfficientNet-B2 | EdgeNeXt_Small | ResNet50 | ViT-Small/16 |
|---|---|---|---|---|
| Top-1 Accuracy (%) | 82.38 | 81.56 | 76.00 | 81.39 |
| Top-5 Accuracy (%) | 96.25 | 95.71 | 93.00 | 96.14 |
| Top-1 Error (%) | 17.62 | 18.44 | 24.00 | 18.61 |
| Top-5 Error (%) | 3.75 | 4.29 | 7.00 | 3.86 |
| Parameter Count (M) | 9.11 | 5.59 | 25.60 | 22.05 |
| Input Image Size (px) | 320 | 320 | 320 | 224 |

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|---|
| EdgeNeXt_Small | 97.86 | 97.90 | 97.85 | 97.86 | 99.62 |
| EfficientNet-B2 | 97.45 | 98.64 | 96.24 | 97.40 | 99.73 |
| ResNet50 | 96.91 | 99.19 | 94.63 | 96.83 | 99.32 |
| ViT-Small/16 | 95.30 | 94.71 | 95.97 | 95.33 | 99.11 |

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|---|
| EdgeNeXt_Small | 96.49 | 97.92 | 95.10 | 96.43 | 99.46 |
| EfficientNet-B2 | 96.17 | 98.36 | 93.97 | 96.06 | 99.49 |
| ResNet50 | 95.26 | 97.87 | 92.59 | 95.10 | 99.09 |
| ViT-Small/16 | 90.22 | 90.83 | 90.02 | 89.91 | 96.70 |

| Loss Function | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|---|
| Cross-Entropy (CE) | 94.89 | 97.54 | 92.13 | 94.48 | 98.81 |
| Focal Loss | 94.02 | 94.77 | 93.60 | 94.04 | 98.57 |
| Weighted CE | 94.69 | 96.44 | 93.03 | 94.60 | 98.68 |

| Oversample | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|---|
| 0 (No) | 94.46 | 96.01 | 92.96 | 94.23 | 98.58 |
| 1 (Yes) | 94.61 | 96.49 | 92.88 | 94.52 | 98.79 |

| Sampling Ratio | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|---|
| 1:1 | 94.21 | 95.87 | 92.61 | 94.11 | 98.47 |
| 2:1 | 94.86 | 96.63 | 93.23 | 94.64 | 98.90 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhu, Y.; Elbattah, M. Explainable Deep Learning for Endometriosis Classification in Laparoscopic Images. BioMedInformatics 2025, 5, 63. https://doi.org/10.3390/biomedinformatics5040063
