Article

When AI and Experts Agree on Error: Intrinsic Ambiguity in Dermatoscopic Images

1 Dipartimento di Ingegneria Informatica, Automatica e Gestionale (DIAG), Sapienza Università di Roma, Via Ariosto, 25, 00185 Rome, Italy
2 Istituto di Scienze Applicate e Sistemi Intelligenti (ISASI), Consiglio Nazionale delle Ricerche (CNR), Via Monteroni s.n, 73100 Lecce, Italy
3 Dermatologia Myskin, Poliambulatorio Specialistico Medico-Chirurgico, Via S. Marco, 21, 73030 Tiggiano, Italy
4 AST Pesaro-Urbino, Via Borsellino 4, 60019 Fano, Italy
5 La Rocca Skin Medical Center, Via Marchetti, 110, 61122 Senigallia, Italy
* Author to whom correspondence should be addressed.
J. Imaging 2026, 12(6), 231; https://doi.org/10.3390/jimaging12060231
Submission received: 31 March 2026 / Revised: 21 May 2026 / Accepted: 21 May 2026 / Published: 27 May 2026

Abstract

The integration of artificial intelligence (AI), particularly convolutional neural networks (CNNs), into dermatological diagnosis shows substantial clinical potential. While the existing literature predominantly benchmarks algorithmic performance against human experts, our study instead investigates the intrinsic complexity of dermatoscopic images. Through experiments with multiple CNN architectures, we isolated a subset of images that were systematically misclassified by all models, a phenomenon shown statistically to occur more often than expected by chance. To determine whether these failures stem from algorithmic biases or from inherent visual ambiguity, expert dermatologists independently evaluated these challenging cases alongside a control group. The results revealed a collapse in human diagnostic performance on the AI-misclassified images. First, agreement with ground-truth labels fell sharply: Cohen’s kappa was 0.08 for this subset, compared with 0.61 for the control group. Second, expert consensus deteriorated markedly: inter-rater reliability among physicians fell from moderate concordance (Fleiss’ kappa = 0.456) on control images to only modest agreement (Fleiss’ kappa = 0.275) on the misclassified subset. We identified image quality as a primary driver of these joint failures of models and experts. To promote transparency and reproducibility, all data, code, and trained models have been made publicly available.
Keywords: artificial intelligence (AI); machine learning (ML); convolutional neural networks (CNNs); deep learning; medical image analysis; dermatology; image quality; statistical analysis
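
For readers less familiar with the two agreement statistics quoted in the abstract, the following is a minimal sketch of how Cohen's kappa (one rater against a reference labeling) and Fleiss' kappa (several raters per item) are conventionally defined. It is not the authors' code: the function names `cohen_kappa` and `fleiss_kappa` and the input layout are assumptions made for illustration, and the sketch assumes every item is rated by the same number of raters.

```python
import math
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labelings coincide.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return math.nan  # kappa is undefined when only one label is ever used
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of rater labels per item (assumed layout)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: number of raters who assigned category j to item i.
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Per-item agreement: share of rater pairs that agree on that item.
    per_item = [
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(per_item) / n_items
    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    expected = sum(p * p for p in proportions)
    if expected == 1.0:
        return math.nan  # undefined when every rating falls in one category
    return (mean_agreement - expected) / (1.0 - expected)
```

Both statistics equal 1 under perfect agreement and are close to 0 when agreement is no better than chance, which is how the values 0.08 and 0.275 reported above should be read.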

Share and Cite

MDPI and ACS Style

Cino, L.; Mazzeo, P.L.; Martella, A.; Radi, G.; Rossi, R.; Distante, C. When AI and Experts Agree on Error: Intrinsic Ambiguity in Dermatoscopic Images. J. Imaging 2026, 12, 231. https://doi.org/10.3390/jimaging12060231

AMA Style

Cino L, Mazzeo PL, Martella A, Radi G, Rossi R, Distante C. When AI and Experts Agree on Error: Intrinsic Ambiguity in Dermatoscopic Images. Journal of Imaging. 2026; 12(6):231. https://doi.org/10.3390/jimaging12060231

Chicago/Turabian Style

Cino, Loris, Pier Luigi Mazzeo, Alessandro Martella, Giulia Radi, Renato Rossi, and Cosimo Distante. 2026. "When AI and Experts Agree on Error: Intrinsic Ambiguity in Dermatoscopic Images" Journal of Imaging 12, no. 6: 231. https://doi.org/10.3390/jimaging12060231

APA Style

Cino, L., Mazzeo, P. L., Martella, A., Radi, G., Rossi, R., & Distante, C. (2026). When AI and Experts Agree on Error: Intrinsic Ambiguity in Dermatoscopic Images. Journal of Imaging, 12(6), 231. https://doi.org/10.3390/jimaging12060231
