Diagnostics
  • Article
  • Open Access

2 February 2026

Deep Learning-Based Semantic Segmentation and Classification of Otoscopic Images for Otitis Media Diagnosis and Health Promotion

1 Division of General Surgery, Department of Surgery, Tri-Service General Hospital Songshan Branch, National Defense Medical University, Taipei 105309, Taiwan
2 Department of Health Promotion and Health Education, National Taiwan Normal University, Taipei 106308, Taiwan
3 Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical University, No. 325, Sec. 2, Cheng-Gong Road, Neihu District, Taipei 114202, Taiwan
4 Department of Otolaryngology-Head and Neck Surgery, School of Medicine, College of Medicine, National Defense Medical University, Taipei 114202, Taiwan
This article belongs to the Special Issue AI-Assisted Diagnostics in Telemedicine and Digital Health

Abstract

Background/Objectives: Otitis media (OM), including acute otitis media (AOM) and chronic otitis media (COM), is a common middle ear disease that can lead to significant morbidity if not accurately diagnosed. Otoscopic interpretation remains subjective and operator-dependent, underscoring the need for objective and reproducible diagnostic support. Recent advances in artificial intelligence (AI) offer promising solutions for automated otoscopic image analysis. Methods: We developed an AI-based diagnostic framework consisting of three sequential steps: (1) semi-supervised learning for automatic recognition and semantic segmentation of tympanic membrane structures, (2) region-based feature extraction, and (3) disease classification. A total of 607 clinical otoscopic images were retrospectively collected, including normal ears (n = 220), AOM (n = 157), and COM with tympanic membrane perforation (n = 230). Among these, 485 images were used for training and 122 for independent testing. Semantic segmentation of five anatomically relevant regions was performed using multiple convolutional neural network architectures, including U-Net, PSPNet, HRNet, and DeepLabV3+. Following segmentation, color and texture features were extracted from each region and used to train a neural network-based classifier to differentiate disease states. Results: Among the evaluated segmentation models, U-Net demonstrated superior performance, achieving an overall pixel accuracy of 96.76% and a mean Dice similarity coefficient of 71.68%. The segmented regions enabled reliable extraction of discriminative chromatic and texture features. In the final classification stage, the proposed framework achieved diagnostic accuracies of 100% for normal ears, 100% for AOM, and 91.3% for COM on the independent test set, with an overall accuracy of 96.72%. Conclusions: This study demonstrates that a semi-supervised, segmentation-driven AI pipeline integrating feature extraction and classification can achieve high diagnostic accuracy for otitis media. The proposed framework offers a clinically interpretable and fully automated approach that may enhance diagnostic consistency, support clinical decision-making, and facilitate scalable otoscopic assessment in diverse healthcare screening settings for disease prevention and health education.
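
For readers who want to relate the reported segmentation results to concrete computations, the sketch below shows how the two metrics cited in the abstract, overall pixel accuracy and the per-class Dice similarity coefficient, are conventionally computed from predicted and ground-truth label masks. This is a minimal illustration only: the five-class label layout, array shapes, and function names are assumptions for the example and are not taken from the authors' code.

import numpy as np

def pixel_accuracy(pred, target):
    # Fraction of pixels whose predicted class label matches the ground truth.
    return float((pred == target).mean())

def dice_per_class(pred, target, num_classes):
    # Dice similarity coefficient per class: 2*|A ∩ B| / (|A| + |B|).
    scores = []
    for c in range(num_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        denom = pred_c.sum() + target_c.sum()
        if denom == 0:
            scores.append(np.nan)  # class absent in both masks; excluded from the mean
            continue
        scores.append(2.0 * np.logical_and(pred_c, target_c).sum() / denom)
    return scores

# Illustrative example: five anatomical regions (labels 0-4) on a 256 x 256 otoscopic frame.
rng = np.random.default_rng(0)
gt = rng.integers(0, 5, size=(256, 256))
pr = gt.copy()
pr[:32, :] = 0  # simulate segmentation errors in the top rows
print("pixel accuracy:", pixel_accuracy(pr, gt))
print("mean Dice:", np.nanmean(dice_per_class(pr, gt, num_classes=5)))

Averaging the per-class Dice scores (ignoring classes absent from both masks) gives the mean Dice similarity coefficient, which is how the 71.68% figure reported for U-Net would typically be aggregated across the five segmented regions.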
