Abstract
Background/Objectives: Otitis media (OM), including acute otitis media (AOM) and chronic otitis media (COM), is a common middle ear disease that can lead to significant morbidity if not accurately diagnosed. Otoscopic interpretation remains subjective and operator-dependent, underscoring the need for objective and reproducible diagnostic support. Recent advances in artificial intelligence (AI) offer promising solutions for automated otoscopic image analysis. Methods: We developed an AI-based diagnostic framework consisting of three sequential steps: (1) semi-supervised learning for automatic recognition and semantic segmentation of tympanic membrane structures, (2) region-based feature extraction, and (3) disease classification. A total of 607 clinical otoscopic images were retrospectively collected, comprising normal ears (n = 220), AOM (n = 157), and COM with tympanic membrane perforation (n = 230). Among these, 485 images were used for training and 122 for independent testing. Semantic segmentation of five anatomically relevant regions was performed using multiple convolutional neural network architectures, including U-Net, PSPNet, HRNet, and DeepLabV3+. Following segmentation, color and texture features were extracted from each region and used to train a neural network-based classifier to differentiate disease states. Results: Among the evaluated segmentation models, U-Net achieved the best performance, with an overall pixel accuracy of 96.76% and a mean Dice similarity coefficient of 71.68%. The segmented regions enabled reliable extraction of discriminative color and texture features. In the final classification stage, the proposed framework achieved diagnostic accuracies of 100% for normal ears, 100% for AOM, and 91.3% for COM on the independent test set, corresponding to an overall accuracy of 96.72%. Conclusions: This study demonstrates that a semi-supervised, segmentation-driven AI pipeline integrating feature extraction and classification can achieve high diagnostic accuracy for otitis media. The proposed framework offers a clinically interpretable and fully automated approach that may enhance diagnostic consistency, support clinical decision-making, and facilitate scalable otoscopic assessment in diverse healthcare and screening settings, supporting disease prevention and health education.
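For reference, the segmentation metrics quoted in the Results are assumed here to follow their standard definitions; the formulas below are a sketch of those conventional forms, and the paper's own averaging over the five regions may differ.

$$
\text{Pixel Accuracy} = \frac{\#\{\text{correctly labeled pixels}\}}{\#\{\text{all pixels}\}}, \qquad
\text{Dice}_c = \frac{2\,|P_c \cap G_c|}{|P_c| + |G_c|}, \qquad
\text{mDice} = \frac{1}{C}\sum_{c=1}^{C}\text{Dice}_c,
$$

where $P_c$ and $G_c$ are the predicted and ground-truth pixel sets for region $c$, and $C$ is the number of segmented regions (five in this study).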
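To make the region-based feature extraction step concrete, the following is a minimal sketch of one plausible way to turn a five-region segmentation mask into color and texture features. The region names, the seven-feature layout per region, and the use of grayscale standard deviation as a texture proxy are illustrative assumptions, not the authors' implementation; the resulting vectors would then feed the neural network classifier described in the Methods.

```python
import numpy as np

# Illustrative labels for the five segmented tympanic-membrane regions
# (placeholder names; the paper's actual region definitions may differ).
REGIONS = {1: "malleus", 2: "light_reflex", 3: "pars_tensa",
           4: "pars_flaccida", 5: "perforation"}

def region_features(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Extract simple color and texture features per segmented region.

    image : (H, W, 3) uint8 RGB otoscopic image
    mask  : (H, W) integer labels, 0 = background, 1..5 = regions
    Returns a fixed-length vector: for each region, the mean and standard
    deviation of each RGB channel (color) plus the standard deviation of
    grayscale intensities (a crude texture proxy).
    """
    gray = image.mean(axis=2)
    feats = []
    for label in REGIONS:
        pixels = image[mask == label]            # (n_pixels, 3), may be empty
        if pixels.size == 0:                     # region absent in this ear
            feats.extend([0.0] * 7)
            continue
        feats.extend(pixels.mean(axis=0))        # mean R, G, B
        feats.extend(pixels.std(axis=0))         # std  R, G, B
        feats.append(gray[mask == label].std())  # texture proxy
    return np.asarray(feats, dtype=np.float32)

# Toy usage: a random image and mask stand in for a segmentation model's output.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
msk = rng.integers(0, 6, size=(256, 256))
print(region_features(img, msk).shape)           # (35,) = 5 regions x 7 features
```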