Oral Cancer Diagnosis Using Histopathology Images: An Explainable Hybrid Transformer Framework
Abstract
1. Introduction
- A hybrid Swin–ViT framework enables effective feature extraction and accurate classification of OC histopathology images.
- Explainable AI (XAI)-driven feature selection improves interpretability and predictive performance while reducing dimensionality.
- A FastAPI-based real-time web application is designed for scalable deployment in clinical and low-resource settings.
2. Related Works
3. Materials and Methods

| Algorithm 1: Proposed OC Detection Framework Algorithm |
|---|
| Input: number of epochs; model parameters; learning rate; batch size; OC histopathology dataset |
| Output: assessment metrics on the test dataset |
3.1. Data Acquisition
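The dataset table reports 528 images, 89 normal and 439 OSCC, so the classes are imbalanced by roughly 4.9 to 1. As a worked illustration (not a step described in the paper), the sketch below computes the majority-class baseline accuracy and balanced inverse-frequency class weights for that split. The helper names are hypothetical.

```python
# Class counts from the dataset table (Normal/OSCC = 89/439).
COUNTS = {"Normal": 89, "OSCC": 439}


def majority_baseline(counts):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def inverse_frequency_weights(counts):
    """Balanced weights w_c = N / (K * n_c), the formula behind scikit-learn's
    class_weight="balanced". Illustrative only: the paper does not state that
    class weighting was used."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    print(f"imbalance ratio OSCC:Normal = {COUNTS['OSCC'] / COUNTS['Normal']:.2f}")
    print(f"majority-class baseline ACC = {majority_baseline(COUNTS):.4f}")
    for label, weight in inverse_frequency_weights(COUNTS).items():
        print(f"weight[{label}] = {weight:.4f}")
```

With these counts the baseline accuracy is 439/528, about 0.8314. All Random Forest and CatBoost results in the feature-extractor comparison table (ACC 0.7342 to 0.8124) fall below this figure, although the test-set class proportions may differ slightly from the full dataset. This is one reason to read ACC alongside SP, SN and MCC.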
3.2. Data Preprocessing
3.3. Feature Extraction Using Swin Transformer
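The configuration table lists swin_large_patch4_window7_224 with patch size 4, window size 7 and a 1536-dimensional base feature. The arithmetic below shows where that dimension comes from. It assumes the standard Swin-L design described by Liu et al. [53]: embedding dimension C = 192, four stages, and patch merging between stages that halves the spatial resolution and doubles the channels. The function name is hypothetical.

```python
def swin_stage_shapes(img_size=224, patch_size=4, embed_dim=192, num_stages=4, window=7):
    """Return (stage, height, width, channels) for each Swin stage.

    Assumed Swin-L settings: embed_dim C = 192 and patch merging between stages
    (2x2 spatial downsampling, channels doubled)."""
    shapes = []
    side = img_size // patch_size  # 224 / 4 = 56 tokens per side after patch embedding
    dim = embed_dim
    for stage in range(1, num_stages + 1):
        # Windowed self-attention tiles the token grid into window x window blocks;
        # for the 224-pixel configuration every stage divides evenly by 7.
        assert side % window == 0, f"stage {stage}: grid {side} not divisible by window {window}"
        shapes.append((stage, side, side, dim))
        if stage < num_stages:
            side //= 2  # patch merging halves the spatial resolution
            dim *= 2    # and doubles the channel width
    return shapes


if __name__ == "__main__":
    for stage, height, width, channels in swin_stage_shapes():
        print(f"stage {stage}: {height}x{width} tokens, {channels} channels")
```

For the listed configuration this gives 56×56×192, 28×28×384, 14×14×768 and finally 7×7×1536. Mean pooling over the 49 final tokens then yields the 1536-dimensional base feature reported in the table.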
3.4. Proposed Model Architecture
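The two configuration tables specify a projection head (1536 → 1280 → 1024, with BatchNorm1d, ReLU and dropout 0.2) that feeds a transformer classifier (input 1024, embedding 128, 8 heads, feedforward 256, 4 layers). The sketch below copies those values into a dataclass, checks that the dimensions chain together, and counts the trainable parameters of the linear and batch-normalization layers. It uses the usual PyTorch conventions: a Linear(in, out) layer has in·out weights plus out biases, and BatchNorm1d(n) has 2n affine parameters. The attention module listed with dimension 1536 is left out because the table does not give its form. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    # Values from the Swin feature-extractor table.
    base_feature_dim: int = 1536
    hidden_dim: int = 1280
    output_dim: int = 1024
    dropout: float = 0.2
    # Values from the ViT classifier table.
    vit_input_dim: int = 1024
    vit_embed_dim: int = 128
    vit_heads: int = 8
    vit_ff_dim: int = 256
    vit_layers: int = 4

    def validate(self) -> None:
        # The projection head's output must match the classifier's input.
        assert self.output_dim == self.vit_input_dim, "head output != ViT input"
        # Multi-head attention splits the embedding evenly across heads.
        assert self.vit_embed_dim % self.vit_heads == 0, "embed dim not divisible by heads"

    @property
    def head_dim(self) -> int:
        return self.vit_embed_dim // self.vit_heads


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n: int) -> int:
    """Learnable scale (gamma) and shift (beta); running statistics are buffers."""
    return 2 * n


def projection_head_params(cfg: HybridConfig) -> int:
    # ReLU and dropout add no parameters, so layer order does not affect the count.
    return (
        linear_params(cfg.base_feature_dim, cfg.hidden_dim)
        + batchnorm_params(cfg.hidden_dim)
        + linear_params(cfg.hidden_dim, cfg.output_dim)
        + batchnorm_params(cfg.output_dim)
    )


if __name__ == "__main__":
    cfg = HybridConfig()
    cfg.validate()
    print("per-head dimension:", cfg.head_dim)
    print("projection-head parameters:", projection_head_params(cfg))
```

Under these assumptions each attention head works on 16 dimensions, and the projection head contributes 3,283,712 trainable parameters before the classifier.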
3.5. Feature Selection Using SHAP
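SHAP [66] attributes a prediction to input features using Shapley values from cooperative game theory [69]. For a feature set N with n members and a value function v, feature i receives φ_i = Σ over S ⊆ N \ {i} of |S|! (n − |S| − 1)! / n! · [v(S ∪ {i}) − v(S)]. The brute-force sketch below evaluates that formula for a toy three-feature game. Exact enumeration is only tractable for very small n; SHAP libraries rely on model-specific or sampling approximations instead. The value function is an invented example, not derived from the paper's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (O(2^n) calls to `value`)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical game: additive contributions plus an a-b interaction bonus."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    v = sum(base[p] for p in coalition)
    if {"a", "b"} <= coalition:
        v += 1.0  # the bonus is split equally between a and b
    return v


if __name__ == "__main__":
    phi = shapley_values(["a", "b", "c"], toy_value)
    print(phi)
    # Efficiency property: attributions sum to v(N) - v(empty set).
    print(sum(phi.values()), toy_value(frozenset("abc")) - toy_value(frozenset()))
```

The interaction bonus is shared equally between features a and b, and the attributions sum to v(N) − v(∅). These are the fairness and efficiency properties that make Shapley values a principled basis for ranking features.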
4. Results
4.1. Evaluating Metrics
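The reported metrics ACC, PRE, SP, SN, F1 and MCC all follow from the confusion-matrix counts TP, TN, FP and FN. The sketch below uses the standard definitions and treats OSCC as the positive class, which is an assumption. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard confusion-matrix metrics; OSCC is assumed to be the positive class."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    pre = tp / (tp + fp) if tp + fp else 0.0  # precision
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    f1 = 2 * pre * sn / (pre + sn) if pre + sn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "PRE": pre, "SN": sn, "SP": sp, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Arbitrary example counts, not taken from the paper.
    for name, value in binary_metrics(tp=60, tn=12, fp=2, fn=6).items():
        print(f"{name}: {value:.4f}")
```

MCC uses all four counts. It therefore stays informative when one class dominates, as OSCC does in this dataset.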
4.2. Experimental Evaluation of Feature Extractors and Classifiers
4.3. Impact of Feature Selection Techniques
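The feature-count table evaluates subsets of 200 to 900 features, and 500 features give the best scores. One common way to build such subsets is to rank features by global importance, taken here as the mean absolute SHAP value across samples, and keep the top k. The sketch below assumes that criterion; the paper's exact ranking rule is not stated in this section. The SHAP matrix is synthetic.

```python
def mean_abs_shap(shap_matrix):
    """Global importance per feature: mean |SHAP value| over all samples (rows)."""
    n_samples = len(shap_matrix)
    return [sum(abs(v) for v in column) / n_samples for column in zip(*shap_matrix)]


def select_top_k(shap_matrix, k):
    """Indices of the k most important features, most important first."""
    importance = mean_abs_shap(shap_matrix)
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return ranked[:k]


def apply_selection(features, selected):
    """Keep only the selected columns of a samples x features matrix."""
    return [[row[j] for j in selected] for row in features]


if __name__ == "__main__":
    shap_values = [
        [0.10, -0.90, 0.05, 0.40],
        [-0.20, 0.70, 0.00, -0.50],
        [0.15, -0.80, 0.10, 0.30],
    ]
    print("top-2 feature indices:", select_top_k(shap_values, k=2))
```

With NumPy the same ranking is np.argsort(-np.abs(shap_values).mean(axis=0))[:k]. Sweeping k over 200, 300, 500, 700 and 900 and re-evaluating the classifier reproduces the kind of comparison shown in the feature-count table.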
4.4. Hyperparameter Analysis of Swin Transformer
4.5. Ablation Study
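The ablation table compares the default mean pooling of the final Swin tokens with a max-pooling variant. The two differ only in the reduction applied over the token axis. The toy example below uses 3 tokens with 2 channels. In the actual model, the reduction would run over the 7 × 7 = 49 final-stage tokens of a 224 × 224 input, each with 1536 channels; this is an inference from Swin's overall 32× downsampling. The function name is hypothetical.

```python
def pool_tokens(tokens, method="mean"):
    """Reduce a tokens x channels list to one value per channel."""
    columns = list(zip(*tokens))
    if method == "mean":
        return [sum(c) / len(c) for c in columns]  # average evidence across all tokens
    if method == "max":
        return [max(c) for c in columns]           # keep only the strongest response
    raise ValueError(f"unknown pooling method: {method}")


if __name__ == "__main__":
    tokens = [[0.2, -1.0], [0.8, 0.5], [0.5, 2.0]]
    print("mean:", pool_tokens(tokens, "mean"))
    print("max: ", pool_tokens(tokens, "max"))
```

Mean pooling aggregates evidence from every token, while max pooling is driven by a single token per channel. In the ablation table, the max-pooling variant reaches lower AUC (0.9264) and SP (0.8571) than the full model.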
4.6. Statistical Analysis of Model Performance
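The statistical tables report paired t-tests and Wilcoxon signed-rank tests. The sketch below computes both test statistics from paired scores in plain Python so that the arithmetic is visible. The p-values require the t and signed-rank null distributions; scipy.stats.ttest_rel and scipy.stats.wilcoxon are the usual tools for those. The paired scores in the example are hypothetical, and this section does not specify which scores the authors paired.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) with d = a - b (sample sd, n - 1 denominator)."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def wilcoxon_statistic(a, b):
    """Two-sided signed-rank statistic min(W+, W-); zero differences are dropped
    and tied |d| values receive their average rank."""
    d = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal absolute differences.
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return min(w_plus, w_minus)


if __name__ == "__main__":
    proposed = [99, 98, 99, 97, 99]  # hypothetical paired scores
    baseline = [94, 95, 93, 96, 94]
    print("t =", round(paired_t_statistic(proposed, baseline), 3))
    print("W =", wilcoxon_statistic(proposed, baseline))
```

A negative t-statistic, as in the tables, simply means that the first sample in each pair scored lower on average. The sign therefore depends on the order in which the two models are passed.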
4.7. Web Application for Real-Time OC Classification
4.8. Comparison with Existing Methods
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| OC | Oral Cancer |
| OSCC | Oral Squamous Cell Carcinoma |
| CAD | Computer-Aided Diagnosis |
| AI | Artificial Intelligence |
| ML | Machine Learning |
| DL | Deep Learning |
| CNN | Convolutional Neural Network |
| ViT | Vision Transformer |
| AttCNN | Attention-based CNN |
| SHAP | SHapley Additive exPlanations |
| GAN | Generative Adversarial Network |
| IHC | Immunohistochemistry |
| WSI | Whole Slide Imaging |
| H&E | Hematoxylin and Eosin |
| ACC | Accuracy |
| AUC | Area Under the Curve |
| PRE | Precision |
| SP | Specificity |
| SN | Sensitivity |
| MCC | Matthews Correlation Coefficient |
| TP | True Positive |
| TN | True Negative |
| FP | False Positive |
| FN | False Negative |
| CLAHE | Contrast Limited Adaptive Histogram Equalization |
| ReLU | Rectified Linear Unit |
| SVM | Support Vector Machine |
References
1. Liang, H.J.; Tan, X.Y.; Li, D.; Lin, C.; Huang, S.Y.; Nie, G.C.; Guo, X.F.; Zhang, Z.B.; Zhu, X.N.; Tan, S.K. New advances in oral microbiology and tumor research. World J. Clin. Oncol. 2025, 16, 106981.
2. Romero-Trejo, D.; Aguiñiga-Sanchez, I.; Ledesma-Martínez, E.; Weiss-Steider, B.; Sierra-Mondragón, E.; Santiago-Osorio, E. Anti-cancer potential of casein and its derivatives: Novel strategies for cancer treatment. Med. Oncol. 2024, 41, 200.
3. Wierzbicka, M.; Pietruszewska, W.; Maciejczyk, A.; Markowski, J. Trends in incidence and mortality of head and neck Cancer subsites among elderly patients: A Population-Based analysis. Cancers 2025, 17, 548.
4. Kijowska, J.; Grzegorczyk, J.; Gliwa, K.; Jędras, A.; Sitarz, M. Epidemiology, Diagnostics, and Therapy of Oral Cancer—Update Review. Cancers 2024, 16, 3156.
5. Rusinovci, S.; Aliu, X.; Jukić, T.; Štubljar, D.; Haliti, N. Analysis of three-year prevalence of oral cavity, neck and head tumors—A retrospective single-centre study. Acta Clin. Croat. 2020, 59, 445.
6. Wu, J.; Chen, H.; Liu, Y.; Yang, R.; An, N. The global, regional, and national burden of oral cancer, 1990–2021: A systematic analysis for the Global Burden of Disease Study 2021. J. Cancer Res. Clin. Oncol. 2025, 151, 53.
7. Kademani, D. Oral cancer. Mayo Clin. Proc. 2007, 82, 878–887.
8. Jose, J.; Wieczorek, A. Head and Neck Cancer. In Treatment of Cancer; CRC Press: Boca Raton, FL, USA, 2025; pp. 35–50.
9. Maggiore, R.; Zumsteg, Z.S.; BrintzenhofeSzoc, K.; Trevino, K.M.; Gajra, A.; Korc-Grodzicki, B.; Epstein, J.B.; Bond, S.M.; Parker, I.; Kish, J.A.; et al. The older adult with locoregionally advanced head and neck squamous cell carcinoma: Knowledge gaps and future direction in assessment and treatment. Int. J. Radiat. Oncol. Biol. Phys. 2017, 98, 868–883.
10. Peng, H.; Wang, X.; Liao, Y.; Lan, L.; Wang, D.; Xiong, Y.; Xu, L.; Liang, Y.; Luo, X.; Xu, Y.; et al. Long-term exposure to ambient NO2 increase oral cancer prevalence in Southern China: A 3-year time-series analysis. Front. Public Health 2025, 13, 1484223.
11. Bushi, G.; Khatib, M.N.; Singh, M.P.; Pattanayak, M.; Vishwakarma, T.; Ballal, S.; Bansal, P.; Gaidhane, A.M.; Tomar, B.S.; Ashraf, A.; et al. Prevalence of suicidal ideation, attempts and associated risk factors in oral cancer patients: A systematic review and meta-analysis. BMC Oral Health 2025, 25, 140.
12. Hsu, P.C.; Huang, J.H.; Tsai, C.C.; Lin, Y.H.; Kuo, C.Y. Early Molecular Diagnosis and Comprehensive Treatment of Oral Cancer. Curr. Issues Mol. Biol. 2025, 47, 452.
13. Gupta, N.; Gupta, R.; Acharya, A.K.; Patthi, B.; Goud, V.; Reddy, S.; Garg, A.; Singla, A. Changing Trends in oral cancer-a global scenario. Nepal J. Epidemiol. 2016, 6, 613.
14. Rageh, O.A.; Mahmood, K.; Alkladi, E.; Bamuneef, A.; Algebaree, M.; Murad, A.; Munasser, M. Dental Student Knowledge of the Role of Early Detection of Oral Cancer: Multi Center Cross Sectional Study. Yemeni J. Med. Sci. 2025, 19, 7.
15. Mohammed, R.A.; Ahmed, S.K. Oral cancer screening: Past, present, and future perspectives. Oral Oncol. Rep. 2024, 10, 100306.
16. Cirillo, N. Precursor lesions, overdiagnosis, and oral cancer: A critical review. Cancers 2024, 16, 1550.
17. Ng, J.Y.; Cramer, H.; Lee, M.S.; Moher, D. Traditional, complementary, and integrative medicine and artificial intelligence: Novel opportunities in healthcare. Integr. Med. Res. 2024, 13, 101024.
18. Shah, P.; Kendall, F.; Khozin, S.; Goosen, R.; Hu, J.; Laramie, J.; Ringel, M.; Schork, N. Artificial intelligence and machine learning in clinical development: A translational perspective. npj Digit. Med. 2019, 2, 69.
19. Al, M.M.F.; Hasib, F.M.; Young, L.; Na, G.; Wang, D. Diabetes Prediction and Detection System Through a Recurrent Neural Network in a Sensor Device. Electronics 2025, 14, 4207.
20. Briganti, G.; Le Moine, O. Artificial intelligence in medicine: Today and tomorrow. Front. Med. 2020, 7, 509744.
21. Kumar, Y.; Shrivastav, S.; Garg, K.; Modi, N.; Wiltos, K.; Woźniak, M.; Ijaz, M.F. Automating cancer diagnosis using advanced deep learning techniques for multi-cancer image classification. Sci. Rep. 2024, 14, 25006.
22. Unger, M.; Kather, J.N. Deep learning in cancer genomics and histopathology. Genome Med. 2024, 16, 44.
23. Vinay, V.; Jodalli, P.; Chavan, M.S.; Buddhikot, C.S.; Luke, A.M.; Ingafou, M.S.H.; Reda, R.; Pawar, A.M.; Testarelli, L. Artificial intelligence in oral cancer: A comprehensive scoping review of diagnostic and prognostic applications. Diagnostics 2025, 15, 280.
24. Khosravi, P.; Fuchs, T.J.; Ho, D.J. Artificial Intelligence–Driven Cancer Diagnostics: Enhancing Radiology and Pathology through Reproducibility, Explainability, and Multimodality. Cancer Res. 2025, 85, 2356–2367.
25. Pereira-Prado, V.; Martins-Silveira, F.; Sicco, E.; Hochmann, J.; Isiordia-Espinoza, M.A.; González, R.G.; Pandiar, D.; Bologna-Molina, R. Artificial intelligence for image analysis in oral squamous cell carcinoma: A review. Diagnostics 2023, 13, 2416.
26. Gupta, A.; Neelapu, B.C.; Rana, S.S. Computer-Aided Diagnosis (CAD) Tools and Applications for 3D Medical Imaging; Elsevier: Amsterdam, The Netherlands, 2025; Volume 136.
27. Ma, Y.; Jamdade, S.; Konduri, L.; Sailem, H. AI in Histopathology Explorer for comprehensive analysis of the evolving AI landscape in histopathology. npj Digit. Med. 2025, 8, 156.
28. Ahmad, M.Y.; Mohamed, A.; Yusof, Y.A.M.; Ali, S.A.M. Colorectal cancer image classification using image pre-processing and multilayer Perceptron. In Proceedings of the 2012 International Conference on Computer & Information Science (ICCIS), Chongqing, China, 17–19 August 2012; IEEE: Piscataway, NJ, USA, 2012; Volume 1, pp. 275–280.
29. Mira, E.S.; Saaduddin Sapri, A.M.; Aljehanı, R.F.; Jambı, B.S.; Bashir, T.; El-Kenawy, E.S.M.; Saber, M. Early diagnosis of oral cancer using image processing and Artificial intelligence. Fusion Pract. Appl. 2024, 14, 293–308.
30. Hossain, M.M.; Miah, M.B.A.; Saedi, M.; Sifat, T.A.; Hossain, M.N.; Hussain, N. An IoT-Based Lung Cancer Detection System from CT Images Using Deep Learning. In Proceedings of the International Conference on Emerging Trends in Cybersecurity (ICETCS 2025), Wolverhampton, UK, 27–28 October 2025; Lecture Notes in Electrical Engineering. Springer: Berlin/Heidelberg, Germany, 2025.
31. Rahman, M.A.; Miah, M.B.A.; Hossain, M.A.; Hosen, A.S. Enhanced Brain Tumor Classification Using MobileNetV2: A Comprehensive Preprocessing and Fine-Tuning Approach. BioMedInformatics 2025, 5, 30.
32. Ulaganathan, G.; Niazi, K.T.M.; Srinivasan, S.; Balaji, V.; Manikandan, D.; Hameed, K.S.; Banumathi, A. A clinicopathological study of various oral cancer diagnostic techniques. J. Pharm. Bioallied Sci. 2017, 9, S4.
33. Li, L.; Pu, C.; Tao, J.; Zhu, L.; Hu, S.; Qiao, B.; Xing, L.; Wei, B.; Shi, C.; Chen, P.; et al. Development of an oral cancer detection system through deep learning. BMC Oral Health 2024, 24, 1468.
34. Nanditha, B.; MP, G. Oral cancer detection using machine learning and deep learning techniques. Int. J. Curr. Res. Rev. 2022, 14, 64–70.
35. Song, B.; Sunny, S.; Uthoff, R.D.; Patrick, S.; Suresh, A.; Kolur, T.; Keerthi, G.; Anbarani, A.; Wilder-Smith, P.; Kuriakose, M.A.; et al. Automatic classification of dual-modalilty, smartphone-based oral dysplasia and malignancy images using deep learning. Biomed. Opt. Express 2018, 9, 5318–5329.
36. Panigrahi, S.; Nanda, B.S.; Swarnkar, T. Comparative analysis of machine learning algorithms for histopathological images of oral cancer. In Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 318–327.
37. Senthil Pandi, S.; Sutha, J.; Kumaragurubaran, T.; Kumar, P. Enhanced Classification of Oral Cancer Using Deep Learning Techniques. In Proceedings of the 2024 Second International Conference on Advances in Information Technology (ICAIT), Chikkamagaluru, India, 24–27 July 2024; IEEE: Piscataway, NJ, USA, 2024; Volume 1, pp. 1–5.
38. Panahi, O.; Farrokh, S. The Use of Machine Learning for Personalized Dental-Medicine Treatment. Glob. J. Med. Biomed. Case Rep. 2025, 1, 2.
39. Jeyaraj, P.R.; Samuel Nadar, E.R. Computer-assisted medical image classification for early diagnosis of oral cancer employing deep learning algorithm. J. Cancer Res. Clin. Oncol. 2019, 145, 829–837.
40. Halder, A.; Laha, S.; Bandyopadhyay, S.; Schwenker, F.; Sarkar, R. A Metaheuristic Optimization Based Deep Feature Selection for Oral Cancer Classification. In Proceedings of the IAPR Workshop on Artificial Neural Networks in Pattern Recognition, Montreal, QC, Canada, 10–12 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 132–143.
41. Akhi, A.B.; Al Noman, A.; Shaha, S.P.; Akter, F.; Lata, M.A.; Sheikh, R. OCNet-23: A fine-tuned transfer learning approach for oral cancer detection from histopathological images. Int. J. Electr. Comput. Eng. (IJECE) 2025, 15, 1826–1833.
42. Rahman, T.Y. A histopathological image repository of normal epithelium of Oral Cavity and Oral Squamous Cell Carcinoma. Mendeley Data 2019.
43. Bury, T. For Image Generation Process. In Proceedings of the Information and Software Technologies: 30th International Conference, ICIST 2024, Kaunas, Lithuania, 17–18 October 2024; Springer Nature: Berlin/Heidelberg, Germany, 2025; Volume 2401, p. 61.
44. Halloum, K.; Ez-Zahraouy, H. Enhancing Medical Image Classification through Transfer Learning and CLAHE Optimization. Curr. Med. Imaging 2025, 21, e15734056342623.
45. Zhang, H.; Liu, Y.; Cai, G. A novel 3D bilateral filtering algorithm with noise level estimation assisted by multi-temporal SAR. PLoS ONE 2025, 20, e0315395.
46. Asnake, N.W.; Ayalew, A.M.; Engda, A.A. Detection of oral squamous cell carcinoma cancer using AlexNet on histopathological images. Discov. Appl. Sci. 2025, 7, 155.
47. Pérez-Enriquez, L.; Jiménez-Domínguez, M.; García-Rojas, N.; Zapotecas-Martínez, S.; Altamirano-Robles, L. Image Contrast Enhancement: The Synergistic Power of a Dual-Gamma Correction Function and Evolutionary Algorithms. Comput. Y Sist. 2025, 29, 91–101.
48. Lin, S.; Zhou, H.; Watson, M.; Govindan, R.; Cote, R.J.; Yang, C. Impact of stain variation and color normalization for prognostic predictions in pathology. Sci. Rep. 2025, 15, 2369.
49. Du, Z.; Zhang, P.; Huang, X.; Hu, Z.; Yang, G.; Xi, M.; Liu, D. Deeply supervised two stage generative adversarial network for stain normalization. Sci. Rep. 2025, 15, 7068.
50. Heilmann, T.A. Sharp Images and Unsharp Masks. Transbordeur. Photogr. Hist. Soc. 2025, 9.
51. Verma, K.; Srivastava, S.; Mishra, R.K. Optimized Reformed Anisotropic Diffusion Unsharp Masking Filter for MR Images. Trait. Signal 2025, 42, 2181–2194.
52. Adeoye, J.; Koohi-Moghadam, M.; Choi, S.W.; Zheng, L.W.; Lo, A.W.I.; Tsang, R.K.Y.; Chow, V.L.Y.; Akinshipo, A.; Thomson, P.; Su, Y.X. Predicting oral cancer risk in patients with oral leukoplakia and oral lichenoid mucositis using machine learning. J. Big Data 2023, 10, 39.
53. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
54. Hilal, B.K.; AlShemmary, E.N. Detecting Hypertrophic Cardiomyopathy: A Deep Learning Approach with CNNs and Swin Transformers. Int. J. Intell. Eng. Syst. 2025, 18, 44–65.
55. Emegano, D.I.; Mustapha, M.T.; Ozsahin, I.; Ozsahin, D.U.; Uzun, B. Advancing Prostate Cancer Diagnostics: A ConvNeXt Approach to Multi-Class Classification in Underrepresented Populations. Bioengineering 2025, 12, 369.
56. Kumar, A.; Yadav, S.P.; Kumar, A. An improved feature extraction algorithm for robust Swin Transformer model in high-dimensional medical image analysis. Comput. Biol. Med. 2025, 188, 109822.
57. Velu, K.; Jaisankar, N. Design of a CNN–Swin transformer model for Alzheimer’s disease prediction using MRI images. IEEE Access 2025, 13, 149409–149429.
58. Zhang, L.; Yin, X.; Liu, X.; Liu, Z. Medical image segmentation by combining feature enhancement Swin Transformer and UperNet. Sci. Rep. 2025, 15, 14565.
59. Ansith, S.; Ananth, A.; Deni, R.E.; Kala, S. Swin-RSIC: Remote sensing image classification using a modified swin transformer with explainability. Earth Sci. Inform. 2025, 18, 362.
60. Guo, Y.; Li, W.; Zhai, P. Swin-transformer for weak feature matching. Sci. Rep. 2025, 15, 2961.
61. Mzoughi, H.; Njeh, I.; BenSlima, M.; Farhat, N.; Mhiri, C. Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI). Vis. Comput. 2025, 41, 2123–2142.
62. Jahan, I.; Chowdhury, M.E.; Vranic, S.; Al Saady, R.M.; Kabir, S.; Pranto, Z.H.; Mim, S.J.; Nobi, S.F. Deep learning and vision transformers-based framework for breast cancer and subtype identification. Neural Comput. Appl. 2025, 37, 9311–9330.
63. Mannepalli, D.; Tak, T.K.; Krishnan, S.B.; Sreenivas, V. GSC-DVIT: A vision transformer based deep learning model for lung cancer classification in CT images. Biomed. Signal Process. Control 2025, 103, 107371.
64. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
65. Hu, J.; Xiang, Y.; Lin, Y.; Du, J.; Zhang, H.; Liu, H. Multi-scale Transformer architecture for accurate medical image classification. In Proceedings of the 2025 International Conference on Artificial Intelligence and Computational Intelligence, Kuala Lumpur, Malaysia, 14–16 February 2025; pp. 409–414.
66. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
67. Hancock, J.T.; Khoshgoftaar, T.M.; Liang, Q. A problem-agnostic approach to feature selection and analysis using shap. J. Big Data 2025, 12, 12.
68. Miah, M.B.A.; Awang, S.; Azad, M.S.; Rahman, M.M. Keyphrases concentrated area identification from academic articles as feature of keyphrase extraction: A new unsupervised approach. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 789–796.
69. Hausken, K.; Mohr, M. The value of a player in n-person games. Soc. Choice Welf. 2001, 18, 465–483.
70. Noor, S.; AlQahtani, S.A.; Khan, S. Chronic liver disease detection using ranking and projection-based feature optimization with deep learning. AIMS Bioeng. 2025, 12, 50–68.
71. Ji, Y.; Shang, H.; Yi, J.; Zang, W.; Cao, W. Machine learning-based models to predict type 2 diabetes combined with coronary heart disease and feature analysis-based on interpretable SHAP. Acta Diabetol. 2025, 62, 1631–1646.
72. Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artif. Intell. 2021, 298, 103502.
73. Sahoo, P.; Saha, S.; Sharma, S.K.; Mondal, S. Boosting cervical cancer detection with a multi-stage architecture and complementary information fusion. Soft Comput. 2025, 29, 1191–1206.
74. Hosen, M.F.; Mahmud, S.H.; Goh, K.O.M.; Uddin, M.S.; Nandi, D.; Shatabda, S.; Shoombuatong, W. An LSTM network-based model with attention techniques for predicting linear T-cell epitopes of the hepatitis C virus. Results Eng. 2024, 24, 103476.
75. Wu, Z.; Zhang, H.; Fang, C. Research on machine vision online monitoring system for egg production and quality in cage environment. Poult. Sci. 2025, 104, 104552.
76. Li, Y.; Gao, F.; Yu, J.; Fei, T. Machine learning based thermal comfort prediction in office spaces: Integrating SMOTE and SHAP methods. Energy Build. 2025, 329, 115267.
77. Miah, M.B.A.; Awang, S.; Rahman, M.M.; Hosen, A.S.; Ra, I.H. Keyphrases frequency analysis from research articles: A region-based unsupervised novel approach. IEEE Access 2022, 10, 120838–120849.
78. Miah, M.B.A.; Awang, S.; Rahman, M.M.; Hosen, A.S.; Ra, I.H. A new unsupervised technique to analyze the centroid and frequency of keyphrases from academic articles. Electronics 2022, 11, 2773.
79. Miah, M.B.A.; Awang, S.; Rahman, M.M.; Hosen, A.S. Keyphrase Distance Analysis Technique from News Articles as a Feature for Keyphrase Extraction: An Unsupervised Approach. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 995–1002.
80. Hossain, M.N.; Bhuiyan, E.; Miah, M.B.A.; Sifat, T.A.; Muhammad, Z.; Masud, M.F.A. Detection and Classification of Kidney Disease from CT Images: An Automated Deep Learning Approach. Technologies 2025, 13, 508.
81. Shavlokhova, V.; Sandhu, S.; Flechtenmacher, C.; Koveshazi, I.; Neumeier, F.; Padrón-Laso, V.; Jonke, Ž.; Saravi, B.; Vollmer, M.; Vollmer, A.; et al. Deep learning on oral squamous cell carcinoma ex vivo fluorescent confocal microscopy data: A feasibility study. J. Clin. Med. 2021, 10, 5326.
82. Dai, Z.; Zhu, B.; Yu, H.; Jian, X.; Peng, J.; Fang, C.; Wu, Y. Role of autophagy induced by arecoline in angiogenesis of oral submucous fibrosis. Arch. Oral Biol. 2019, 102, 7–15.
83. Yu, M.; Ding, J.; Liu, W.; Tang, X.; Xia, J.; Liang, S.; Jing, R.; Zhu, L.; Zhang, T. Deep multi-feature fusion residual network for oral squamous cell carcinoma classification and its intelligent system using Raman spectroscopy. Biomed. Signal Process. Control 2023, 86, 105339.
84. Chang, X.; Yu, M.; Liu, R.; Jing, R.; Ding, J.; Xia, J.; Zhu, Z.; Li, X.; Yao, Q.; Zhu, L.; et al. Deep learning methods for oral cancer detection using Raman spectroscopy. Vib. Spectrosc. 2023, 126, 103522.
85. Panigrahi, S.; Nanda, B.S.; Bhuyan, R.; Kumar, K.; Ghosh, S.; Swarnkar, T. Classifying histopathological images of oral squamous cell carcinoma using deep transfer learning. Heliyon 2023, 9, e13444.
86. Sukegawa, S.; Ono, S.; Tanaka, F.; Inoue, Y.; Hara, T.; Yoshii, K.; Nakano, K.; Takabatake, K.; Kawai, H.; Katsumitsu, S.; et al. Effectiveness of deep learning classifiers in histopathological diagnosis of oral squamous cell carcinoma by pathologists. Sci. Rep. 2023, 13, 11676.
87. Das, M.; Dash, R.; Mishra, S.K. Automatic detection of oral squamous cell carcinoma from histopathological images of oral mucosa using deep convolutional neural network. Int. J. Environ. Res. Public Health 2023, 20, 2131.
88. Nagarajan, B.; Chakravarthy, S.; Venkatesan, V.K.; Ramakrishna, M.T.; Khan, S.B.; Basheer, S.; Albalawi, E. A deep learning framework with an intermediate layer using the swarm intelligence optimizer for diagnosing oral squamous cell carcinoma. Diagnostics 2023, 13, 3461.
89. Flügge, T.; Gaudin, R.; Sabatakakis, A.; Tröltzsch, D.; Heiland, M.; van Nistelrooij, N.; Vinayahalingam, S. Detection of oral squamous cell carcinoma in clinical photographs using a vision transformer. Sci. Rep. 2023, 13, 2296.
90. Albalawi, E.; Thakur, A.; Ramakrishna, M.T.; Bhatia Khan, S.; SankaraNarayanan, S.; Almarri, B.; Hadi, T.H. Oral squamous cell carcinoma detection using EfficientNet on histopathological images. Front. Med. 2024, 10, 1349336.







| Parameter | Specification |
|---|---|
| Total Images | 528 |
| Normal/OSCC | 89/439 |
| Patients | 230 |
| Magnification | 100× (H&E stained) |

| Feature Extractor | Classifier | ACC | AUC | PRE | SP | SN | F1 | MCC |
|---|---|---|---|---|---|---|---|---|
| ConvNeXt | Random Forest | 0.7985 | 0.8724 | 0.7681 | 0.7324 | 0.7193 | 0.7523 | 0.6742 |
| ConvNeXt | CatBoost | 0.8124 | 0.8541 | 0.7891 | 0.7482 | 0.7410 | 0.7812 | 0.7311 |
| ConvNeXt | Focal Loss SVM | 0.8562 | 0.9310 | 0.8478 | 0.7589 | 0.7661 | 0.7894 | 0.7833 |
| ConvNeXt | Attention-CNN | 0.8841 | 0.9492 | 0.8847 | 0.8134 | 0.8210 | 0.8481 | 0.8294 |
| ConvNeXt | ViT | 0.9126 | 0.9662 | 0.9234 | 0.8791 | 0.8864 | 0.9170 | 0.8921 |
| BEiT | Random Forest | 0.7342 | 0.8834 | 0.5893 | 0.6844 | 0.6752 | 0.5093 | 0.7081 |
| BEiT | CatBoost | 0.7654 | 0.8901 | 0.7041 | 0.7103 | 0.6989 | 0.6012 | 0.7394 |
| BEiT | Focal Loss SVM | 0.8073 | 0.8989 | 0.8092 | 0.7512 | 0.7394 | 0.6762 | 0.7620 |
| BEiT | Attention-CNN | 0.8421 | 0.9023 | 0.9174 | 0.8133 | 0.8021 | 0.7555 | 0.8024 |
| BEiT | ViT | 0.9012 | 0.9184 | 0.9321 | 0.8722 | 0.8641 | 0.8794 | 0.8613 |
| Swin Transformer | Random Forest | 0.8031 | 0.8622 | 0.7724 | 0.6421 | 0.6554 | 0.6681 | 0.4043 |
| Swin Transformer | CatBoost | 0.7814 | 0.8981 | 0.7463 | 0.6024 | 0.6138 | 0.6132 | 0.3194 |
| Swin Transformer | Focal Loss SVM | 0.8482 | 0.9203 | 0.8191 | 0.7911 | 0.7820 | 0.7663 | 0.6670 |
| Swin Transformer | Attention-CNN | 0.9124 | 0.9489 | 0.8912 | 0.8851 | 0.8924 | 0.9281 | 0.8214 |
| Swin Transformer | ViT | 0.9421 | 0.9738 | 0.9602 | 0.9741 | 0.9662 | 0.9623 | 0.9714 |

| Feature Selection | ACC | AUC | PRE | SP | SN | F1 | MCC |
|---|---|---|---|---|---|---|---|
| mRMR | 0.9625 | 0.9642 | 0.9633 | 0.9723 | 0.9712 | 0.9724 | 0.9735 |
| Lasso | 0.9612 | 0.9552 | 0.9562 | 0.9778 | 0.9643 | 0.9634 | 0.9655 |
| Boruta | 0.9595 | 0.9462 | 0.9491 | 0.9833 | 0.9574 | 0.9544 | 0.9575 |
| Genetic Algorithm | 0.9780 | 0.9772 | 0.9820 | 0.9888 | 0.9705 | 0.9654 | 0.9695 |
| SHAP | 0.9925 | 0.9822 | 0.9826 | 0.9918 | 0.9921 | 0.9843 | 0.9821 |

| Number of Features | ACC | AUC | PRE | SP | SN | F1 | MCC |
|---|---|---|---|---|---|---|---|
| 200 | 0.8875 | 0.9200 | 0.9831 | 0.9286 | 0.8788 | 0.9274 | 0.6977 |
| 300 | 0.9375 | 0.9400 | 0.9841 | 0.9286 | 0.9394 | 0.9606 | 0.8063 |
| 500 | 0.9925 | 0.9822 | 0.9826 | 0.9918 | 0.9921 | 0.9843 | 0.9821 |
| 700 | 0.9625 | 0.9750 | 0.9701 | 0.8571 | 0.9848 | 0.9765 | 0.8673 |
| 900 | 0.9375 | 0.9600 | 0.9552 | 0.7857 | 0.9697 | 0.9620 | 0.7778 |

| Parameter | Value |
|---|---|
| Output Dimension | 1024 |
| Base Model Name | swin_large_patch4_window7_224 |
| Base Feature Dimension | 1536 |
| Attention Dimension | 1536 |
| First Linear Layer Input | 1536 |
| First Linear Layer Output | 1280 |
| Second Linear Layer Input | 1280 |
| Second Linear Layer Output | 1024 |
| First Activation Function | ReLU |
| Second Activation Function | ReLU |
| First Batch Normalization | BatchNorm1d(1280) |
| Second Batch Normalization | BatchNorm1d(1024) |
| Dropout Rate | 0.2 |
| Patch Size | 4 |
| Window Size | 7 |
| Pooling Method | Mean pooling |

| Parameter | Value |
|---|---|
| Input Dimension | 1024 |
| Embedding Dimension | 128 |
| Number of Attention Heads | 8 |
| Feedforward Dimension | 256 |
| Dropout Rate | 0.1 |
| Number of Transformer Layers | 4 |
| Learning Rate | 0.0001 |
| Training Epochs | 50 |
| Batch Size | 128 |

| Model | ACC | AUC | PRE | SP | SN | F1 | MCC |
|---|---|---|---|---|---|---|---|
| Full Model | 0.9925 | 0.9822 | 0.9826 | 0.9918 | 0.9921 | 0.9843 | 0.9821 |
| No Positional | 0.8000 | 0.8052 | 0.9310 | 0.7143 | 0.8182 | 0.8708 | 0.4527 |
| Shallow ViT | 0.7875 | 0.8106 | 0.9455 | 0.7857 | 0.7879 | 0.8592 | 0.4703 |
| Deep ViT | 0.9875 | 0.8604 | 0.9851 | 0.9286 | 1.0000 | 0.9925 | 0.9562 |
| Max Pooling | 0.9750 | 0.9264 | 0.9706 | 0.8571 | 1.0000 | 0.9851 | 0.9121 |

| Metric | t-Test (t-Stat) | t-Test (p-Value) | Wilcoxon (Stat) | Wilcoxon (p-Value) |
|---|---|---|---|---|
| ACC | −4.105 | 0.0051 | 2.5 | 0.0284 |
| AUC | −3.220 | 0.0119 | 4.5 | 0.0371 |
| PRE | −5.928 | 0.0006 | 1.0 | 0.0120 |
| SN | −4.667 | 0.0019 | 1.8 | 0.0213 |
| F1 | −5.872 | 0.0005 | 0.0 | 0.0065 |
| MCC | −3.401 | 0.0112 | 3.5 | 0.0369 |

| Model | t-Test (t-Stat) | t-Test (p-Value) | Wilcoxon (Stat) | Wilcoxon (p-Value) |
|---|---|---|---|---|
| Random Forest | −3.003 | 0.0183 | 3.0 | 0.0350 |
| CatBoost | −4.567 | 0.0025 | 1.5 | 0.0201 |
| Focal Loss SVM | −5.445 | 0.0009 | 1.0 | 0.0154 |
| Attention-CNN | −3.890 | 0.0059 | 2.8 | 0.0298 |
| ViT | −6.751 | 0.0002 | 0.0 | 0.0061 |

| Study | Technique | ACC |
|---|---|---|
| Shavlokhova et al. [81] | ResNet50 with feature fusion | 92.37% |
| Dai et al. [82] | ResNet50 with Raman Spectra | 92.48% |
| Yu et al. [83] | ResNet50 with DCNNs | 95.84% |
| Chang et al. [84] | ResNet50 with VGG16 | 85.63% |
| Panigrahi et al. [85] | Three Kinds of CNN | 92.14% |
| Sukegawa et al. [86] | Probability Neural Network | 79.46% |
| Das et al. [87] | Multiple techniques fusion | 97.41% |
| Nagarajan et al. [88] | MobileNetV3 with Gorilla Troops Optimizer | 94.18% |
| Flügge et al. [89] | Swin Transformer | 97.63% |
| Albalawi et al. [90] | EfficientNet B3 with Advanced Learning Mechanism | 98.47% |
| Proposed Model | Swin Transformer with Vision Transformer | 99.25% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cruze, F.R.D.; Wasima, J.; Hosen, M.F.; Miah, M.B.A.; Muhammad, Z.; Masud, M.F.A. Oral Cancer Diagnosis Using Histopathology Images: An Explainable Hybrid Transformer Framework. Technologies 2026, 14, 39. https://doi.org/10.3390/technologies14010039

