Automated Early Detection of Skin Cancer Using a CNN-ViT-Attention-Based Hybrid Model
Abstract
1. Introduction
1.1. Contribution and Novelty
1.2. Related Works
1.3. Organization of Paper
2. Materials and Methods
2.1. Dataset
2.2. Class Weight
2.3. Overall Proposed Model
3. Results
3.1. Results of Pre-Trained Models
3.2. Results of Proposed Model
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Roky, A.H.; Islam, M.M.; Ahasan, A.M.F.; Mostaq, M.S.; Mahmud, M.Z.; Amin, M.N.; Mahmud, M.A. Overview of skin cancer types and prevalence rates across continents. Cancer Pathog. Ther. 2025, 3, 89–100.
2. Gloster, H.M., Jr.; Neal, K. Skin cancer in skin of color. J. Am. Acad. Dermatol. 2006, 55, 741–760.
3. Thomas, R.F.; Scotto, J. Estimating increases in skin cancer morbidity due to increases in ultraviolet radiation exposure. Cancer Investig. 1983, 1, 119–126.
4. Craythorne, E.; Al-Niami, F. Skin cancer. Medicine 2017, 45, 431–434.
5. Jadhav, L.A.; Mandlik, S.K. Nanocarriers in skin cancer treatment: Emerging drug delivery approaches and innovations. Nano TransMed 2025, 4, 100068.
6. Shah, A.; Shah, M.; Pandya, A.; Sushra, R.; Sushra, R.; Mehta, M.; Patel, K.; Patel, K. A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN). Clin. eHealth 2023, 6, 76–84.
7. Nguyen, T.; Nguyen, G.; Nguyen, B.M. EO-CNN: An enhanced CNN model trained by equilibrium optimization for traffic transportation prediction. Procedia Comput. Sci. 2020, 176, 800–809.
8. Ogut, Z.; Karaduman, M.; Bozdag, P.G.; Karakose, M.; Yildirim, M. A Hybrid Model with Quantum Feature Map Based on CNN and Vision Transformer for Clinical Support in Diagnosis of Acute Appendicitis. Biomedicines 2026, 14, 183.
9. Tavakoli, M.J.; Fazl, F.; Sedighi, M.; Naseri, K.; Ghavami, M.; Taghipour-Gorjikolaie, M. Enhancing Pharmacy Warehouse Management with Faster R-CNN for Accurate and Reliable Pharmaceutical Product Identification and Counting. Int. J. Intell. Syst. 2025, 2025, 8883735.
10. Liu, X.; Huang, D.; Jing, T.; Zhang, Y. Detection of AC arc faults of aviation cables based on HIW three-dimensional features and CNN-LSTM neural network. IEEE Access 2022, 10, 106958–106971.
11. Bayram, H.Y.; Bingol, H.; Alatas, B. Hybrid deep model for automated detection of tomato leaf diseases. Trait. Du Signal 2022, 39, 1781.
12. Garg, R.; Maheshwari, S.; Shukla, A. Decision support system for detection and classification of skin cancer using CNN. In Innovations in Computational Intelligence and Computer Vision: Proceedings of ICICV 2020; Springer: Singapore, 2020; pp. 578–586.
13. Bugday, M.S.; Akcicek, M.; Bingol, H.; Yildirim, M. Automatic diagnosis of ureteral stone and degree of hydronephrosis with proposed convolutional neural network, RelieF, and gradient-weighted class activation mapping based deep hybrid model. Int. J. Imaging Syst. Technol. 2023, 33, 760–769.
14. Dildar, M.; Akram, S.; Irfan, M.; Khan, H.U.; Ramzan, M.; Mahmood, A.R.; Alsaiari, S.A.; Saeed, A.H.M.; Alraddadi, M.O.; Mahnashi, M.H. Skin cancer detection: A review using deep learning techniques. Int. J. Environ. Res. Public Health 2021, 18, 5479.
15. Hosny, K.M.; Kassem, M.A.; Foaud, M.M. Skin cancer classification using deep learning and transfer learning. In Proceedings of the 2018 9th Cairo International Biomedical Engineering Conference (CIBEC), Cairo, Egypt, 20–22 December 2018; IEEE: New York, NY, USA, 2018; pp. 90–93.
16. Gouda, W.; Sama, N.U.; Al-Waakid, G.; Humayun, M.; Jhanjhi, N.Z. Detection of skin cancer based on skin lesion images using deep learning. Healthcare 2022, 10, 1183.
17. Gururaj, H.L.; Manju, N.; Nagarjun, A.; Aradhya, V.M.; Flammini, F. DeepSkin: A deep learning approach for skin cancer classification. IEEE Access 2023, 11, 50205–50214.
18. Jinnai, S.; Yamazaki, N.; Hirano, Y.; Sugawara, Y.; Ohe, Y.; Hamamoto, R. The development of a skin cancer classification system for pigmented skin lesions using deep learning. Biomolecules 2020, 10, 1123.
19. Kousis, I.; Perikos, I.; Hatzilygeroudis, I.; Virvou, M. Deep learning methods for accurate skin cancer recognition and mobile application. Electronics 2022, 11, 1294.
20. Tembhurne, J.V.; Hebbar, N.; Patil, H.Y.; Diwan, T. Skin cancer detection using ensemble of machine learning and deep learning techniques. Multimed. Tools Appl. 2023, 82, 27501–27524.
21. Wang, X.; Yang, Y.; Mandal, B. Automatic detection of skin cancer melanoma using transfer learning in deep network. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2023; Volume 2562, p. 020009.
22. Balambigai, S.; Elavarasi, K.; Abarna, M.; Abinaya, R.; Vignesh, N.A. Detection and optimization of skin cancer using deep learning. J. Phys. Conf. Ser. 2022, 2318, 012040.
23. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 Dataset: A Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions. Sci. Data 2018, 5, 180161.
24. Altalhan, M.; Algarni, A.; Turki-Hadj Alouane, M. Imbalanced Data Problem in Machine Learning: A Review. IEEE Access 2025, 13, 13686–13699.
25. Pattilachan, T.M.; Demir, U.; Keles, E.; Jha, D.; Klatte, D.; Engels, M.; Hoogenboom, S.; Bolan, C.; Wallace, M.; Bagci, U. A critical appraisal of data augmentation methods for imaging-based medical diagnosis applications. arXiv 2022, arXiv:2301.02181.
26. Phan, T.H.; Yamamoto, K. Resolving class imbalance in object detection with weighted cross entropy losses. arXiv 2020, arXiv:2006.01413.
27. Hosseini, S.M.; Baghshah, M.S. Dilated balanced cross entropy loss for medical image segmentation. arXiv 2024, arXiv:2412.06045.
28. Hu, Z.; Mei, W.; Chen, H.; Hou, W. Multi-scale feature fusion and class weight loss for skin lesion classification. Comput. Biol. Med. 2024, 176, 108594.
29. Cui, Y.; Jia, M.; Lin, T.-Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2019.
30. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning; PMLR: New York, NY, USA, 2019; pp. 6105–6114.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 770–778.
32. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2023; pp. 16133–16142.
33. Firnando, F.M.; Setiadi, D.R.I.M.; Muslikh, A.R.; Iriananda, S.W. Analyzing inceptionv3 and inceptionresnetv2 with data augmentation for rice leaf disease classification. J. Future Artif. Intell. Technol. 2024, 1, 1–11.
34. Iandola, F.; Moskewicz, M.; Karayev, S.; Girshick, R.; Darrell, T.; Keutzer, K. Densenet: Implementing efficient convnet descriptor pyramids. arXiv 2014, arXiv:1404.1869.
35. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 2818–2826.
36. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2021; pp. 13733–13742.
37. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2017; pp. 1251–1258.
38. Tummala, S.; Kadry, S.; Bukhari, S.A.C.; Rauf, H.T. Classification of brain tumor from magnetic resonance imaging using vision transformers ensembling. Curr. Oncol. 2022, 29, 7498–7511.
39. Chen, Y.; Qi, X.; Wang, J.; Zhang, L. Disco-clip: A distributed contrastive loss for memory efficient clip training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2023; pp. 22648–22657.
40. Bolya, D.; Fu, C.Y.; Dai, X.; Zhang, P.; Hoffman, J. Hydra attention: Efficient attention with many heads. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 35–49.
41. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2021; pp. 10012–10022.
42. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017.
43. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
44. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN model-based approach in classification. In Proceedings of the OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”, Catania, Italy, 3–7 November 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996.
45. Burges, C.J. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 1998, 2, 121–167.
46. Lewis, D.D. Naive (Bayes) at forty: The independence assumption in information retrieval. In Proceedings of the 10th European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1998; pp. 4–15.
47. Faraggi, D.; Simon, R. A neural network model for survival data. Stat. Med. 1995, 14, 73–82.
48. Cramer, G.M.; Ford, R.A.; Hall, R.L. Estimation of toxic hazard—A decision tree approach. Food Cosmet. Toxicol. 1976, 16, 255–276.
49. Morgan, S.P.; Teachman, J.D. Logistic regression: Description, examples, and comparisons. J. Marriage Fam. 1988, 50, 929–936.
50. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
51. Reschke, R.; Enk, A.H.; Hassel, J.C. Prognostic biomarkers in evolving melanoma immunotherapy. Am. J. Clin. Dermatol. 2025, 26, 213–223.
52. Alwakid, G.; Gouda, W.; Humayun, M.; Sama, N.U. Melanoma detection using deep learning-based classifications. Healthcare 2022, 10, 2481.
53. Fraiwan, M.; Faouri, E. On the automatic detection and classification of skin cancer using deep transfer learning. Sensors 2022, 22, 4963.
54. Alam, M.J.; Mohammad, M.S.; Hossain, M.A.F.; Showmik, I.A.; Raihan, M.S.; Ahmed, S.; Mahmud, T.I. S2C-DeLeNet: A parameter transfer based segmentation-classification integration for detecting skin cancer lesions from dermoscopic images. Comput. Biol. Med. 2022, 150, 106148.





| Class | Label | Number of Images |
|---|---|---|
| AKIEC | Actinic keratoses | 327 |
| BCC | Basal cell carcinoma | 514 |
| BKL | Benign keratosis-like | 1099 |
| DF | Dermatofibroma | 115 |
| MEL | Melanoma | 1113 |
| NV | Melanocytic nevi | 6705 |
| VASC | Vascular lesions | 142 |
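
The class counts above are strongly skewed toward NV (6705 of 10,015 images), which is the usual motivation for a class-weighted loss. The exact weighting formula is not given here, so the following is only a minimal sketch that assumes the common inverse-frequency heuristic, w_c = N / (K · n_c), where N is the total number of images, K the number of classes, and n_c the count for class c. The function name `inverse_frequency_weights` is hypothetical.

```python
# Image counts per class, copied from the dataset table above.
CLASS_COUNTS = {
    "AKIEC": 327,
    "BCC": 514,
    "BKL": 1099,
    "DF": 115,
    "MEL": 1113,
    "NV": 6705,
    "VASC": 142,
}


def inverse_frequency_weights(counts):
    """Return w_c = N / (K * n_c) for every class.

    This is an assumed heuristic for illustration; the weighting
    actually used by the authors is not stated in this section.
    """
    total = sum(counts.values())      # N: all images
    num_classes = len(counts)         # K: number of classes
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(CLASS_COUNTS)

if __name__ == "__main__":
    # List classes from the smallest weight (most frequent) to the largest.
    for name, w in sorted(weights.items(), key=lambda kv: kv[1]):
        print(f"{name:6s} {w:.3f}")
```

Under this assumption, the rarest class (DF) receives a weight of roughly 12.4 and the dominant class (NV) roughly 0.21, so errors on rare lesion types contribute more to the loss than errors on nevi.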

| Parameter | Configuration |
|---|---|
| Optimization Algorithm | AdamW |
| Initial Learning Rate | |
| Learning Rate Scheduler | ReduceLROnPlateau (Factor: 0.1, Patience: 2) |
| Batch Size | 32 |
| Training Epochs | 5 |
| Loss Function | Class-Weighted Cross-Entropy |
| Input Image Size | 224 × 224 |
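
The scheduler settings in the table (factor 0.1, patience 2) interact with the short 5-epoch run. As an illustration, the sketch below restates the reduce-on-plateau rule in plain Python. It is a simplification: it omits the improvement threshold and cooldown options that library schedulers such as PyTorch's `ReduceLROnPlateau` provide. Because the initial learning rate is not given in the table, the example uses a placeholder of 1.0, so the values read as multipliers of whatever rate was actually used. The function name and the validation losses are made up.

```python
def plateau_schedule(val_losses, initial_lr, factor=0.1, patience=2):
    """Return the learning rate in effect after each epoch.

    Simplified rule: if the validation loss has failed to improve for
    more than `patience` consecutive epochs, multiply the rate by `factor`.
    """
    lr = initial_lr
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss          # new best loss: reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1      # no improvement this epoch
        if bad_epochs > patience:
            lr *= factor         # plateau detected: reduce the rate
            bad_epochs = 0
        history.append(lr)
    return history


# Hypothetical 5-epoch run whose validation loss never improves after epoch 1.
stalled = plateau_schedule([1.0, 1.1, 1.2, 1.3, 1.4], initial_lr=1.0)

# Hypothetical 5-epoch run whose validation loss improves every epoch.
improving = plateau_schedule([5.0, 4.0, 3.0, 2.0, 1.0], initial_lr=1.0)

if __name__ == "__main__":
    print("stalled:  ", stalled)
    print("improving:", improving)
```

With patience 2, a reduction needs three consecutive non-improving epochs, so under this rule a 5-epoch run can lower the rate at most once, and no earlier than the fourth epoch.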

| Model | Accuracy (%) | Macro Recall (%) | Macro Precision (%) | Macro F1 Score (%) |
|---|---|---|---|---|
| EfficientNet-B0 | 88.1 | 79.3 | 76.8 | 77.8 |
| ResNet50 | 86.6 | 76.5 | 76.6 | 75.3 |
| ConvNeXt Base | 89.6 | 79.0 | 85.6 | 81.7 |
| Inception-ResNet-v2 | 87.9 | 77.5 | 82.0 | 78.9 |
| DenseNet121 | 88.1 | 77.4 | 81.8 | 79.2 |
| Inception v3 | 85.9 | 78.3 | 73.7 | 75.2 |
| VGG19 | 82.5 | 64.1 | 67.7 | 64.5 |
| Xception | 88.0 | 79.7 | 80.6 | 79.8 |
| ViT B-16 | 81.0 | 60.6 | 65.6 | 62.2 |
| ViT B-32 | 82.0 | 67.0 | 69.8 | 67.9 |
| DeiT B-16 | 85.8 | 72.4 | 74.4 | 71.1 |
| Swin Base | 88.0 | 76.3 | 82.1 | 78.7 |
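
The table reports accuracy next to macro-averaged recall, precision, and F1. Macro averaging gives every class equal weight regardless of its size, which is why a model can score well on accuracy yet noticeably lower on macro recall for an imbalanced dataset (for example ViT B-16: 81.0% versus 60.6%). The sketch below shows one common way to compute these values from a confusion matrix. The two-class matrix is made up for illustration, and taking macro F1 as the mean of per-class F1 scores is an assumption about the evaluation, not something stated here.

```python
def macro_metrics(cm):
    """Compute accuracy and macro precision/recall/F1.

    `cm[i][j]` is the number of samples with true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))

    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])                          # row sum: true class c
        predicted = sum(cm[r][c] for r in range(k))  # column sum: predicted c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,  # mean of per-class F1 scores
    }


# Made-up imbalanced example: 100 samples of class 0, 10 samples of class 1.
toy_cm = [
    [90, 10],
    [5, 5],
]
metrics = macro_metrics(toy_cm)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name:16s} {value:.3f}")
```

In the toy example the majority class dominates accuracy, while the poorly recognized minority class pulls macro recall down. Averaging per-class F1 scores also explains how a macro F1 can fall below both macro precision and macro recall, as in the ResNet50 row.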

| Model | Classifier | Accuracy (%) | Macro Recall (%) | Macro Precision (%) | Macro F1 Score (%) |
|---|---|---|---|---|---|
| Proposed Model | DT | 82.8 | 72.1 | 76.2 | 74.0 |
| Proposed Model | LR | 92.7 | 91.9 | 88.7 | 90.1 |
| Proposed Model | NB | 83.0 | 91.9 | 81.4 | 85.2 |
| Proposed Model | SVM | 95.1 | 92.9 | 95.3 | 94.0 |
| Proposed Model | KNN | 94.2 | 92.2 | 94.3 | 93.2 |
| Proposed Model | NN | 94.2 | 93.0 | 93.9 | 93.4 |
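
The table pairs the proposed model with several classical classifiers, which suggests that feature vectors produced by the hybrid network are handed to a separate classifier. This section gives no detail of that pipeline, so the following is only a sketch of the general pattern, using a plain k-nearest-neighbour vote. The two-dimensional feature vectors, the labels attached to them, and the function name `knn_predict` are all made up; real deep features would have many more dimensions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Pair each training vector's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Made-up stand-ins for feature vectors extracted by a deep model.
train_x = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),   # cluster labelled "NV"
    (1.0, 0.9), (0.9, 1.1), (1.1, 1.0),   # cluster labelled "MEL"
]
train_y = ["NV", "NV", "NV", "MEL", "MEL", "MEL"]

near_first_cluster = knn_predict(train_x, train_y, (0.1, 0.1))
near_second_cluster = knn_predict(train_x, train_y, (1.0, 1.0))

if __name__ == "__main__":
    print(near_first_cluster, near_second_cluster)
```

Swapping in a different classifier only changes the prediction function; the feature vectors stay the same, which is what makes a side-by-side comparison like the one in the table possible.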

| Papers | Methods | Number of Classes | Number of Images | Accuracy (%) |
|---|---|---|---|---|
| Hosny et al. [15] | CNN | 3 | 200 | 80 |
| Gouda et al. [16] | InceptionV3 | 2 | 3533 | 85.8 |
| Gururaj et al. [17] | DenseNet169 | 7 | 10,015 | 91.2 |
| Kousis et al. [19] | DenseNet169 | 7 | 10,015 | 92.25 |
| Tembhurne et al. [20] | VGG19 + LR | 2 | 3297 | 93 |
| Wang et al. [21] | VGG | 2 | 25,331 | 90.67 |
| Balambigai et al. [22] | CNN | 7 | 10,015 | 77.17 |
| Alwakid et al. [52] | CNN based models | 7 | 10,015 | 86 |
| Fraiwan and Faouri [53] | CNN architectures | 7 | 10,015 | 82.9 |
| Alam et al. [54] | S2C-DeLeNet | 7 | 10,015 | 91.03 |
| Proposed Model | CNN-ViT-Attention | 7 | 10,015 | 95.1 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Kanat, Z.; Kesim Onal, M.; Bingol, H.; Sener, S.; Avci, E.; Yildirim, M. Automated Early Detection of Skin Cancer Using a CNN-ViT-Attention-Based Hybrid Model. Biomedicines 2026, 14, 583. https://doi.org/10.3390/biomedicines14030583

