Evaluation of Boosting Algorithms for Skin Cancer Classification Using the PAD-UFES-20 Dataset and Custom CNN Feature Extraction
Abstract
1. Introduction
- We present a reproducible preprocessing pipeline that combines patient-level data splitting to prevent data leakage with data augmentation and Borderline-SMOTE balancing to address class imbalance.
- We develop a compact ConvMixer-based feature extractor integrated with XGBoost, LightGBM, and CatBoost classifiers, systematically comparing their performance and calibration.
- We analyze the demographic and anatomical characteristics of PAD-UFES-20 to interpret how gender, body region, and age distributions influence model performance.
- We provide an interpretable analysis using Grad-CAM and SHAP visualizations, linking model attention regions and feature importance to clinically relevant attributes.
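
The patient-level splitting named in the first contribution can be illustrated with a minimal sketch. It is not the authors' pipeline: the `(patient_id, image_id)` record layout, the function name `patient_level_split`, the 20% test fraction, and the seed are all assumptions made for illustration. The idea is only that the split is drawn over patients rather than over images, so that no patient contributes images to both partitions.

```python
import random


def patient_level_split(records, test_fraction=0.2, seed=42):
    """Split (patient_id, image_id) records so that no patient is shared.

    The record layout and default values are hypothetical; adapt them to
    whatever the dataset metadata actually provides.
    """
    # Shuffle the unique patient IDs, not the individual images.
    patients = sorted({patient_id for patient_id, _ in records})
    random.Random(seed).shuffle(patients)

    # The fraction applies to patients, so image counts may differ slightly.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


# Toy metadata: five patients, some with several lesion images.
records = [
    ("PAT_1", "img_a"), ("PAT_1", "img_b"),
    ("PAT_2", "img_c"),
    ("PAT_3", "img_d"), ("PAT_3", "img_e"), ("PAT_3", "img_f"),
    ("PAT_4", "img_g"),
    ("PAT_5", "img_h"), ("PAT_5", "img_i"),
]

train, test = patient_level_split(records, test_fraction=0.2, seed=0)
train_patients = {patient_id for patient_id, _ in train}
test_patients = {patient_id for patient_id, _ in test}
print("patients shared between splits:", train_patients & test_patients)
```

This sketch does not stratify by diagnosis, so rare classes could be missing from the test partition. Any oversampling step such as Borderline-SMOTE would normally be fitted on the training partition only, after the split, so that synthetic samples cannot leak into the held-out data.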
2. Materials and Methods
2.1. Dataset Description
2.2. Data Preprocessing and Augmentation
2.3. Data Partitioning and Validation Strategy
2.4. Feature Extraction Using ConvMixer
2.5. Classification Using Gradient Boosting Models
2.6. Evaluation Metrics & Implementation Details
3. Results
3.1. Dataset Characteristics
3.2. Overall Classification Performance
3.3. Per-Class Results
3.4. Effect of Class-Balancing and Validation Strategy
3.5. Interpretability and Calibration
4. Discussion
4.1. Dataset Influence and Model Performance
4.2. Demographic Insights and Clinical Relevance
4.3. Limitations and Future Work
4.4. Summary of Implications
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest




| Model | Balancing | Macro AUC | Macro F1 | MEL Sens. |
|---|---|---|---|---|
| XGBoost | Class Weights | 0.91 | 0.84 | 0.86 |
| CatBoost | Borderline-SMOTE | 0.94 | 0.88 | 0.91 |
| LightGBM | Class Weights | 0.90 | 0.82 | 0.85 |

| Class | Precision | Recall | F1 | AUC |
|---|---|---|---|---|
| AKIEC | 0.89 | 0.87 | 0.88 | 0.93 |
| BCC | 0.90 | 0.86 | 0.88 | 0.92 |
| BKL | 0.88 | 0.89 | 0.88 | 0.95 |
| DF | 0.83 | 0.81 | 0.82 | 0.91 |
| MEL | 0.93 | 0.91 | 0.92 | 0.96 |
| NV | 0.89 | 0.87 | 0.88 | 0.94 |
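
The relationship between the per-class table and macro-averaged scores can be checked with a short worked example. It assumes that F1 is the harmonic mean of precision and recall, that macro averaging is the unweighted mean over the six listed classes, and that sensitivity is the same quantity as recall. The source does not state which model the per-class table belongs to, so the comparison with the model-level table is an observation rather than a claim made by the authors.

```python
# Per-class values copied from the table above: (precision, recall, F1, AUC).
per_class = {
    "AKIEC": (0.89, 0.87, 0.88, 0.93),
    "BCC": (0.90, 0.86, 0.88, 0.92),
    "BKL": (0.88, 0.89, 0.88, 0.95),
    "DF": (0.83, 0.81, 0.82, 0.91),
    "MEL": (0.93, 0.91, 0.92, 0.96),
    "NV": (0.89, 0.87, 0.88, 0.94),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from precision and recall, rounded to two decimals as in the table.
recomputed_f1 = {
    name: round(f1_score(p, r), 2) for name, (p, r, _, _) in per_class.items()
}
reported_f1 = {name: f1 for name, (_, _, f1, _) in per_class.items()}

# Macro averages: unweighted means over classes (an assumption, see above).
macro_f1 = sum(reported_f1.values()) / len(per_class)
macro_auc = sum(values[3] for values in per_class.values()) / len(per_class)

print("F1 column consistent with precision/recall:", recomputed_f1 == reported_f1)
print(f"macro F1  = {macro_f1:.3f}")
print(f"macro AUC = {macro_auc:.3f}")
```

Under these assumptions the reported F1 column is reproduced exactly from precision and recall, and the unweighted means come to about 0.877 for F1 and 0.935 for AUC. These values, together with the MEL recall of 0.91, are close to the CatBoost with Borderline-SMOTE row of the model-level table (0.88, 0.94, and 0.91).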
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Javed, D.; Arshad, U.; Irfan, H.; Ali, R.H.; Khan, T.A. Evaluation of Boosting Algorithms for Skin Cancer Classification Using the PAD-UFES-20 Dataset and Custom CNN Feature Extraction. Eng. Proc. 2025, 87, 115. https://doi.org/10.3390/engproc2025087115

