Clinical Risk Factor Prediction for Second Primary Skin Cancer: A Hospital-Based Cancer Registry Study
Abstract
1. Introduction
2. Materials and Methods
2.1. Step 1: Dataset Source
2.2. Step 2: Data Preprocessing
2.3. Step 3: Balancing Dataset
2.4. Step 4: 10-Fold Cross-Validation
- MLP (multilayer perceptron) is a feed-forward neural network classifier trained with backpropagation.
- C4.5 builds a decision tree by splitting on one feature at each node and handles both categorical and numerical features. The information gain of each candidate feature is calculated, and the feature with the highest gain is used as the splitting rule.
- AdaBoost with C4.5 is a boosting ensemble method: models are trained in sequence, and each subsequent model focuses on correcting the prediction errors made by the previous ones. In this study, we selected C4.5 as the base classifier.
- Bagging (bootstrap aggregation) with C4.5 is an ensemble method that uses bootstrap sampling, i.e., sampling with replacement, to form different training sets. We used C4.5 as the base classifier to build the ensemble of trees.
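The splitting criterion and the bootstrap step described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: the toy data, the function names (`entropy`, `split_scores`, `bootstrap_sample`) and the random seed are all assumptions made for the example. The sketch also reports the gain ratio, the normalized form of information gain that C4.5 uses when ranking splits.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_scores(values, labels):
    """Return (information gain, gain ratio) for splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as bagging does for each base tree."""
    return [rng.choice(rows) for _ in rows]


# Toy, made-up data: two binary features and a binary class label.
alcohol = [1, 1, 1, 0, 0, 0, 0, 0]
gender = [1, 0, 1, 0, 1, 0, 1, 0]
label = [1, 1, 1, 0, 0, 0, 0, 1]

scores = {
    "alcohol": split_scores(alcohol, label),
    "gender": split_scores(gender, label),
}
# The feature with the highest information gain becomes the splitting rule.
best_feature = max(scores, key=lambda name: scores[name][0])

rng = random.Random(0)  # fixed seed so the resample is repeatable
sample = bootstrap_sample(list(zip(alcohol, gender, label)), rng)
```

In this toy data the `alcohol` feature separates the classes better than `gender`, so it is chosen as the split. Bagging would repeat `bootstrap_sample` once per base tree and combine the trees by majority vote.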
2.5. Step 5: Interpretability
3. Results
3.1. Prediction Performance
3.2. Interpretability
4. Discussion
Limitations and Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
| Title | Authors | Publication Year | Cancer | Input Data | Proposed and Benchmark Models | Result |
|---|---|---|---|---|---|---|
| Application of DL to predict advanced neoplasia using big clinical data in colorectal cancer screening of asymptomatic adults | Yang, Hyo-Joon, et al. | 2021 | Colorectal | Clinical data | ANN/LR | AUC of LR (0.724), ANN (0.760) | 
| A data-driven approach to a chemotherapy recommendation model based on DL for patients with colorectal cancer in Korea | Park, Jin-Hyeok, et al. | 2020 | Colorectal | Clinical data | ANN | Concordance rates with the NCCN guidelines were 70.5% for Top-1 Accuracy and 84% for Top-2 Accuracy | 
| Preoperative prediction of lymph node metastasis in patients with early-T-stage non-small cell lung cancer by machine learning algorithms | Wu, Yijun, et al. | 2020 | Lung | Clinical data | AdaBoost, ANN, DT, GBDT, LR, MNB, RF, XGBoost | RF is the best model, AUC 0.89 | 
| Robust machine learning for colorectal cancer risk prediction and stratification | Nartowt, Bradley J., et al. | 2020 | Colorectal | Clinical data | ANN | Concordance of 0.70 ± 0.02, sensitivity of 0.63 ± 0.06, and specificity of 0.82 ± 0.04 | 
| A Machine Learning Approach for Long-Term Prognosis of Bladder Cancer based on Clinical and Molecular Features | Song, Qingyuan, et al. | 2020 | Bladder | Clinical data | LR | AUC 0.77, F1 0.78, sensitivity 0.65, specificity 0.79, accuracy 0.76 | 
| Predicting breast cancer in Chinese women using machine learning techniques: algorithm development | Hou, Can, et al. | 2020 | Breast | Clinical data | XGBoost, RF and ANN | AUC of XGBoost (0.742), ANN (0.728), RF (0.728) | 
| Treatment stratification of patients with metastatic castration-resistant prostate cancer by machine learning | Deng, Kaiwen, Hongyang Li, and Yuanfang Guan. | 2020 | Prostate | Clinical data | Linear regression, LR, Cox regression, BAG-CART, RF | RF achieved the highest performance | 
| Patient classification of two-week wait referrals for suspected head and neck cancer: a machine learning approach | Moor, J. W., V. Paleri, and J. Edwards. | 2019 | Head and neck | Clinical data | ML algorithms | Variational logistic regression was the most clinically useful technique among benchmarks | 
| The performance of different artificial intelligence models in predicting breast cancer among individuals having type 2 diabetes mellitus | Hsieh, Meng-Hsuen, et al. | 2019 | Breast | Clinical data | LR, ANN and RF | AUC of the LR (0.834), ANN (0.865), and RF (0.959) | 
| Machine learning methods can more efficiently predict prostate cancer compared with prostate-specific antigen density and prostate-specific antigen velocity | Nitta, Satoshi, et al. | 2019 | Prostate | Clinical data | ANN, RF, SVM | AUC of ANN (0.69), RF (0.64), SVM (0.63) | 
| A machine learning-assisted decision-support model to better identify patients with prostate cancer requiring an extended pelvic lymph node dissection | Hou, Ying, et al. | 2019 | Prostate | Clinical data | LR, SVM, RF | AUCs (RFs+/RFs−: 0.906/0.885; SVM+/SVM−: 0.891/0.868; LR+/LR−: 0.886/0.882) with (+) or without (−) MRI-reported LNI | 
| Classifying lung cancer severity with ensemble machine learning in health care claims data | Bergquist, Savannah L., et al. | 2017 | Lung | Clinical data | Ensemble of ML models | Sensitivity 0.93, specificity 0.92, accuracy 0.93 | 
| A Novel Chaos-Based Privacy-Preserving Deep Learning Model for Cancer Diagnosis | Mujeeb Ur Rehman, Arslan Shafique, Yazeed Yasin Ghadi et al. | 2022 | Brain | Image | DL | Sensitivity 0.97, accuracy 0.972, F1 0.98 | 
| Imbalanced breast cancer classification using transfer learning | R. Singh, T. Ahmed, A. Kumar et al. | 2021 | Breast | Image | DL | Accuracy 0.903 | 
| Breast cancer diagnosis using deep belief networks on ROI images | G. Altan et al. | 2021 | Breast | Image | DL | Accuracy 0.963, specificity 0.967, sensitivity 0.96, precision 0.964 | 
| Automatic skin cancer detection in dermoscopy Imaging based on ensemble lightweight DL network | Wei, Lisheng, Kun Ding, and Huosheng Hu | 2020 | Skin | Image | DL | AUC 0.854, accuracy 0.876 | 
| A 3D Probabilistic Deep Learning System for Detection and Diagnosis of Lung Cancer Using Low-Dose CT Scans | O. Ozdemir, R. L. Russell, and A. A. Berlin | 2020 | Lung | Image | DL | AUC 0.869, sensitivity 0.921 | 
| Breast lesion classification based on dynamic contrast-enhanced magnetic resonance images sequences with long short-term memory networks | N. Antropova, B. Huynh, H. Li, and M. L. Giger | 2018 | Breast | Image | DL | AUC 0.88, accuracy 0.93 | 
| Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images | J. Xu et al. | 2016 | Breast | Image | DL | AUC 0.788, F1 0.85 | 












| Item | Feature | Value or Range | Encoding | Sample Size |
|---|---|---|---|---|
| 1 | Gender | Female | 0 | 597 |
| 1 | Gender | Male | 1 | 651 |
| 2 | Age at diagnosis | 17–98 | Integer | 1248 |
| 3 | Tumor size | <5 cm | 0 | 1202 |
| 3 | Tumor size | ≥5 cm | 1 | 46 |
| 4 | Regional lymph node involvement | No | 0 | 1140 |
| 4 | Regional lymph node involvement | Yes | 1 | 108 |
| 5 | Cancer stage | <(Stage I) | 0 | 818 |
| 5 | Cancer stage | ≥(Stage II) | 1 | 430 |
| 6 | Residual tumor on edge of surgery | No | 0 | 1223 |
| 6 | Residual tumor on edge of surgery | Yes | 1 | 25 |
| 7 | Radiation therapy | No | 0 | 1218 |
| 7 | Radiation therapy | Yes | 1 | 30 |
| 8 | Systemic therapy | No | 0 | 1242 |
| 8 | Systemic therapy | Yes | 1 | 6 |
| 9 | Areca | No/NA | 0 | 1200 |
| 9 | Areca | Yes | 1 | 48 |
| 10 | Alcohol | No/NA | 0 | 1144 |
| 10 | Alcohol | Yes | 1 | 104 |
| Category | Second primary skin cancer | No | 0 | 1146 |
| Category | Second primary skin cancer | Yes | 1 | 102 |
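The coding scheme in the feature table can be expressed as a small encoding function. The sketch below is illustrative only: the record field names (`gender`, `tumor_size_cm`, `stage_ii_or_higher`, and so on) are hypothetical, and the cancer stage is assumed to arrive already binarized because the table does not spell out the exact stage boundary.

```python
# Binary codings taken from the feature table; the record field names are hypothetical.
YES_NO = {"No": 0, "Yes": 1}
YES_NO_NA = {"No": 0, "NA": 0, "Yes": 1}  # areca and alcohol code No/NA as 0


def encode_patient(record):
    """Return the 10 coded features in the order they appear in the table."""
    return [
        {"Female": 0, "Male": 1}[record["gender"]],
        int(record["age_at_diagnosis"]),           # kept as an integer
        1 if record["tumor_size_cm"] >= 5 else 0,  # <5 cm -> 0, >=5 cm -> 1
        YES_NO[record["lymph_node_involvement"]],
        YES_NO[record["stage_ii_or_higher"]],      # assumed to be binarized upstream
        YES_NO[record["residual_tumor"]],
        YES_NO[record["radiation_therapy"]],
        YES_NO[record["systemic_therapy"]],
        YES_NO_NA[record["areca"]],
        YES_NO_NA[record["alcohol"]],
    ]


# A made-up patient record, not taken from the registry.
example = {
    "gender": "Male",
    "age_at_diagnosis": 63,
    "tumor_size_cm": 2.4,
    "lymph_node_involvement": "No",
    "stage_ii_or_higher": "Yes",
    "residual_tumor": "No",
    "radiation_therapy": "No",
    "systemic_therapy": "No",
    "areca": "NA",
    "alcohol": "Yes",
}
vector = encode_patient(example)

# Class balance before any resampling, from the last two rows of the table.
n_negative, n_positive = 1146, 102
positive_share = n_positive / (n_negative + n_positive)
```

With 102 positive cases out of 1248, the positive class makes up roughly 8% of the records, which is why a balancing step precedes model training.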

| Algorithm | TP Rate | FP Rate | Precision | Recall | F1 Score | ROC Area | PRC Area | Accuracy | Category |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.814 | 0.078 | 0.913 | 0.814 | 0.861 | 0.902 | 0.893 | 0.868 | non-SPSC |
| MLP | 0.922 | 0.186 | 0.832 | 0.922 | 0.875 | 0.902 | 0.875 | 0.868 | SPSC |
| C4.5 | 0.837 | 0.051 | 0.942 | 0.837 | 0.886 | 0.916 | 0.906 | 0.893 | non-SPSC |
| C4.5 | 0.949 | 0.163 | 0.853 | 0.949 | 0.898 | 0.916 | 0.891 | 0.893 | SPSC |
| AdaBoost C4.5 | 0.846 | 0.051 | 0.944 | 0.846 | 0.892 | 0.931 | 0.916 | 0.898 | non-SPSC |
| AdaBoost C4.5 | 0.949 | 0.154 | 0.861 | 0.949 | 0.903 | 0.931 | 0.920 | 0.898 | SPSC |
| Bagging C4.5 | 0.842 | 0.051 | 0.942 | 0.842 | 0.889 | 0.927 | 0.915 | 0.895 | non-SPSC |
| Bagging C4.5 | 0.949 | 0.158 | 0.857 | 0.949 | 0.901 | 0.927 | 0.911 | 0.895 | SPSC |
| Random Forest | 0.853 | 0.048 | 0.947 | 0.853 | 0.897 | 0.939 | 0.932 | 0.902 | non-SPSC |
| Random Forest | 0.952 | 0.147 | 0.866 | 0.952 | 0.907 | 0.939 | 0.926 | 0.902 | SPSC |
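The columns of the performance table are linked by simple identities, which the following sketch checks against the reported values. It recomputes F1 as the harmonic mean of precision and recall, and compares the reported accuracy with the mean of the two per-class TP rates. The second check assumes that the evaluated dataset has equal class sizes after balancing, which the table itself does not state. Small gaps are expected because the inputs are rounded to three decimals.

```python
# (precision, recall, reported F1) per algorithm and class, copied from the table above.
rows = {
    ("MLP", "non-SPSC"): (0.913, 0.814, 0.861),
    ("MLP", "SPSC"): (0.832, 0.922, 0.875),
    ("C4.5", "non-SPSC"): (0.942, 0.837, 0.886),
    ("C4.5", "SPSC"): (0.853, 0.949, 0.898),
    ("AdaBoost C4.5", "non-SPSC"): (0.944, 0.846, 0.892),
    ("AdaBoost C4.5", "SPSC"): (0.861, 0.949, 0.903),
    ("Bagging C4.5", "non-SPSC"): (0.942, 0.842, 0.889),
    ("Bagging C4.5", "SPSC"): (0.857, 0.949, 0.901),
    ("Random Forest", "non-SPSC"): (0.947, 0.853, 0.897),
    ("Random Forest", "SPSC"): (0.866, 0.952, 0.907),
}

# Reported overall accuracy per algorithm, copied from the same table.
reported_accuracy = {
    "MLP": 0.868,
    "C4.5": 0.893,
    "AdaBoost C4.5": 0.898,
    "Bagging C4.5": 0.895,
    "Random Forest": 0.902,
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest difference between recomputed and reported F1 across all rows.
f1_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())

# With equal class sizes (an assumption), accuracy is the mean of the two recalls.
accuracy_gap = max(
    abs((rows[(name, "non-SPSC")][1] + rows[(name, "SPSC")][1]) / 2 - acc)
    for name, acc in reported_accuracy.items()
)
```

Both gaps stay within rounding error, so the reported figures are internally consistent under the equal-class-size assumption.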

| Cost of FN | TP Rate | FP Rate | Precision | Recall | F1 Score | ROC Area | PRC Area | Accuracy | Category |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.853 | 0.048 | 0.947 | 0.853 | 0.897 | 0.939 | 0.932 | 0.902 | non-SPSC |
| 1 | 0.952 | 0.147 | 0.866 | 0.952 | 0.907 | 0.939 | 0.926 | 0.902 | SPSC |
| 3 | 0.784 | 0.045 | 0.946 | 0.784 | 0.857 | 0.937 | 0.932 | 0.870 | non-SPSC |
| 3 | 0.955 | 0.216 | 0.815 | 0.955 | 0.880 | 0.937 | 0.923 | 0.870 | SPSC |
| 5 | 0.728 | 0.041 | 0.947 | 0.728 | 0.823 | 0.938 | 0.932 | 0.843 | non-SPSC |
| 5 | 0.959 | 0.272 | 0.779 | 0.959 | 0.860 | 0.938 | 0.924 | 0.843 | SPSC |
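The table above shows SPSC recall rising and non-SPSC recall falling as the cost of a false negative grows. The table does not say how the cost was applied. One common approach, assumed here purely for illustration, is the minimum-expected-cost rule: predict SPSC whenever the expected cost of missing it exceeds the expected cost of a false alarm. The function names and the example probabilities are hypothetical.

```python
def min_cost_threshold(cost_fn, cost_fp=1.0):
    """Probability above which predicting SPSC has the lower expected cost.

    Predict positive when p * cost_fn >= (1 - p) * cost_fp,
    which rearranges to p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict(prob_spsc, cost_fn, cost_fp=1.0):
    """Return 1 (SPSC) or 0 (non-SPSC) under the minimum-expected-cost rule."""
    return 1 if prob_spsc >= min_cost_threshold(cost_fn, cost_fp) else 0


# Made-up predicted probabilities of SPSC for four patients.
probs = [0.10, 0.20, 0.30, 0.60]

# Predictions under the three false-negative costs used in the table.
predictions = {cost: [predict(p, cost) for p in probs] for cost in (1, 3, 5)}
```

Raising the false-negative cost from 1 to 5 lowers the decision threshold from 0.5 to about 0.17, so more borderline patients are flagged as SPSC. That matches the direction of the trade-off in the table: higher SPSC recall at the price of more false positives.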
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, H.-C.; Lin, T.-C.; Chang, C.-C.; Lu, Y.-C.A.; Lee, C.-M.; Purevdorj, B. Clinical Risk Factor Prediction for Second Primary Skin Cancer: A Hospital-Based Cancer Registry Study. Appl. Sci. 2022, 12, 12520. https://doi.org/10.3390/app122412520