Machine Learning Models for Pancreatic Cancer Survival Prediction: A Multi-Model Analysis Across Stages and Treatments Using the Surveillance, Epidemiology, and End Results (SEER) Database
Abstract
1. Background
2. Materials and Methods
2.1. Study Setting and Data Description
- Whether median survival times differ significantly among the race groups [15].
- Whether median survival times differ significantly among the age groups.
- Whether median survival times differ significantly between the gender groups.
- Whether median survival times differ significantly among the cancer stages, tested separately for patients who received chemotherapy only, radiation only, and a combination of chemotherapy and radiation.
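Questions of this kind, comparing median survival times across two or more groups, can be addressed with a Kruskal–Wallis (KW) rank test, which is the test reported in the results tables. The following is a minimal sketch using `scipy.stats.kruskal`. The group names, sample sizes, and exponential survival times are synthetic assumptions made for illustration; they are not SEER data, and the helper `kw_test` is hypothetical rather than the authors' code.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Synthetic survival times in months for three hypothetical groups.
# Groups A and B share a distribution; group C survives longer on average.
group_a = rng.exponential(scale=8.0, size=200)
group_b = rng.exponential(scale=8.0, size=200)
group_c = rng.exponential(scale=14.0, size=200)


def kw_test(*groups, alpha=0.05):
    """Run a Kruskal-Wallis test; return (statistic, p-value, reject H0?)."""
    stat, p_value = kruskal(*groups)
    return float(stat), float(p_value), bool(p_value < alpha)


# Two groups drawn from the same distribution.
stat_same, p_same, reject_same = kw_test(group_a, group_b)
# Three groups, one of which differs.
stat_diff, p_diff, reject_diff = kw_test(group_a, group_b, group_c)

print(f"A vs B:      H = {stat_same:.2f}, p = {p_same:.3g}, reject = {reject_same}")
print(f"A vs B vs C: H = {stat_diff:.2f}, p = {p_diff:.3g}, reject = {reject_diff}")
```

Note that this sketch treats every survival time as fully observed. If censored observations were present, a test that accounts for censoring (for example, a log-rank test) might be more appropriate.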
2.2. Developing Predictive Models for Classifying Patients into Risk Categories and Performing Survival Analysis
3. Results
3.1. Hypotheses Testing for Specific Attributes
3.2. Identifying the Analytical Forms of the Survival Times Across the Cancer Stages for Different Treatment Options
3.3. Classifying Patients into the Risk Groups Using Predictive Models
3.4. Comparing the Traditional Survival Models with the Predictive Survival Models
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
GEV | Generalized Extreme Value |
GOF | Goodness-of-fit |
GBM | Gradient Boosted Machine |
EN | Elastic net |
RF | Random forest |
NN | Neural network |
KM | Kaplan–Meier |
CPH | Cox Proportional Hazard |
SVM | Support Vector Machine |
VIPs | Variable Importance Plots |
ML | Machine learning |
DL | Deep learning |
References
- Michaud, D. Epidemiology of pancreatic cancer. Minerva Chir. 2004, 59, 99–111.
- Li, D.; Xie, K.; Wolff, R.; Abbruzzese, J.L. Pancreatic cancer. Lancet 2004, 363, 1049–1057.
- Vincent, A.; Herman, J.; Schulick, R.; Hruban, R.H.; Goggins, M. Pancreatic cancer. Lancet 2011, 378, 607–620.
- Chakraborty, A.; Tsokos, C. Survival Analysis for Pancreatic Cancer Patients using Cox-Proportional Hazard (CPH) Model. Glob. J. Med. Res. 2021, 21, 29–46.
- Chakraborty, A.; Tsokos, C. Parametric and Non-Parametric Survival Analysis of Patients with Acute Myeloid Leukemia (AML). Open J. Appl. Sci. 2021, 11, 126–148.
- Zell, J.A.; Rhee, J.M.; Ziogas, A.; Lipkin, S.M.; Anton-Culver, H. Race, socioeconomic status, treatment, and survival time among pancreatic cancer cases in California. Cancer Epidemiol. Biomark. Prev. 2007, 16, 546–552.
- Baek, B.; Lee, H. Prediction of survival and recurrence in patients with pancreatic cancer by integrating multi-omics data. Sci. Rep. 2020, 10, 18951.
- Dong, H.; Yao, J.; Tang, Y.; Yuan, M.; Xia, Y.; Zhou, J.; Lu, H.; Zhou, J.; Dong, B.; Lu, L.; et al. Improved prognostic prediction of pancreatic Cancer using multi-phase CT by integrating neural distance and texture-aware transformer. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer Nature: Cham, Switzerland, 2023; pp. 241–251.
- Di Federico, A.; Tateo, V.; Parisi, C.; Formica, F.; Carloni, R.; Frega, G.; Rizzo, A.; Ricci, D.; Di Marco, M.; Palloni, A.; et al. Hacking pancreatic cancer: Present and future of personalized medicine. Pharmaceuticals 2021, 14, 677.
- Suresh, K.; Severn, C.; Ghosh, D. Survival prediction models: An introduction to discrete-time modeling. BMC Med. Res. Methodol. 2022, 22, 207.
- Harrell, F.E. Parametric survival models. In Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis; Springer: Cham, Switzerland, 2001; pp. 413–442.
- American Society of Clinical Oncology. Pancreatic Cancer: Risk Factors. Cancer.Net. Available online: https://www.cancer.net/cancer-types/pancreatic-cancer/risk-factors (accessed on 25 January 2025).
- Madan, D.B. Estimating parametric models of probability distributions. Methodol. Comput. Appl. Probab. 2015, 17, 823–831.
- Vaisakh, K.M.; Xavier, T.; Sreedevi, E.P. Goodness of fit test for Rayleigh distribution with censored observations. J. Korean Stat. Soc. 2023, 52, 794–815.
- Hayward, J.; Alvarez, S.A.; Ruiz, C.; Sullivan, M.; Tseng, J.; Whalen, G. Machine learning of clinical performance in a pancreatic cancer database. Artif. Intell. Med. 2010, 49, 187–195.
- Rayner, J.C.W.; Livingston, G., Jr. The Kruskal–Wallis tests are Cochran–Mantel–Haenszel mean score tests. Metron 2020, 78, 353–360.
- VanderWeele, T.J.; Mathur, M.B. Some desirable properties of the Bonferroni correction: Is the Bonferroni correction really so bad? Am. J. Epidemiol. 2019, 188, 617–618.
- Rolke, W.; Gongora, C.G. A chi-square goodness-of-fit test for continuous distributions against a known alternative. Comput. Stat. 2021, 36, 1885–1900.
- Omekam, I. Goodness of Fit Tests for Some Generalized Distributions. FUW Trends Sci. Technol. J. 2020, 5, 241–246.
- Abd El-Raheem, A.E.R.M.; Hosny, M.; Abd-Elfattah, E.F. Statistical inference of the class of nonparametric tests for the panel count and current status data from the perspective of the saddlepoint approximation. J. Math. 2023, 2023, 9111653.
- Rypkema, D.; Tuljapurkar, S. Modeling extreme climatic events using the generalized extreme value (GEV) distribution. In Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 2021; Volume 44, pp. 39–71.
- Singirankabo, E.; Iyamuremye, E. Modelling extreme rainfall events in Kigali city using generalized Pareto distribution. Meteorol. Appl. 2022, 29, e2076.
- Sharma, M.; Bundele, M.; Bothale, V.; Nawal, M. Fine-tuned Predictive Model for Verifying POI Data. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 692–704.
- Kushwah, J.; Singh, D. Classification of cancer gene selection using random forest and neural network based ensemble classifier. Int. J. Adv. Comput. Res. 2013, 3, 30–34.
- Lee, S.W. Kaplan-Meier and Cox proportional hazards regression in survival analysis: Statistical standard and guideline of Life Cycle Committee. Life Cycle 2023, 3, e8.
- Lie, S.A.; Fenstad, A.M.; Lygre, S.H.L.; Kroken, G.; Dybvik, E.; Gjertsen, J.E.; Hallan, G.; Dale, H.; Furnes, O. Kaplan-meier and Cox regression are preferable for the analysis of time to revision of joint arthroplasty: Thirty-one years of follow-up for cemented and uncemented THAs inserted from 1987 to 2000 in the Norwegian arthroplasty register. JBJS Open Access 2022, 7, e21.00108.
- Andrade, C. Survival analysis, Kaplan-Meier curves, and Cox regression: Basic concepts. Indian J. Psychol. Med. 2023, 45, 02537176231176986.
- Chakraborty, A.; Tsokos, C.P. An AI-driven Predictive Model for Pancreatic Cancer Patients Using Extreme Gradient Boosting. J. Stat. Theory Appl. 2023, 22, 262–282.
- Jobst, L.J.; Bader, M.; Moshagen, M. A tutorial on assessing statistical power and determining sample size for structural equation models. Psychol. Methods 2023, 28, 207.
- Chakraborty, A.; Tsokos, C.P. A Modern Analytical Approach for Assessing the Treatment Effectiveness of Pancreatic Adenocarcinoma Patients Belonging to Different Demographics and Cancer Stages. J. Cancer Res. Treat. 2023, 11, 13–18.
- Chakraborty, A. Data-Driven Analytical Predictive Modeling for Pancreatic Cancer, Financial & Social Systems. Ph.D. Dissertation, University of South Florida, Tampa, FL, USA, 2022.
- Chakraborty, A.; Tsokos, C. A Stock Optimization Problem in Finance: Understanding Financial and Economic Indicators through Analytical Predictive Modeling. Mathematics 2024, 12, 2407.
- Bourgeois, A.; Horrill, T.; Mollison, A.; Stringer, E.; Lambert, L.K.; Stajduhar, K. Barriers to cancer treatment for people experiencing socioeconomic disadvantage in high-income countries: A scoping review. BMC Health Serv. Res. 2024, 24, 670.
- Kim, H.; Park, T.; Jang, J.; Lee, S. Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models. Genom. Inform. 2022, 20, e23.
- Biddell, C.B.; Spees, L.P.; Mayer, D.K.; Wheeler, S.B.; Trogdon, J.G.; Rotter, J.; Birken, S.A. Developing personalized survivorship care pathways in the United States: Existing resources and remaining challenges. Cancer 2020, 127, 997–1004.
Category | KW Test Statistic | p-Value |
---|---|---|
Race | 0.68 | 0.71 |
Age | 291.50 | <2.2 × 10⁻¹⁶ |
Gender | 0.50 | 0.48 |
Cancer stages (chemotherapy only) | 177.73 | <2.2 × 10⁻¹⁶ |
Cancer stages (radiation only) | 21.24 | 9.4 × 10⁻⁵ |
Cancer stages (chemotherapy and radiation) | 61.80 | 2.4 × 10⁻¹³ |
Category | p-Value (Stage 1 and 2) | p-Value (Stage 2 and 3) | p-Value (Stage 3 and 4) |
---|---|---|---|
Cancer stages (chemotherapy only) | 0.20 | 0.01 | 2.2 × 10⁻⁷ |
Cancer stages (radiation only) | 0.63 | 1.00 | 0.14 |
Cancer stages (chemotherapy and radiation) | 0.28 | 0.04 | 3 × 10⁻⁶ |
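Pairwise p-values for adjacent stages, like those tabulated above, are typically obtained by following a significant overall test with two-sample comparisons and a multiplicity adjustment. The source does not state here which pairwise test or adjustment was used, so the sketch below is an assumption: it applies a two-sided Mann–Whitney U test to each adjacent pair and a Bonferroni correction (multiply each p-value by the number of comparisons, capped at 1). The function name `adjacent_stage_tests` and the synthetic stage samples are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def adjacent_stage_tests(samples):
    """Compare each pair of adjacent stages; return Bonferroni-adjusted p-values.

    `samples` maps a stage label to an array of survival times, in stage order.
    """
    labels = list(samples)
    pairs = list(zip(labels[:-1], labels[1:]))  # (1,2), (2,3), (3,4)
    n_tests = len(pairs)
    adjusted = {}
    for a, b in pairs:
        _, p_value = mannwhitneyu(samples[a], samples[b], alternative="two-sided")
        # Bonferroni: scale by the number of comparisons and cap at 1.
        adjusted[(a, b)] = min(1.0, float(p_value) * n_tests)
    return adjusted


rng = np.random.default_rng(42)
# Synthetic survival times in months (assumed values, not SEER data).
samples = {
    "Stage 1": rng.exponential(scale=14.0, size=150),
    "Stage 2": rng.exponential(scale=13.0, size=150),
    "Stage 3": rng.exponential(scale=12.0, size=150),
    "Stage 4": rng.exponential(scale=4.0, size=150),
}

adjusted = adjacent_stage_tests(samples)
for (a, b), p in adjusted.items():
    print(f"{a} vs {b}: adjusted p = {p:.3g}")
```

Capping at 1 is why a Bonferroni-adjusted p-value can be reported as exactly 1.00 when the unadjusted value is already large.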
Category | Stage 1 | Stage 2 | Stage 3 | Stage 4 |
---|---|---|---|---|
(Chemotherapy only) | Gen. Pareto | Gen. Extreme Value | Gen. Extreme Value | Gen. Pareto |
(Radiation only) | Gen. Extreme Value | Gen. Extreme Value | ||
(Chemotherapy and radiation) | Log-Pearson 3 | Gen. Extreme Value | Gen. Extreme Value | Gen. Extreme Value |
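Selecting a best-fitting parametric form per stage and treatment, as summarized in the table above, usually involves fitting several candidate distributions and comparing a goodness-of-fit (GOF) measure. The sketch below is one way this could be done, not the authors' procedure: it fits candidates by maximum likelihood with SciPy and ranks them by the Kolmogorov–Smirnov statistic. The synthetic data, the choice of the KS statistic, the fixed location of zero for the generalized Pareto fit, and the helper `rank_candidates` are all assumptions. Note also that SciPy's `genextreme` shape parameter `c` has the opposite sign to the shape parameter used in many GEV references.

```python
import numpy as np
from scipy import stats


def rank_candidates(times, candidates):
    """Fit each candidate by MLE and rank by KS statistic (smaller is better)."""
    rows = []
    for name, (dist, fit_kwargs) in candidates.items():
        try:
            params = dist.fit(times, **fit_kwargs)
        except Exception:
            # Skip a candidate whose numerical fit fails on this sample.
            continue
        ks = stats.kstest(times, dist.cdf, args=params)
        rows.append((name, float(ks.statistic), float(ks.pvalue)))
    return sorted(rows, key=lambda row: row[1])


rng = np.random.default_rng(1)
# Synthetic survival times in months drawn from a known GEV (illustrative only).
times = stats.genextreme.rvs(c=-0.1, loc=10.0, scale=4.0, size=500, random_state=rng)

candidates = {
    "GEV": (stats.genextreme, {}),
    "Gen. Pareto": (stats.genpareto, {"floc": 0.0}),  # location fixed at 0 (assumed)
}

ranking = rank_candidates(times, candidates)
for name, ks_stat, p_value in ranking:
    print(f"{name:12s} KS = {ks_stat:.3f}  p = {p_value:.3g}")
```

Because the parameters are estimated from the same sample that is then tested, the KS p-values here are optimistic; they are better read as a relative ranking than as a formal test.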
Survival probabilities at each time point, by model:

Time (Months) | KM | GEV | CPH | EN | RF | NN | GBM |
---|---|---|---|---|---|---|---|
2 | 0.88 | 0.91 | 0.70 | 0.76 | 0.89 | 0.84 | 0.92 |
7 | 0.48 | 0.52 | 0.48 | 0.55 | 0.37 | 0.62 | 0.78 |
10 | 0.32 | 0.34 | 0.33 | 0.37 | 0.14 | 0.41 | 0.70 |
12 | 0.25 | 0.26 | 0.25 | 0.28 | 0.00 | 0.35 | 0.65 |
15 | 0.18 | 0.18 | 0.19 | 0.20 | 0.00 | 0.29 | 0.58 |
20 | 0.09 | 0.10 | 0.10 | 0.14 | 0.00 | 0.23 | 0.52 |
25 | 0.06 | 0.06 | 0.06 | 0.10 | 0.00 | 0.17 | 0.47 |
34 | 0.03 | 0.03 | 0.03 | 0.06 | 0.00 | 0.14 | 0.39 |
48 | 0.01 | 0.01 | 0.01 | 0.03 | 0.00 | 0.13 | 0.33 |
51 | 0.01 | 0.01 | 0.01 | 0.02 | 0.00 | 0.12 | 0.32 |
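The KM column in the table above serves as the nonparametric reference against which the other models' survival probabilities can be read. As a reminder of how such values arise, here is a minimal sketch of the Kaplan–Meier product-limit estimator in plain NumPy. The toy follow-up times and event indicators are invented for illustration, and the function names `kaplan_meier` and `survival_at` are hypothetical.

```python
import numpy as np


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])  # sorted distinct death times
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)               # still under observation at t
        deaths = np.sum((times == t) & events)     # deaths occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)


def survival_at(t, event_times, survival):
    """Evaluate the KM step function at time t."""
    idx = np.searchsorted(event_times, t, side="right") - 1
    return 1.0 if idx < 0 else float(survival[idx])


# Toy data: months of follow-up and event indicators (1 = death, 0 = censored).
toy_times = [2, 3, 3, 5, 7, 7, 10, 12]
toy_events = [1, 1, 0, 1, 1, 1, 0, 1]

event_times, survival = kaplan_meier(toy_times, toy_events)
for t, s in zip(event_times, survival):
    print(f"t = {t:4.0f} months  S(t) = {s:.3f}")
```

Censored patients contribute to the at-risk count until their last follow-up but never count as deaths, which is what distinguishes this estimate from a simple proportion of survivors.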
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chakraborty, A.; Pant, M.D. Machine Learning Models for Pancreatic Cancer Survival Prediction: A Multi-Model Analysis Across Stages and Treatments Using the Surveillance, Epidemiology, and End Results (SEER) Database. J. Clin. Med. 2025, 14, 4686. https://doi.org/10.3390/jcm14134686