Machine Learning Model Construction and Testing: Anticipating Cancer Incidence and Mortality
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Source and Selection
2.2. Training and Testing: Division of Data for Model Evaluation
3. Results
3.1. Influential Factors in Cancer Incidence
3.2. Advancements in ML Methods and Prediction Accuracy
4. Discussion
4.1. Consistent Findings: Aligning with Previous Studies on Aging and Cancer Rates
4.2. Revealing Menacing Cancer Mortality Rates
4.3. Promising Direction: ML Models for Cancer Incidence and Mortality Prediction
4.4. Comparative Study of Prediction Models
5. Conclusions
Supplementary Materials
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Siegel, R.L.; Miller, K.D.; Wagle, N.S.; Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 2023, 73, 17–48.
- Qawoogha, S.S.; Shahiwala, A. Identification of potential anticancer phytochemicals against colorectal cancer by structure-based docking studies. J. Recept. Signal Transduct. 2020, 40, 67–76.
- Zhang, R.; Meng, Z.; Wu, X.; Zhang, M.; Zhang, S.; Jin, T. Mortalin promotes breast cancer malignancy. Exp. Mol. Pathol. 2021, 118, 104593.
- Kolonel, L.N.; Altshuler, D.; Henderson, B.E. The multiethnic cohort study: Exploring genes, lifestyle and cancer risk. Nat. Rev. Cancer 2004, 4, 519–527.
- Ferlay, J.; Colombet, M.; Soerjomataram, I.; Mathers, C.; Parkin, D.M.; Piñeros, M.; Znaor, A.; Bray, F. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int. J. Cancer 2019, 144, 1941–1953.
- Tyrer, J.; Duffy, S.W.; Cuzick, J. A breast cancer prediction model incorporating familial and personal risk factors. Stat. Med. 2004, 23, 1111–1130.
- Antoniou, A.C.; Easton, D.F. Risk prediction models for familial breast cancer. Future Oncol. 2006, 2, 257–274.
- Chen, S.; Ding, Y. Machine learning and its applications in studying the geographical distribution of ants. Diversity 2022, 14, 706.
- Chen, S.; Ding, Y. A Machine Learning Approach to Predicting Academic Performance in Pennsylvania’s Schools. Soc. Sci. 2023, 12, 118.
- Chen, S.; Ding, Y.; Liu, X. Development of the growth mindset scale: Evidence of structural validity, measurement model, direct and indirect effects in Chinese samples. Curr. Psychol. 2021, 42, 1712–1726.
- Wabartha, M.; Durand, A.; Francois-Lavet, V.; Pineau, J. Handling black swan events in deep learning with diversely extrapolated neural networks. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 7–15 January 2021; pp. 2140–2147.
- Cruz, J.A.; Wishart, D.S. Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2006, 2, 117693510600200030.
- Wang, G.; Lam, K.-M.; Deng, Z.; Choi, K.-S. Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques. Comput. Biol. Med. 2015, 63, 124–132.
- Erickson, B.J.; Kitamura, F. Magician’s corner: 9. Performance metrics for machine learning models. Radiol. Artif. Intell. 2021, 3, e200126.
- Islam, M.M.; Haque, M.R.; Iqbal, H.; Hasan, M.M.; Hasan, M.; Kabir, M.N. Breast cancer prediction: A comparative study using machine learning techniques. SN Comput. Sci. 2020, 1, 290.
- Jemal, A.; Bray, F.; Center, M.M.; Ferlay, J.; Ward, E.; Forman, D. Global cancer statistics. CA Cancer J. Clin. 2011, 61, 69–90.
- Singletary, S.E. Rating the risk factors for breast cancer. Ann. Surg. 2003, 237, 474.
- Vogelstein, B.; Papadopoulos, N.; Velculescu, V.E.; Zhou, S.; Diaz Jr, L.A.; Kinzler, K.W. Cancer genome landscapes. Science 2013, 339, 1546–1558.
- Preston-Martin, S.; Pike, M.C.; Ross, R.K.; Jones, P.A.; Henderson, B.E. Increased cell division as a cause of human cancer. Cancer Res. 1990, 50, 7415–7421.
- Rubin, J.B.; Lagas, J.S.; Broestl, L.; Sponagel, J.; Rockwell, N.; Rhee, G.; Rosen, S.F.; Chen, S.; Klein, R.S.; Imoukhuede, P. Sex differences in cancer mechanisms. Biol. Sex Differ. 2020, 11, 17.
- Marasco, V.; Carniti, C.; Guidetti, A.; Farina, L.; Magni, M.; Miceli, R.; Calabretta, L.; Verderio, P.; Ljevar, S.; Serpenti, F. T-cell immune response after mRNA SARS-CoV-2 vaccines is frequently detected also in the absence of seroconversion in patients with lymphoid malignancies. Br. J. Haematol. 2022, 196, 548–558.
- Mellman, I.; Coukos, G.; Dranoff, G. Cancer immunotherapy comes of age. Nature 2011, 480, 480–489.
- Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
- Berglund, A.; Matta, J.; Encarnación-Medina, J.; Ortiz-Sanchéz, C.; Dutil, J.; Linares, R.; Marcial, J.; Abreu-Takemura, C.; Moreno, N.; Putney, R. Dysregulation of DNA Methylation and Epigenetic Clocks in Prostate Cancer among Puerto Rican Men. Biomolecules 2022, 12, 2.
- Spieker, A.J.; Gordetsky, J.B.; Maris, A.S.; Dehan, L.M.; Denney, J.E.; Arnold Egloff, S.A.; Scarpato, K.; Barocas, D.A.; Giannico, G.A. PTEN expression and morphological patterns in prostatic adenocarcinoma. Histopathology 2021, 79, 1061–1071.
- Li, S.; Konstantinov, S.R.; Smits, R.; Peppelenbosch, M.P. Bacterial biofilms in colorectal cancer initiation and progression. Trends Mol. Med. 2017, 23, 18–30.
- Mirzaei, R.; Sabokroo, N.; Ahmadyousefi, Y.; Motamedi, H.; Karampoor, S. Immunometabolism in biofilm infection: Lessons from cancer. Mol. Med. 2022, 28, 10.
- Parsonnet, J. Bacterial infection as a cause of cancer. Environ. Health Perspect. 1995, 103, 263–268.
- Uemura, N.; Okamoto, S.; Yamamoto, S.; Matsumura, N.; Yamaguchi, S.; Yamakido, M.; Taniyama, K.; Sasaki, N.; Schlemper, R.J. Helicobacter pylori infection and the development of gastric cancer. N. Engl. J. Med. 2001, 345, 784–789.
- Cai, T.; Santi, R.; Tamanini, I.; Galli, I.C.; Perletti, G.; Bjerklund Johansen, T.E.; Nesi, G. Current knowledge of the potential links between inflammation and prostate cancer. Int. J. Mol. Sci. 2019, 20, 3833.
- Cai, T.; Tessarolo, F.; Caola, I.; Piccoli, F.; Nollo, G.; Caciagli, P.; Mazzoli, S.; Palmieri, A.; Verze, P.; Malossini, G. Prostate calcifications: A case series supporting the microbial biofilm theory. Investig. Clin. Urol. 2018, 59, 187–193.
- Chudzik-Rząd, B.; Zalewski, D.; Kasela, M.; Sawicki, R.; Szymańska, J.; Bogucka-Kocka, A.; Malm, A. The Landscape of Gene Expression during Hyperfilamentous Biofilm Development in Oral Candida albicans Isolated from a Lung Cancer Patient. Int. J. Mol. Sci. 2023, 24, 368.
- Vijayakumar, S.; Vaseeharan, B.; Malaikozhundan, B.; Gopi, N.; Ekambaram, P.; Pachaiappan, R.; Velusamy, P.; Murugan, K.; Benelli, G.; Kumar, R.S. Therapeutic effects of gold nanoparticles synthesized using Musa paradisiaca peel extract against multiple antibiotic resistant Enterococcus faecalis biofilms and human lung cancer cells (A549). Microb. Pathog. 2017, 102, 173–183.
- Bjarnsholt, T.; Buhlin, K.; Dufrêne, Y.F.; Gomelsky, M.; Moroni, A.; Ramstedt, M.; Rumbaugh, K.P.; Schulte, T.; Sun, L.; Åkerlund, B. Biofilm formation–what we can learn from recent developments. J. Intern. Med. 2018, 284, 332–345.
- Wu, Y.; Ding, Y.; Cohen, Y.; Cao, B. Elevated level of the second messenger c-di-GMP in Comamonas testosteroni enhances biofilm formation and biofilm-based biodegradation of 3-chloroaniline. Appl. Microbiol. Biotechnol. 2015, 99, 1967–1976.
- Ding, Y.; Peng, N.; Du, Y.; Ji, L.; Cao, B. Disruption of putrescine biosynthesis in Shewanella oneidensis enhances biofilm cohesiveness and performance in Cr (VI) immobilization. Appl. Environ. Microbiol. 2014, 80, 1498–1506.
- Ding, Y.; Zhou, Y.; Yao, J.; Szymanski, C.; Fredrickson, J.; Shi, L.; Cao, B.; Zhu, Z.; Yu, X.-Y. In situ molecular imaging of the biofilm and its matrix. Anal. Chem. 2016, 88, 11244–11252.
- Ding, Y.; Zhou, Y.; Yao, J.; Xiong, Y.; Zhu, Z.; Yu, X.-Y. Molecular evidence of a toxic effect on a biofilm and its matrix. Analyst 2019, 144, 2498–2503.
- Flemming, H.-C.; Wingender, J. The biofilm matrix. Nat. Rev. Microbiol. 2010, 8, 623–633.
- Yang, Y.; Ding, Y.; Hu, Y.; Cao, B.; Rice, S.A.; Kjelleberg, S.; Song, H. Enhancing bidirectional electron transfer of Shewanella oneidensis by a synthetic flavin pathway. ACS Synth. Biol. 2015, 4, 815–823.
- Zhao, C.-e.; Chen, J.; Ding, Y.; Wang, V.B.; Bao, B.; Kjelleberg, S.; Cao, B.; Loo, S.C.J.; Wang, L.; Huang, W. Chemically functionalized conjugated oligoelectrolyte nanoparticles for enhancement of current generation in microbial fuel cells. ACS Appl. Mater. Interfaces 2015, 7, 14501–14505.
- Zhao, C.-e.; Wu, J.; Ding, Y.; Wang, V.B.; Zhang, Y.; Kjelleberg, S.; Loo, J.S.C.; Cao, B.; Zhang, Q. Hybrid conducting biofilm with built-in bacteria for high-performance microbial fuel cells. ChemElectroChem 2015, 2, 654–658.
- Zhang, Z.; Liu, D.; Ding, Y.; Wang, S. Mechanical performance of strain-hardening cementitious composites (SHCC) with bacterial addition. J. Infrastruct. Preserv. Resil. 2022, 3, 3.
- Zhang, Z.; Weng, Y.; Ding, Y.; Qian, S. Use of genetically modified bacteria to repair cracks in concrete. Materials 2019, 12, 3912.
- Hamdany, A.H.; Ding, Y.; Qian, S. Visible light antibacterial potential of graphene-TiO2 cementitious composites for self-sterilization surface. J. Sustain. Cem.-Based Mater. 2022, 12, 972–982.
- Hamdany, A.H.; Ding, Y.; Qian, S. Cementitious Composite Materials for Self-Sterilization Surfaces. ACI Mater. J. 2022, 119, 197–210.
- Hamdany, A.H.; Ding, Y.; Qian, S. Mechanical and antibacterial behavior of photocatalytic lightweight engineered cementitious composites. J. Mater. Civ. Eng. 2021, 33, 04021262.
- Hamdany, A.H.; Ding, Y.; Qian, S. Graphene-Based TiO2 Cement Composites to Enhance the Antibacterial Effect of Self-Disinfecting Surfaces. Catalysts 2023, 13, 1313.
- Taninaga, J.; Nishiyama, Y.; Fujibayashi, K.; Gunji, T.; Sasabe, N.; Iijima, K.; Naito, T. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci. Rep. 2019, 9, 12384.
- Al-Jarrah, O.Y.; Yoo, P.D.; Muhaidat, S.; Karagiannidis, G.K.; Taha, K. Efficient machine learning for big data: A review. Big Data Res. 2015, 2, 87–93.
- Bzdok, D.; Krzywinski, M.; Altman, N. Machine learning: A primer. Nat. Methods 2017, 14, 1119.
- Zhao, M.; Tang, Y.; Kim, H.; Hasegawa, K. Machine learning with k-means dimensional reduction for predicting survival outcomes in patients with breast cancer. Cancer Inform. 2018, 17, 1176935118810215.
- Moitra, D.; Mandal, R.K. Automated grading of non-small cell lung cancer by fuzzy rough nearest neighbour method. Netw. Model. Anal. Health Inform. Bioinform. 2019, 8, 24.
- Jessica, E.O.; Hamada, M.; Yusuf, S.I.; Hassan, M. The Role of Linear Discriminant Analysis for Accurate Prediction of Breast Cancer. In Proceedings of the 2021 IEEE 14th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), Singapore, 20–23 December 2021; pp. 340–344.
- Nguyen, T.; Khosravi, A.; Creighton, D.; Nahavandi, S. Hidden Markov models for cancer classification using gene expression profiles. Inf. Sci. 2015, 316, 293–307.
- Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17.
- Naji, M.A.; El Filali, S.; Aarika, K.; Benlahmar, E.L.H.; Abdelouhahid, R.A.; Debauche, O. Machine learning algorithms for breast cancer prediction and diagnosis. Procedia Comput. Sci. 2021, 191, 487–492.
- Ghaderzadeh, M.; Asadi, F.; Ramezan Ghorbani, N.; Almasi, S.; Taami, T. Toward artificial intelligence (AI) applications in the determination of COVID-19 infection severity: Considering AI as a disease control strategy in future pandemics. Iran. J. Blood Cancer 2023, 15, 93–111.
Aspect | Breast Cancer Prediction Model | ML Model |
---|---|---|
Rationale | Uses mathematical formulas to predict cancer | Predicts cancer via an ML algorithm |
Methods | Uses data to build logical links connecting factors (e.g., age, height, BMI) to cancer | Predicts via a “black box” without considering the logical links |
Accuracy of prediction depends on | Validity of assumptions and connections | Quality and quantity of data |
Advantages | Mature methods with a clear process | Convenient and fast prediction |
Limitations | Incorrect assumptions and researcher bias | “Black swan” effect |
References | [6,7] | [8,9,10,11,12,13] |
| Age | Approximate Number | Site | Assigned Number | Year | Assigned Number |
|---|---|---|---|---|---|
| ≤1 | 1 | Testis | 1 | 2019 | 1 |
| 1–4 | 3 | Hodgkin Lymphoma | 2 | 2018 | 2 |
| 5–9 | 7 | Thyroid | 3 | 2017 | 3 |
| 10–14 | 12 | Mesothelioma | 4 | 2016 | 4 |
| 15–19 | 17 | Cervix | 5 | 2015 | 5 |
| 20–24 | 22 | Brain and Other Nervous System | 6 | 2014 | 6 |
| 25–29 | 27 | Larynx | 7 | 2013 | 7 |
| 30–34 | 32 | Melanomas of the Skin | 8 | 2012 | 8 |
| 35–39 | 37 | Oral Cavity and Pharynx | 9 | 2011 | 9 |
| 40–44 | 42 | Kidney and Renal Pelvis | 10 | 2010 | 10 |
| 45–49 | 47 | Leukemias | 11 | 2009 | 11 |
| 50–54 | 52 | Esophagus | 12 | 2008 | 12 |
| 55–59 | 57 | Corpus and Uterus, NOS | 13 | 2007 | 13 |
| 60–64 | 62 | Myeloma | 14 | 2006 | 14 |
| 65–69 | 67 | Ovary | 15 | 2005 | 15 |
| 70–74 | 72 | Non-Hodgkin Lymphoma | 16 | 2004 | 16 |
| 75–79 | 77 | Stomach | 17 | 2003 | 17 |
| 80–84 | 82 | Urinary Bladder | 18 | 2002 | 18 |
| 85+ | 87 | Liver and Intrahepatic Bile Duct | 19 | 2001 | 19 |
| | | Pancreas | 20 | 2000 | 20 |
| | | Female Breast | 21 | 1999 | 21 |
| | | Colon and Rectum | 22 | | |
| | | Lung and Bronchus | 23 | | |
| | | Prostate | 24 | | |

| Gender | Assigned Number | Race | Approximate Number | Event Type | Assigned Number |
|---|---|---|---|---|---|
| Female | 1 | Non-Hispanic White | 1 | Incidence | 1 |
| Male | 2 | Non-Hispanic Asian/Pacific Islander | 2 | Mortality | 2 |
| | | Non-Hispanic American Indian/Alaska Native | 3 | | |
| | | Non-Hispanic Black | 4 | | |
| | | Hispanic of any race | 5 | | |

| CIlower/CIupper | Approximate Number | Incidence/Mortality Rate | Assigned Number |
|---|---|---|---|
| [0–0.5) | 0 | [0–5) | 0 |
| [0.5–1.5) | 1 | [5–15) | 10 |
| [1.5–2.5) | 2 | [15–25) | 20 |
| [2.5–3.5) | 3 | [25–35) | 30 |
| … | … | … | … |
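The numeric encodings in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names are hypothetical, and the sketch assumes that the binning pattern shown for the confidence intervals and rates (round to the nearest integer, or to the nearest multiple of 10) continues past the rows the table elides with "…".

```python
import math

# Categorical codes copied from the table (only two of the mappings shown).
GENDER_CODES = {"Female": 1, "Male": 2}
EVENT_CODES = {"Incidence": 1, "Mortality": 2}

# Age bands whose code is not the integer midpoint of the band.
_SPECIAL_AGE_BANDS = {"≤1": 1, "1–4": 3, "85+": 87}


def encode_age(band: str) -> int:
    """Map an age band such as '5–9' (en dash) to its approximate number."""
    if band in _SPECIAL_AGE_BANDS:
        return _SPECIAL_AGE_BANDS[band]
    low, high = (int(part) for part in band.split("–"))
    return (low + high) // 2  # midpoint: '5–9' -> 7, '10–14' -> 12


def encode_year(year: int) -> int:
    """2019 -> 1, 2018 -> 2, ..., 1999 -> 21."""
    return 2020 - year


def _round_half_up(value: float) -> int:
    # Python's built-in round() rounds halves to even, so use floor instead.
    return math.floor(value + 0.5)


def encode_ci(value: float) -> int:
    """[0, 0.5) -> 0, [0.5, 1.5) -> 1, [1.5, 2.5) -> 2, ..."""
    return _round_half_up(value)


def encode_rate(rate: float) -> int:
    """[0, 5) -> 0, [5, 15) -> 10, [15, 25) -> 20, ..."""
    return 10 * _round_half_up(rate / 10)
```

Binning the rate into multiples of 10 turns the prediction target into a discrete class label, which is why the results are reported with classification metrics.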
Prediction Task | Method | Principle | Testing Accuracy | Precision |
---|---|---|---|---|
Incidence rate | Decision tree | Decision-making; nodes are features, leaves are classes | 57.53% | 58.02% |
Incidence rate | Random forest | Ensemble of trees; each trained on random data subsets | 57.90% | 57.31% |
Incidence rate | Logistic regression | Linear model for binary classification | 50.11% | 45.72% |
Incidence rate | SVC | Finding optimal hyperplane; maximizing margin | 49.99% | 46.83% |
Incidence rate | Neural network | Brain-inspired models with layered neurons | 58.92% | 58.21% |
Mortality rate | Decision tree | Decision-making; nodes are features, leaves are classes | 62.17% | 59.18% |
Mortality rate | Random forest | Ensemble of trees; each trained on random data subsets | 61.92% | 57.71% |
Mortality rate | Logistic regression | Linear model for binary classification | 54.53% | 48.36% |
Mortality rate | SVC | Finding optimal hyperplane; maximizing margin | 55.72% | 50.89% |
Mortality rate | Neural network | Brain-inspired models with layered neurons | 62.30% | 59.32% |
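For readers who want to reproduce the two metrics in the table on their own predictions, the sketch below computes testing accuracy and precision for multi-class labels in plain Python. The source does not state how precision was averaged across classes; macro averaging is an assumption made here, and the function names and sample labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision (assumed averaging scheme).

    A class that is never predicted contributes a precision of 0.
    """
    predicted = Counter()       # how often each class was predicted
    true_positive = Counter()   # how often that prediction was right
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_positive[p] += 1
    classes = set(y_true) | set(y_pred)
    per_class = [
        true_positive[c] / predicted[c] if predicted[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(per_class)


# Toy labels in the binned-rate encoding (0, 10, 20, ...); not real data.
y_true = [0, 10, 10, 20, 20, 20]
y_pred = [0, 10, 20, 20, 20, 10]
print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro precision = {macro_precision(y_true, y_pred):.4f}")
```

With many rate bins, a model can reach moderate accuracy while scoring lower on macro precision if it rarely predicts the sparse classes correctly, which is one possible reading of the gap between the two columns for logistic regression and SVC.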
Sites | Reported Incidence Rate per 100,000 | Reported Death Rate per 100,000 |
---|---|---|
Testis | 6.109 | 0.416 |
Hodgkin Lymphoma | 3.357 | 1.052 |
Thyroid | 15.838 | 2.675 |
Mesothelioma | 5.771 | 5.176 |
Cervix | 14.571 | 6.23 |
Brain and Other Nervous System | 7.758 | 7.599 |
Larynx | 14.946 | 7.642 |
Melanomas of the Skin | 26.591 | 8.439 |
Oral Cavity and Pharynx | 21.866 | 9.773 |
Kidney and Renal Pelvis | 34.078 | 15.238 |
Leukemias | 20.492 | 15.742 |
Esophagus | 16.881 | 16.278 |
Corpus and Uterus, NOS | 46.417 | 16.556 |
Myeloma | 25.058 | 18.811 |
Ovary | 20.202 | 19.706 |
Non-Hodgkin Lymphoma | 33.855 | 19.849 |
Stomach | 31.179 | 21.067 |
Urinary Bladder | 58.880 | 23.543 |
Liver and Intrahepatic Bile Duct | 31.211 | 28.477 |
Pancreas | 40.685 | 40.167 |
Female Breast | 137.875 | 51.988 |
Colon and Rectum | 113.632 | 55.407 |
Lung and Bronchus | 169.056 | 139.842 |
Prostate | 429.041 | 169.678 |
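One simple way to read the two rate columns together is the ratio of the reported death rate to the reported incidence rate for each site. The sketch below computes that ratio for four rows copied from the table; the ratio is an illustrative summary introduced here, not a statistic reported by the source, and it should not be interpreted as a survival estimate.

```python
# (incidence, death) rates per 100,000, copied from four rows of the table.
RATES = {
    "Testis": (6.109, 0.416),
    "Pancreas": (40.685, 40.167),
    "Lung and Bronchus": (169.056, 139.842),
    "Prostate": (429.041, 169.678),
}


def death_to_incidence_ratio(incidence: float, death: float) -> float:
    """Reported death rate divided by reported incidence rate."""
    return death / incidence


# Sites ordered from the highest ratio to the lowest.
ranked = sorted(
    RATES,
    key=lambda site: death_to_incidence_ratio(*RATES[site]),
    reverse=True,
)

for site in ranked:
    ratio = death_to_incidence_ratio(*RATES[site])
    print(f"{site}: {ratio:.3f}")
```

Among these four rows, pancreas has a death rate nearly equal to its incidence rate, while prostate has the highest incidence in the table but a much lower ratio.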
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ding, Y. Machine Learning Model Construction and Testing: Anticipating Cancer Incidence and Mortality. Diseases 2024, 12, 139. https://doi.org/10.3390/diseases12070139