Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data
Simple Summary
Abstract
1. Introduction
2. Material and Methods
2.1. Bioinformatic Processing of Microarray-Based Gene Expression Data
2.2. Data Preprocessing
2.3. Differentially Expressed Gene (DEG) Analysis
2.4. Principal Component Analysis (PCA)
2.5. Biostatistical Analysis
2.6. Machine Learning
2.7. Prediction Model
3. Results
3.1. Global Gene Expression Variation and Subgroup Clustering Identified by PCA
3.2. Gene Expression Variations Among Geriatric, Postmenopausal, and Premenopausal Luminal A Patients
3.3. Evaluation of Target Gene Expression in Relation to Age Groups and Clinical Outcomes
3.4. Statistical Analysis of Survival Status Across Geriatric, Premenopausal, and Postmenopausal Patients
3.5. Machine Learning-Based and Ensemble Classification of Breast Cancer Survival Outcomes
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References








| Sample Name | Group | Status |
|---|---|---|
| Geriatric_Alive | Geriatric | Alive |
| Geriatric_Died of Disease | Geriatric | Died of Disease |
| PreMenopausal_Alive | PreMenopausal | Alive |
| PreMenopausal_Died of Disease | PreMenopausal | Died of Disease |
| PostMenopausalNonGeriatric_Alive | PostMenopausalNonGeriatric | Alive |
| PostMenopausalNonGeriatric_Died of Disease | PostMenopausalNonGeriatric | Died of Disease |
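
Each sample name above joins an age/menopausal group with the survival outcome. The labeling can be sketched as follows. This is a minimal illustration, not the authors' procedure: the geriatric age cutoff (65 years), the `menopausal_state` coding ("Pre"/"Post") and the `Patient` record are assumptions, because the grouping criteria are not stated in this excerpt.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the age defining "Geriatric" is not given in this
# excerpt; 65 years is assumed here purely for illustration.
GERIATRIC_AGE_CUTOFF = 65


@dataclass
class Patient:
    age: float
    menopausal_state: str  # assumed coding: "Pre" or "Post"
    vital_status: str      # e.g. "Alive" or "Died of Disease"


def age_group(p: Patient) -> str:
    """Map a patient to one of the three groups in the sample table."""
    if p.age >= GERIATRIC_AGE_CUTOFF:
        return "Geriatric"
    if p.menopausal_state == "Pre":
        return "PreMenopausal"
    return "PostMenopausalNonGeriatric"


def sample_name(p: Patient) -> str:
    """Join group and outcome, matching the 'Sample Name' column format."""
    return f"{age_group(p)}_{p.vital_status}"


if __name__ == "__main__":
    cohort = [
        Patient(72, "Post", "Alive"),
        Patient(41, "Pre", "Died of Disease"),
        Patient(58, "Post", "Alive"),
    ]
    for patient in cohort:
        print(sample_name(patient))
```

Checking the geriatric condition first means age overrides menopausal status, so every patient lands in exactly one of the three groups.
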

| Gene Name | Log2 Fold Change | BH-Adjusted p-Value (FDR) | Direction |
|---|---|---|---|
| CYP3A43 | −2.9775 | 0.040146 | Down |
| RICTOR | −2.7493 | 0.000148 | Down |
| AGMO | −2.2243 | 2.55 × 10⁻⁵ | Down |
| FOXO3 | −2.2235 | 0.000132 | Down |
| KMT2D | 2.2647 | 0.020281 | Up |
| MYO1A | 2.3325 | 0.000109 | Up |
| IZUMO1R | 2.3771 | 0.005937 | Up |
| MAPK7 | 2.4586 | 0.008039 | Up |
| MMP16 | 2.5389 | 0.013266 | Up |
| PRKD1 | 2.5745 | 0.010218 | Up |
| STK11 | 2.6159 | 0.043386 | Up |
| RFNG | 2.6941 | 0.012076 | Up |
| BBC3 | 2.7481 | 0.030447 | Up |
| NRIP1 | 2.8304 | 0.021395 | Up |
| BMP10 | 2.8427 | 0.049004 | Up |
| AKT2 | 2.9156 | 0.016484 | Up |
| ABCC10 | 3.1221 | 0.046946 | Up |
| THSD7A | 3.166 | 0.046532 | Up |
| ATM | 3.3504 | 0.002421 | Up |
| HERC2 | 4.38 | 0.005774 | Up |
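
The table above lists genes with their log2 fold change and Benjamini-Hochberg (BH) adjusted p-value, and the Direction column follows the sign of the fold change. The sketch below shows the BH step-up adjustment and a direction call. The cutoffs (FDR < 0.05, |log2FC| ≥ 2) are assumptions chosen for illustration, because the thresholds used in the study are not given in this excerpt.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Assumed thresholds for illustration; the study's cutoffs are not stated here.
FDR_CUTOFF = 0.05
LOG2FC_CUTOFF = 2.0


def call_degs(genes, log2fc, pvalues):
    """Return (gene, log2FC, FDR, direction) for genes passing both cutoffs."""
    fdr = benjamini_hochberg(pvalues)
    results = []
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < FDR_CUTOFF and abs(lfc) >= LOG2FC_CUTOFF:
            results.append((gene, lfc, q, "Up" if lfc > 0 else "Down"))
    return results


if __name__ == "__main__":
    # Hypothetical genes and raw p-values.
    for row in call_degs(["G1", "G2", "G3", "G4"],
                         [3.0, -2.5, 0.5, 4.0],
                         [0.001, 0.002, 0.0001, 0.9]):
        print(row)
```

In this toy input, G3 is dropped for its small fold change and G4 for its high FDR, leaving one "Up" and one "Down" gene.
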

| Group | FDR | Fold Enrichment | Pathway |
|---|---|---|---|
| Upregulated | 8.66 × 10⁻³ | 8.03 | MAPK signaling pathway |
| Upregulated | 8.66 × 10⁻³ | 12.52 | Cushing syndrome |
| Upregulated | 8.66 × 10⁻³ | 10.81 | Human T-cell leukemia virus 1 infection |
| Upregulated | 8.66 × 10⁻³ | 5.47 | Pathways in cancer |
| Upregulated | 8.66 × 10⁻³ | 18.78 | Pancreatic cancer |
| Upregulated | 8.66 × 10⁻³ | 19.81 | Melanoma |
| Upregulated | 8.66 × 10⁻³ | 19.02 | Chronic myeloid leukemia |
| Upregulated | 8.66 × 10⁻³ | 12.94 | Gastric cancer |
| Upregulated | 1.47 × 10⁻² | 15.06 | Endocrine resistance |
| Upregulated | 1.64 × 10⁻² | 13.51 | TGF-beta signaling pathway |
| Downregulated | 3.83 × 10⁻⁵ | 20.27 | TGF-beta signaling pathway |
| Downregulated | 3.83 × 10⁻⁵ | 6.83 | Pathways in cancer |
| Downregulated | 1.01 × 10⁻⁴ | 14.65 | Breast cancer |
| Downregulated | 1.75 × 10⁻⁴ | 18.83 | Endocrine resistance |
| Downregulated | 1.25 × 10⁻³ | 19.28 | Platinum drug resistance |
| Downregulated | 4.88 × 10⁻³ | 6.53 | Human papillomavirus infection |
| Downregulated | 5.55 × 10⁻³ | 12.05 | Thyroid hormone signaling pathway |
| Downregulated | 6.11 × 10⁻³ | 20.85 | Fanconi anemia pathway |
| Downregulated | 8.63 × 10⁻³ | 10.04 | Signaling pathways regulating pluripotency of stem cells |
| Downregulated | 1.39 × 10⁻² | 13.73 | EGFR tyrosine kinase inhibitor resistance |
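
In over-representation analysis, fold enrichment is commonly defined as the share of input genes that fall in a pathway divided by the share of background genes in that pathway. Significance usually comes from a hypergeometric test. The sketch below assumes these standard definitions. The enrichment tool, gene background and pathway sizes behind the table are not stated in this excerpt, so the numbers in the example are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """(k/n) / (K/N): pathway share among input genes relative to the background."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 4 of 16 input genes fall in a 300-gene pathway,
    # against a 20,000-gene background.
    print(round(fold_enrichment(4, 16, 300, 20000), 2))
    print(hypergeom_upper_tail(4, 16, 300, 20000))
```

Pathway-level p-values like these are then typically adjusted across all tested pathways (for example, with the Benjamini-Hochberg procedure) to obtain FDR values like those in the table.
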

| Variable (Mean ± SD) | Alive (n = 313) | Died of Disease (n = 145) | Test Statistic | p-Value * |
|---|---|---|---|---|
| Age | 56.6 ± 11.0 | 64.2 ± 12.7 | −6.57 | <0.001 |
| Tumor Size | 21.9 ± 9.9 | 28.2 ± 16.0 | −5.17 | <0.001 |
| Months | 160.1 ± 72.6 | 103.0 ± 66.0 | 8.06 | <0.001 |
| Nottingham Prognostic Index | 3.46 ± 0.9 | 4.01 ± 1.2 | −5.19 | <0.001 |
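
The footnote marked * is not reproduced in this excerpt, so the exact test behind the Test Statistic column is not stated. Because the table gives means, SDs and group sizes, a two-sample t statistic can be recomputed from the summaries. The sketch below shows both the pooled-variance (Student) form and the Welch form. The published summaries are rounded, so recomputed values will not match the table exactly.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's two-sample t statistic with pooled variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic (unequal variances)."""
    return (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)


if __name__ == "__main__":
    # Tumor Size row: Alive 21.9 ± 9.9 (n = 313), Died of Disease 28.2 ± 16.0 (n = 145).
    args = (21.9, 9.9, 313, 28.2, 16.0, 145)
    print(f"pooled t = {pooled_t(*args):.2f}, Welch t = {welch_t(*args):.2f}")
```

For the Tumor Size row, the pooled form gives about −5.16 and the Welch form about −4.37. Comparing such values with the reported statistic is one rough way to judge which variant was used, allowing for rounding.
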

| Variable (Count (%)) | Alive (n = 313) | Died of Disease (n = 145) | Test Statistic | p-Value * |
|---|---|---|---|---|
| Type of breast surgery | | | | |
| Mastectomy | 143 (45.7) | 106 (73.1) | 30.0 | <0.001 |
| Breast conserving surgery | 170 (54.3) | 39 (26.9) | | |
| Chemotherapy | | | | |
| Yes | 34 (10.9) | 17 (11.7) | 0.0743 | 0.785 |
| No | 279 (89.1) | 128 (88.3) | | |
| Radiotherapy | | | | |
| Yes | 204 (65.2) | 64 (44.1) | 18.1 | <0.001 |
| No | 109 (34.8) | 81 (55.9) | | |
| High histological grade | | | | |
| Yes | 104 (33.2) | 34 (23.4) | 0.89 | 0.123 |
| No | 209 (66.8) | 111 (76.6) | | |
| Low histological grade | | | | |
| Yes | 53 (16.9) | 37 (25.5) | 0.95 | 0.095 |
| No | 260 (83.1) | 108 (74.5) | | |
| Histopathological type | | | | |
| Invasive Ductal Carcinoma (IDC) | 237 (75.7) | 112 (77.2) | | |
| Invasive Lobular Carcinoma (ILC) | 22 (7.0) | 13 (9.0) | 0.025 | 0.542 |
| Mixed tumor IDC + ILC | 54 (17.3) | 20 (13.8) | | |
| Primary tumor laterality | | | | |
| Right | 157 (53.2) | 59 (41.3) | 5.51 | 0.019 |
| Left | 138 (46.8) | 84 (58.7) | | |
| Cellularity | | | | |
| High | 283 (90.4) | 134 (93.7) | 0.012 | 0.689 |
| Low | 30 (9.6) | 9 (6.3) | | |
| HER2 loss | | | | |
| Yes | 14 (4.5) | 7 (4.8) | | |
| No | 299 (95.5) | 138 (95.2) | 0.081 | 0.793 |
| Tumor stage | | | | |
| Stage I | 150 (47.9) | 54 (37.2) | | |
| Stage II | 151 (48.2) | 78 (53.8) | 1.12 | 0.056 |
| Stage III | 9 (2.9) | 11 (7.6) | | |
| Stage IV | 3 (1.0) | 2 (1.4) | | |
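
The test used for the categorical variables is likewise described only in the missing footnote. For two-level variables, a Pearson chi-square test on the 2 × 2 counts is one standard choice. The sketch below computes that test without Yates' continuity correction. On the breast-surgery and chemotherapy counts, it gives values close to the reported 30.0 and 0.0743. This agreement does not by itself confirm the authors' procedure, and multi-level variables such as tumor stage would need an r × c test with more degrees of freedom.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 df, chi2 = Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Rows: mastectomy, breast conserving surgery; columns: Alive, Died of Disease.
    surgery = [[143, 106], [170, 39]]
    stat, p = chi_square_2x2(surgery)
    print(f"chi2 = {stat:.1f}, p = {p:.2e}")
```
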

| Model | Parameter | Values |
|---|---|---|
| RF | bootstrap | True, False |
| | oob_score | True, False |
| | max_depth | 3, 4, 5, 6, 7 |
| | n_estimators | 50, 100, 150, 200, 250 |
| | min_samples_split | 2, 3, 4, 5 |
| | max_leaf_nodes | None, 2, 3, 4 |
| MLP | hidden_layer_sizes | (10, 10), (15, 15), (20, 10), (20, 15) |
| | activation | tanh, relu |
| | learning_rate | 0.01, 0.001 |
| LR | C | 0.5, 1, 2, 3, 4, 5, 6 |
| | penalty | l1, l2, elasticnet |
| | solver | newton-cg, lbfgs, saga |
| | max_iter | 50, 100, 200 |
| | class_weight | balanced, None |
| XGB | min_child_weight | 1, 3, 5, 7, 10 |
| | gamma | 1, 3, 5, 7, 10 |
| | colsample_bytree | 0.4, 0.5, 0.6 |
| | reg_alpha | 0, 0.2, 0.3 |
| | max_depth | 4, 5, 6 |
| | subsample | 0.6, 0.7, 0.8 |
| | n_estimators | 100, 200, 300, 400, 500 |
| | learning_rate | 0.1, 0.01 |
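
The table lists the candidate values searched for each model. Expanding a grid as a Cartesian product shows how quickly the search grows. It also exposes combinations that scikit-learn rejects. Out-of-bag scoring requires `bootstrap=True`. Among the listed options, `LogisticRegression` accepts only the L2 penalty with the `newton-cg` and `lbfgs` solvers, and `elasticnet` also needs an `l1_ratio`, which the table does not list. The sketch below uses only the standard library and assumes the RF and LR models are scikit-learn estimators. It does not describe how the authors actually ran the search.

```python
from itertools import product

# Candidate values transcribed from the table (RF and LR only).
RF_GRID = {
    "bootstrap": [True, False],
    "oob_score": [True, False],
    "max_depth": [3, 4, 5, 6, 7],
    "n_estimators": [50, 100, 150, 200, 250],
    "min_samples_split": [2, 3, 4, 5],
    "max_leaf_nodes": [None, 2, 3, 4],
}

LR_GRID = {
    "C": [0.5, 1, 2, 3, 4, 5, 6],
    "penalty": ["l1", "l2", "elasticnet"],
    "solver": ["newton-cg", "lbfgs", "saga"],
    "max_iter": [50, 100, 200],
    "class_weight": ["balanced", None],
}

# Solver/penalty compatibility among the listed options only;
# elasticnet would additionally need an l1_ratio, which the table omits.
SOLVER_PENALTIES = {
    "newton-cg": {"l2"},
    "lbfgs": {"l2"},
    "saga": {"l1", "l2", "elasticnet"},
}


def expand(grid):
    """Yield every combination in a grid as a parameter dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def rf_valid(params):
    # Out-of-bag scores are only available when bootstrap=True.
    return params["bootstrap"] or not params["oob_score"]


def lr_valid(params):
    return params["penalty"] in SOLVER_PENALTIES[params["solver"]]


if __name__ == "__main__":
    rf_all = list(expand(RF_GRID))
    lr_all = list(expand(LR_GRID))
    print(len(rf_all), sum(map(rf_valid, rf_all)))  # total vs. valid RF settings
    print(len(lr_all), sum(map(lr_valid, lr_all)))  # total vs. valid LR settings
```

scikit-learn's `GridSearchCV` also accepts a list of grids, which is one way to search only the compatible solver/penalty pairs. If the MLP was scikit-learn's `MLPClassifier`, note that its numeric step size is `learning_rate_init`. The `learning_rate` argument of that class selects a schedule (`"constant"`, `"invscaling"` or `"adaptive"`).
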

| Model | Accuracy | Sensitivity | Specificity | F1 Score | AUC |
|---|---|---|---|---|---|
| Random Forest | | | | | |
| Before SMOTE (Training) | 0.83 | 0.91 | 0.65 | 0.88 | 0.61 |
| After SMOTE (Training) | 0.80 | 0.81 | 0.77 | 0.84 | 0.60 |
| Final Model (Testing) | 0.80 | 0.82 | 0.77 | 0.85 | 0.61 |
| Logistic Regression | | | | | |
| Before SMOTE (Training) | 0.68 | 1.00 | - | 0.81 | 0.52 |
| After SMOTE (Training) | 0.68 | 1.00 | - | 0.81 | 0.52 |
| Final Model (Testing) | 0.69 | 0.64 | 0.78 | 0.74 | 0.55 |
| Multilayer Perceptron | | | | | |
| Before SMOTE (Training) | 0.75 | 0.95 | 0.33 | 0.84 | 0.58 |
| After SMOTE (Training) | 0.77 | 0.90 | 0.48 | 0.84 | 0.59 |
| Final Model (Testing) | 0.87 | 0.96 | 0.68 | 0.91 | 0.61 |
| XGBoost | | | | | |
| Before SMOTE (Training) | 0.87 | 0.96 | 0.66 | 0.91 | 0.63 |
| After SMOTE (Training) | 0.94 | 0.92 | 1.00 | 0.96 | 0.80 |
| Final Model (Testing) | 0.98 | 0.98 | 0.97 | 0.99 | 0.86 |
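
The performance table contrasts training results before and after SMOTE (synthetic minority over-sampling). SMOTE is presumably used here to offset the imbalance between Alive (n = 313) and Died of Disease (n = 145) cases. The method creates new minority-class points by interpolating between a minority sample and one of its k nearest minority neighbours. The sketch below is a simplified standard-library illustration of that idea, not the authors' code. In practice, a library implementation such as `SMOTE` from imbalanced-learn (`imblearn.over_sampling`) is typically used. Oversampling is normally applied to the training data only, so that synthetic samples do not leak into evaluation.

```python
import math
import random


def nearest_neighbours(points, idx, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [nearest_neighbours(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest minority neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


if __name__ == "__main__":
    # Toy two-feature minority class (hypothetical values).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, n_synthetic=3, k=2, seed=42):
        print([round(v, 3) for v in point])
```

Each synthetic point lies on the segment between two existing minority samples. SMOTE therefore densifies the minority region rather than duplicating records.
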

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
MDPI and ACS Style: Kivrak, M.; Sevim Nalkiran, H.; Kesen, O.; Nalkiran, I. Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data. Biology 2025, 14, 1539. https://doi.org/10.3390/biology14111539
AMA Style: Kivrak M, Sevim Nalkiran H, Kesen O, Nalkiran I. Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data. Biology. 2025; 14(11):1539. https://doi.org/10.3390/biology14111539
Chicago/Turabian Style: Kivrak, Mehmet, Hatice Sevim Nalkiran, Oguzhan Kesen, and Ihsan Nalkiran. 2025. "Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data" Biology 14, no. 11: 1539. https://doi.org/10.3390/biology14111539
APA Style: Kivrak, M., Sevim Nalkiran, H., Kesen, O., & Nalkiran, I. (2025). Integrative Machine Learning Model for Overall Survival Prediction in Breast Cancer Using Clinical and Transcriptomic Data. Biology, 14(11), 1539. https://doi.org/10.3390/biology14111539

