Open Access Article
Information 2019, 10(3), 93

Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies

Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 5 James Bourchier Blvd., Sofia 1164, Bulgaria
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Block 8, Sofia 1113, Bulgaria
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in The 18th International Conference on Artificial Intelligence: Methodology, Systems, Applications, AIMSA 2018, Varna, Bulgaria, 12–14 September 2018.
Received: 21 January 2019 / Revised: 23 February 2019 / Accepted: 27 February 2019 / Published: 3 March 2019
(This article belongs to the Special Issue Artificial Intelligence—Methodology, Systems, and Applications)


The application of machine learning models for prediction and prognosis of disease development has become an integral part of cancer studies aimed at improving the subsequent therapy and management of patients. The main objective of the presented study is to apply machine learning models to predict survival time in breast cancer accurately on the basis of clinical data. The paper discusses an approach in which the main factor used to predict survival time is a newly developed tumor-integrated clinical feature, which combines tumor stage, tumor size, and age at diagnosis. Two datasets from breast cancer studies are merged through horizontal and vertical data integration, implemented with document-oriented and graph databases, which show good performance and no data loss. Beyond data normalization and classification, the applied machine learning methods provide promising results in terms of the accuracy of survival time prediction. The analysis of our experiments shows an advantage for linear Support Vector Regression, Lasso regression, Kernel Ridge regression, K-neighborhood regression, and Decision Tree regression: these models achieve the most accurate survival prognosis results. Cross-validation for accuracy shows that the same models also perform best on the studied breast cancer data. To support the proposed approach, a Python-based workflow has been developed, and plans for its further improvement are discussed at the end of the paper.
Keywords: bioinformatics; machine learning; breast cancer; survival time prognosis; cross-validation
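
The cross-validation of regression models mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' workflow: it uses only the Python standard library, a hand-written K-nearest-neighbors regressor as a stand-in for one of the model families named above, and synthetic data with a single made-up feature. The function names `knn_predict` and `k_fold_mae`, the choice of mean absolute error as the score, and all numeric settings are assumptions chosen for illustration.

```python
import random
import statistics


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return statistics.fmean([train_y[i] for i in nearest])


def k_fold_mae(xs, ys, n_folds=5, k=3, seed=0):
    """Return the mean absolute error obtained on each held-out fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into n_folds disjoint folds.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        # Evaluate only on samples the model has not seen.
        errors = [abs(knn_predict(train_x, train_y, xs[i], k) - ys[i]) for i in held_out]
        scores.append(statistics.fmean(errors))
    return scores


# Synthetic data (NOT from the study): one hypothetical feature and a noisy
# linear "survival time" in months, generated only to exercise the code.
rng = random.Random(42)
features = [rng.uniform(0.0, 10.0) for _ in range(100)]
survival = [60.0 - 4.0 * x + rng.gauss(0.0, 2.0) for x in features]

fold_scores = k_fold_mae(features, survival)
print("MAE per fold:", [round(s, 2) for s in fold_scores])
print("Mean MAE:", round(statistics.fmean(fold_scores), 2))
```

Averaging the per-fold errors gives a single score per model, so the same routine could be repeated for each candidate regressor to compare them on equal terms. In practice a library such as scikit-learn would likely replace the hand-written parts.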

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Share & Cite This Article

MDPI and ACS Style

Mihaylov, I.; Nisheva, M.; Vassilev, D. Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies. Information 2019, 10, 93.
