Brief Report

Breast Cancer Prognosis Using a Machine Learning Approach

1
BioBIM (InterInstitutional Multidisciplinary Biobank), IRCCS San Raffaele Pisana, Via di Val Cannuta 247, 00166 Rome, Italy
2
Department of Human Sciences & Quality of Life Promotion, San Raffaele Roma Open University, Via di Val Cannuta 247, 00166 Rome, Italy
3
Department of Enterprise Engineering, University of Rome “Tor Vergata”, Viale Oxford 81, 00133 Rome, Italy
4
Department of Systems Medicine, Medical Oncology, Tor Vergata Clinical Center, University of Rome “Tor Vergata”, Viale Oxford 81, 00133 Rome, Italy
*
Author to whom correspondence should be addressed.
Cancers 2019, 11(3), 328; https://doi.org/10.3390/cancers11030328
Submission received: 18 December 2018 / Revised: 26 February 2019 / Accepted: 4 March 2019 / Published: 7 March 2019
(This article belongs to the Special Issue Application of Bioinformatics in Cancers)

Abstract

Machine learning (ML) has recently been introduced to develop prognostic classification models that can be used to predict outcomes in individual cancer patients. Here, we report the significance of an ML-based decision support system (DSS), combined with random optimization (RO), to extract prognostic information from routinely collected demographic, clinical and biochemical data of breast cancer (BC) patients. A DSS model was developed in a training set (n = 318); its performance in the testing set (n = 136) yielded a C-index for progression-free survival of 0.84, with an accuracy of 86%. Furthermore, the model was capable of stratifying the testing set into two groups of patients with low or high risk of progression, with a hazard ratio (HR) of 10.9 (p < 0.0001). Validation in multicenter prospective studies and appropriate management of privacy issues in relation to digital electronic health records (EHR) data are still needed. Nonetheless, we may conclude that the implementation of ML algorithms and RO models into EHR data might help to obtain prognostic information, and has the potential to revolutionize the practice of personalized medicine.

1. Introduction

The breast cancer (BC) death rate has declined steadily over the past two decades, progress that can be attributed to the deployment of innovative management pathways, from early detection to treatment. Nevertheless, BC still represents the leading cause of cancer death among females worldwide [1]. Accordingly, BC survivability prediction represents a challenging task that could strongly benefit from the development of personalized predictive models. In this context, contemporary oncology has witnessed a growing interest in digital technologies, whose integration with big healthcare data has raised new hopes for personalized medicine.
Artificial intelligence (AI) and machine learning (ML) have been used to diagnose and classify cancer for nearly 20 years, but only a few studies have investigated their relevance in cancer prognosis [2]. In particular, ML or semi-supervised learning techniques have been recently applied to develop models for BC progression and survivability. Most of them, however, were built on datasets from the SEER (Surveillance, Epidemiology, and End Results) Program, which do not include important prognostic parameters such as the St. Gallen criteria (hormone receptor status, HER2/Neu expression or Ki67 proliferation index) [3,4,5], while other studies were performed on hybrid models containing microarray data [6] or on mammographic images [7,8]. Lately, an unsupervised ML approach that can admit any number of prognostic factors was used to build prognostic systems for cancer patients [9]. In this case, too, the SEER dataset used did not include information on HER2/Neu expression, whose prognostic significance has been emphasized in the 8th edition TNM staging system for BC [10]. Thus, the unmet need to develop prognostic classification models that embody the newest AI technologies and can be used to predict outcomes in individual cancer patients for personalized patient care has been highlighted [11].
In this context, we have recently demonstrated the potential of a semi-explainable decision support system (DSS), based on multiple kernel learning (MKL) [12], that can be adapted to different medical problems [13,14] and allows the learned model to be inspected. The model combines a support vector machine (SVM) [15] algorithm and random optimization (RO) [16]. Hence, it can offer an explanation of how important routinely collected demographic, clinical and biochemical data are in its predictions. This MKL model, originally developed for cancer-associated thrombosis risk assessment [13], has been adapted here to estimate the risk of disease progression in an oncology setting of BC patients. To achieve this objective, a proof-of-concept study was specifically designed to assess whether a customized MKL-based DSS could be a useful prognostic tool in the clinical management of BC patients.

2. Results

A set of predictors (named ML-RO) was identified using a 3-fold cross-validation technique on a training set (n = 318). A testing set (n = 136) was used to compute the final performance of the risk predictors. To devise the DSS, we selected ML-RO-4 as the best-performing predictor out of ten runs, in terms of the area under the curve (AUC), on the training set (Table 1).
As shown in Table 1, most predictors had a receiver operating characteristic (ROC) curve with an AUC ≥0.75 (the threshold generally accepted as clinically useful) [17]. Among these, ML-RO-0 was further selected as it assigned a substantial relative importance to the group of features linked to glucose metabolism (Group 5) (Table 2), which is currently considered an important contributor to BC progression [18,19], to the point that metformin (an anti-diabetic drug with insulin-lowering effects) has been proposed in combination with chemotherapy [20,21] and is currently being compared with placebo in a phase III randomized trial in early-stage BC (ClinicalTrials.gov Identifier: NCT01101438).
When both predictors were incorporated into a DSS model for BC progression, their combined use (both positive, either positive, both negative) in the testing set translated into a c-statistic of 0.84 (95% CI: 0.76–0.90). The level with the best Youden index at ROC analysis (>1, i.e., risk estimate achieved by both predictors, according to voting on the positive class) was then selected as the cutoff value for further evaluation of the combined DSS. A comparison of the analytical performance of the trained models and derived DSS on the testing set is reported in Table 3.
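The voting rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `vote_score` and `dss_high_risk` and the example predictions are hypothetical.

```python
def vote_score(pred_a: bool, pred_b: bool) -> int:
    """Three-level score: number of predictors voting for the positive class."""
    return int(pred_a) + int(pred_b)


def dss_high_risk(pred_a: bool, pred_b: bool, cutoff: int = 1) -> bool:
    """High risk only when the score exceeds the cutoff (>1 means both vote positive)."""
    return vote_score(pred_a, pred_b) > cutoff


# Hypothetical outputs of the two predictors for four patients.
patients = [(True, True), (True, False), (False, True), (False, False)]
scores = [vote_score(a, b) for a, b in patients]
flags = [dss_high_risk(a, b) for a, b in patients]
print(scores)  # [2, 1, 1, 0]
print(flags)   # [True, False, False, False]
```

With a cutoff of >1, a single positive vote is not enough to flag a patient, which favors specificity over sensitivity.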
At a criterion >1, the DSS model was capable of stratifying primary BC patients into two groups with a low or high risk of progression, both in the training set (n = 279; log-rank = 3.23, p = 0.001) and in the testing set (n = 118; log-rank = 3.42, p < 0.001). Figure 1 reports the Kaplan–Meier curves of progression-free survival (PFS) in the 136 BC women included in the testing set and followed up for a mean of 3.5 years (range 0.3–9.7 years). As shown, patients estimated at high risk (>1) of progression by the combined DSS model had a 5-year progression-free survival probability significantly lower than that observed in BC patients estimated at low risk (≤1) (26% vs. 85%, respectively; log-rank = 6.82, p < 0.0001).
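For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate behind such curves can be sketched in a few lines. The function `kaplan_meier` and the toy data below are hypothetical and are not taken from the study, which used commercial statistical software.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if progression was observed, 0 if censored
    """
    n_at_risk = len(times)
    progressed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events and censorings both leave the risk set
    survival, curve = 1.0, []
    for t in sorted(leaving):
        d = progressed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve


# Hypothetical toy data (years of follow-up), for illustration only.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients still count in the risk set up to their last follow-up, which is why the estimate drops only at observed progression times.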

3. Discussion

Treatment decisions are particularly challenging in early-stage BC patients with conflicting prognostic features, especially node-negative ones, for whom the question of whether to pursue an adjuvant treatment with chemotherapy or endocrine therapies remains open. Putative biomarkers, so far, have not demonstrated sufficient predictive ability to be clinically useful. Ki67 itself lacks reproducibility and its use, if not part of an AI model, has been largely scaled back [22].
Identification of predictive tools of tumor responsiveness, risk of recurrence, and mortality, providing the possibility to avoid unnecessary toxicities, is thus very appealing. As reported above, ML has started to take hold across the oncology community to develop prognostic classification models of BC progression and survivability [9]. In this regard, the possibility to perform an automated survival prediction in metastatic cancer patients using high-dimensional electronic health records (EHR) data has been recently highlighted [23]. By using an ML approach on EHR-derived predictor variables (clustered into categories), Gensheimer et al., in fact, devised an AI system, with a better c-statistic than previously reported prognostic models, which could be deployed in a DSS to help improve quality of care in the metastatic setting [23]. More recently, four major nonlinear ML methods (integrating multiple clinicopathological features and genomic data) were used to compare survival predictions in a large cohort of BC patients [24]. Although no model significantly outperformed others, the Nottingham Prognostic Index, age, tumor stage and size, ER/PR/HER2 and breast surgery status strongly influenced survival across repeated runs and models, while the gene expression cluster was a moderately influential factor [24].
The results reported here confirm and extend the findings by Zhao et al., as the use of an SVM has proven effective in devising an AI-based DSS for the prognostic assessment of non-metastatic BC patients. In particular, the combined use of ML and RO techniques allowed the construction of a set of prognostic discriminators from routinely collected clinicopathological features and biochemical data of BC patients, which showed a better performance than the predictors developed by Zhao et al. (c-statistic 0.82 vs. 0.66 and an accuracy of 86% vs. 73%, respectively) [24]. In our opinion, this combined approach might hold potential for improving model precision through weighting the relative importance of attributes. Moreover, with respect to models based on neural networks [7], the combination of ML and RO techniques offers a model that can be learned with small datasets and that is more interpretable, as were Bayesian networks applied to BC [25]. Furthermore, the devised DSS included a number of prognostic and metabolic parameters, not previously analyzed, that could be easily extracted from EHR, meaning that ML may add significant and sustained benefits to personalized medicine at no additional cost to the health system.
Of course, there are limitations to acknowledge. First, the study was mono-institutional. Second, the sample size was relatively small, which may have lowered the power of ML. Nonetheless, we believe that implementation of ML algorithms and RO models into high-dimensional EHR data might help to achieve prognostic information, and has the potential to revolutionize the practice of personalized medicine.

4. Patients and Methods

Since January 2007, the PTV Bio.Ca.Re. (Policlinico Tor Vergata Biospecimen Cancer Repository) and the SR-BioBIM (Interinstitutional Multidisciplinary Biobank, IRCCS San Raffaele Pisana, Rome, Italy) have been actively involved in the recruitment of ambulatory patients with primary or metastatic cancer, who are prospectively followed under the appropriate institutional ethics approval, as part of a Clinical Database and Biobank project. Among these, a cohort of 454 consecutive BC patients for whom prognostic and pre-treatment biochemical factors were available was selected for the present analysis. The study was performed in accordance with the principles embodied in the Declaration of Helsinki. All patients gave written informed consent, previously approved by our Institutional Ethics Committee (ISR/DMLBA/405, 15 November 2006). BC was pathologically staged according to the latest prognostic TNM staging system [8]. Three hundred and ninety-seven women (87%) had primary BC and underwent radical surgery followed by radiation and/or adjuvant treatment as per current guidelines. The remaining 57 (13%) patients presented with metastatic disease. Routinely collected prognostic factors such as BC stage, menopausal status, pathological grading as well as the St. Gallen criteria (e.g., estrogen and progesterone receptors, HER2/neu expression and the proliferation index Ki67) were available for each patient. In particular, grading was assessed according to the Nottingham grading system (Elston–Ellis modification of the Scarff–Bloom–Richardson grading system) for BC [8]. The immunohistochemical analyses were performed on formalin-fixed, paraffin-embedded tumor sections for hormone receptor presence [26], HER2/neu expression [27] and proliferation index (Ki67) [28].
HER2/neu positivity was defined according to the American Society of Clinical Oncology-College of American Pathologists (ASCO-CAP) guidelines as an immunohistochemical staining of 3+ or 2+ with evidence of gene amplification by fluorescence in situ hybridization (FISH) [27]. The Ki67 proliferative index in surgical specimens was assigned by the pathologist based on the percentage of positivity on at least 500 neoplastic cells counted in the peripheral area of the nodule. A cut-off value of ≥20% was used in all association analyses, according to the recommendations of the St. Gallen International Expert Consensus on the primary therapy of early BC 2013 [28].
Furthermore, given the increasing awareness that metabolic features might represent an important contributor to BC progression, Type 2 diabetes, glycemic parameters and the body mass index (BMI) were introduced in the model [18,19]. Routine biochemical analyses were performed on fresh blood samples taken in the morning after an overnight fast at the time of enrolment and prior to any treatment (surgery, adjuvant, either chemotherapy or endocrine, or metastatic). The demographic and clinical characteristics of the recruited population are summarized in Table 4.
The machine learning used for the primary analysis was run using the kernel-based learning platform (KeLP) [29], as previously reported [13]. Multiple kernel learning (MKL), based on support vector machines (SVM) and random optimization (RO) models, was used to produce prognostic discriminators (referred to as machine-learning (ML)-RO) yielding the best classification performance over a training (3-fold cross-validation) and testing set. The training set consisted of 318 BC patients (70% of the dataset); the remaining 136 patients were allocated to the testing set (30% of the cases). No significant difference was observed for demographic, clinical and biochemical characteristics between the training and testing set (Table 4). The numerical attributes were analyzed as continuous values. Missing clinical attribute values were treated according to the predictive value imputation (PVI) method, replacing missing values with the average of the attribute observed in the training set. The variables were clustered into five groups according to clinical significance. A detailed list of all the features used to construct the predictor is reported in Table 5. RO was used to determine their relative weights in the final prediction. In RO, relative weights are initialized with a random number and estimated by maximizing performance in the 3-fold cross-validation. These weights can be used to interpret the importance of the groups of features within the model. Thus, the final DSS is interpretable.
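The weight search described above can be illustrated with a small sketch. It assumes a linear combination of per-group kernel matrices (a common MKL formulation) and a Matyas-style random search that keeps a perturbed weight vector only when the score improves. All names (`combine_kernels`, `random_optimization`, `toy_score`, `target`) are hypothetical, and the scoring function is a placeholder: in the study the score would be the 3-fold cross-validation performance of the SVM, which is not reproduced here.

```python
import random


def combine_kernels(group_kernels, weights):
    """Weighted sum of per-group kernel (Gram) matrices."""
    n = len(group_kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, group_kernels))
             for j in range(n)] for i in range(n)]


def random_optimization(score, n_groups, n_iter=200, step=0.1, seed=0):
    """Random search: perturb the weights, keep the move only if the score improves."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_groups)]  # random initialization
    best = score(weights)
    for _ in range(n_iter):
        # Gaussian perturbation, clipped so that weights stay non-negative.
        candidate = [max(0.0, w + rng.gauss(0.0, step)) for w in weights]
        value = score(candidate)
        if value > best:
            weights, best = candidate, value
    return weights, best


# Placeholder objective (assumption): stands in for cross-validated performance.
target = [0.1, 0.4, 0.2, 0.1, 0.2]


def toy_score(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


weights, best = random_optimization(toy_score, n_groups=5)
total = sum(weights)
print([round(w / total, 3) for w in weights])  # normalized group weights
```

Because only improving moves are accepted, the score never decreases across iterations; the normalized weights can then be read as the relative importance of each feature group.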
Statistical analysis
The receiver operating characteristic (ROC) curve and univariate Cox proportional hazards analyses were performed with MedCalc Statistical Software version 13.1.2 (MedCalc Software bvba, Ostend, Belgium). The area under the curve (AUC) was calculated on a three-level risk score: 2 (if both predictors classified the patient as at risk), 1 (if only one predictor did) or 0 (if neither predictor did), to investigate whether the combined DSS could distinguish between recurrent and non-recurrent patients. The level with the best Youden index (>1, i.e., risk estimate achieved by both predictors) was selected as the cutoff value for the combined DSS. Bayesian analysis was performed, and positive (+LR) and negative (−LR) likelihood ratios were used to estimate the probability of BC progression. The survival curves were calculated by the Kaplan–Meier and log-rank methods using computer software packages (MedCalc Software bvba, Ostend, Belgium and Statistica 8.0, StatSoft Inc., Tulsa, OK, USA). PFS represented the study endpoint and was calculated from the date of enrollment until disease progression. The patients who had no disease progression were censored at the time of the last follow-up. For administrative censoring, the follow-up ended on 31 December 2017. All tests were two-tailed and only p-values lower than 0.05 were regarded as statistically significant.
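The Youden index and the likelihood ratios mentioned above follow directly from a 2×2 classification table. The sketch below uses the standard definitions; the function name `diagnostic_summary` and the counts are hypothetical and are not study data.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, Youden index and likelihood ratios from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,        # J = Se + Sp - 1
        "lr_pos": sensitivity / (1.0 - specificity),      # +LR = Se / (1 - Sp)
        "lr_neg": (1.0 - sensitivity) / specificity,      # -LR = (1 - Se) / Sp
    }


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=30, fp=10, fn=10, tn=90)
for name, value in summary.items():
    print(name, round(value, 3))
```

Note that +LR is undefined when specificity equals 1 (no false positives); a production version would need to handle that case explicitly.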

5. Conclusions

ML has recently started to take hold across the oncology community to develop prognostic classification models of cancer progression and survivability. In our opinion, a combined approach of ML algorithms and RO models might hold potential for improving model precision through weighting the relative importance of attributes. In line with the current trend, the proposed model seeks not only a decision but also interpretability of the model itself, which, together with the use of a real-world BC dataset, represents the novel aspect of our research. Validation in multicenter prospective studies and appropriate management of privacy issues in relation to digital EHR data are required before any ML approach can be made available in clinical practice.

Author Contributions

P.F. and M.R. designed the study, analyzed and interpreted the clinical data, and wrote the manuscript; F.M.Z. and N.S. designed the algorithm, performed the machine learning experiments, and wrote the manuscript; S.R. collected clinical and laboratory data, interpreted the data and wrote the manuscript; F.G. designed the study, analyzed and interpreted the data, and critically revised the manuscript. All authors revised and approved the final version of the manuscript.

Funding

This work was partially supported by the European Social Fund, under the Italian Ministries of Education, University and Research (PNR 2015-2020 ARS01_01163 PerMedNet—CUP B66G18000220005) and Economic Development (“HORIZON 2020” PON I&C 2014-2020—F/050383/01-03/X32).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Torre, L.A.; Bray, F.; Siegel, R.L.; Ferlay, J.; Lortet-Tieulent, J.; Jemal, A. Global cancer statistics, 2012. CA Cancer J. Clin. 2015, 65, 87–108.
  2. Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2014, 13, 8–17.
  3. Delen, D.; Walker, G.; Kadam, A. Predicting breast cancer survivability: A comparison of three data mining methods. Artif. Intell. Med. 2005, 34, 113–127.
  4. Kim, J.; Shin, H. Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data. J. Am. Med. Inform. Assoc. 2013, 20, 613–618.
  5. Park, K.; Ali, A.; Kim, D.; An, Y.; Kim, M.; Shin, H. Robust predictive model for evaluating breast cancer survivability. Eng. Appl. Artif. Intell. 2013, 26, 2194–2205.
  6. Sun, Y.; Goodison, S.; Li, J.; Liu, L.; Farmerie, W. Improved breast cancer prognosis through the combination of clinical and genetic markers. Bioinformatics 2007, 23, 30–37.
  7. Burt, J.R.; Torosdagli, N.; Khosravan, N.; RaviPrakash, H.; Mortazi, A.; Tissavirasingham, F.; Hussein, S.; Bagci, U. Deep learning beyond cats and dogs: Recent advances in diagnosing breast cancer with deep neural networks. Br. J. Radiol. 2018, 91, 20170545.
  8. Yousefi, B.; Ting, H.N.; Mirhassani, S.M.; Hosseini, M. Development of computer-aided detection of breast lesion using gabor-wavelet BASED features in mammographic images. In Proceedings of the 2013 IEEE International Conference on Control System, Computing and Engineering (ICCSCE), Penang, Malaysia, 29 November–1 December 2013; pp. 127–131.
  9. Hueman, M.T.; Wang, H.; Yang, C.Q.; Sheng, L.; Henson, D.E.; Schwartz, A.M.; Chen, D. Creating prognostic systems for cancer patients: A demonstration using breast cancer. Cancer Med. 2018, 7, 3611–3621.
  10. Amin, M.B.; Edge, S.; Greene, F.; Byrd, D.R.; Brookland, R.K.; Washington, M.K.; Gershenwald, J.E.; Compton, C.C.; Hess, K.R.; Sullivan, D.C.; et al. (Eds.) AJCC Cancer Staging Manual, 8th ed.; Springer: New York, NY, USA, 2017.
  11. O’Sullivan, B.; Brierley, J.; Byrd, D.; Bosman, F.; Kehoe, S.; Kossary, C.; Piñeros, M.; Van Eycken, E.; Weir, H.K.; Gospodarowicz, M. The TNM classification of malignant tumours-towards common understanding and reasonable expectations. Lancet Oncol. 2017, 18, 849–851.
  12. Gönen, M.; Alpaydın, E. Multiple kernel learning algorithms. J. Mach. Learn. Res. 2011, 12, 2211–2268.
  13. Ferroni, P.; Zanzotto, F.M.; Scarpato, N.; Riondino, S.; Nanni, U.; Roselli, M.; Guadagni, F. Risk assessment for venous thromboembolism in chemotherapy treated ambulatory cancer patients: A precision medicine approach. Med. Dec. Mak. 2017, 37, 234–242.
  14. Ferroni, P.; Roselli, M.; Zanzotto, F.M.; Guadagni, F. Artificial Intelligence for cancer-associated thrombosis risk assessment. Lancet Haematol. 2018, 5, e391.
  15. Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and other kernel based learning methods. Ai Magazine 2000, 22, 190.
  16. Matyas, J. Random optimization. Automat. Rem. Control 1965, 26, 246–253.
  17. Fan, J.; Upadhye, S.; Worster, A. Understanding receiver operating characteristic (ROC) curves. Can. J. Emerg. Med. 2006, 8, 19–20.
  18. Zhu, Q.L.; Xu, W.H.; Tao, M.H. Biomarkers of the Metabolic Syndrome and Breast Cancer Prognosis. Cancers 2010, 2, 721–739.
  19. Ferroni, P.; Riondino, S.; Laudisi, A.; Portarena, I.; Formica, V.; Alessandroni, J.; D’Alessandro, R.; Orlandi, A.; Costarelli, L.; Cavaliere, F.; et al. Pre-treatment insulin levels as a prognostic factor for breast cancer progression. Oncologist 2016, 21, 1041–1049.
  20. Yam, C.; Esteva, F.J.; Patel, M.M.; Raghavendra, A.S.; Ueno, N.T.; Moulder, S.L.; Hess, K.R.; Shroff, G.S.; Hodge, S.; Koenig, K.H.; et al. Efficacy and safety of the combination of metformin, everolimus and exemestane in overweight and obese postmenopausal patients with metastatic, hormone receptor-positive, HER2-negative breast cancer: A phase II study. Investig. New Drugs 2019.
  21. Martin-Castillo, B.; Pernas, S.; Dorca, J.; Álvarez, I.; Martínez, S.; Pérez-Garcia, J.M.; Batista-López, N.; Rodríguez-Sánchez, C.A.; Amillano, K.; Domínguez, S.; et al. A phase 2 trial of neoadjuvant metformin in combination with trastuzumab and chemotherapy in women with early HER2-positive breast cancer: The METTEN study. Oncotarget 2018, 9, 35687–35704.
  22. Thakur, S.S.; Li, H.; Chan, A.M.Y.; Tudor, R.; Bigras, G.; Morris, D.; Enwere, E.K.; Yang, H. The use of automated Ki67 analysis to predict Oncotype DX risk-of-recurrence categories in early-stage breast cancer. PLoS ONE 2018, 13, e0188983.
  23. Gensheimer, M.F.; Henry, A.S.; Wood, D.J.; Hastie, T.J.; Aggarwal, S.; Dudley, S.A.; Pradhan, P.; Banerjee, I.; Cho, E.; Ramchandran, K.; et al. Automated survival prediction in metastatic cancer patients using high-dimensional electronic medical record data. J. Natl. Cancer Inst. 2019, 111, djy178.
  24. Zhao, M.; Tang, Y.; Kim, H.; Hasegawa, K. Machine learning with k-means dimensional reduction for predicting survival outcomes in patients with breast cancer. Cancer Inf. 2018, 17, 1–7.
  25. Cruz-Ramírez, N.; Acosta-Mesa, H.G.; Carrillo-Calvet, H.; Nava-Fernández, L.A.; Barrientos-Martínez, R.E. Diagnosis of breast cancer using Bayesian networks: A case study. Comput. Biol. Med. 2007, 37, 1553–1564.
  26. Hammond, M.E.; Hayes, D.F.; Dowsett, M.; Allred, D.C.; Hagerty, K.L.; Badve, S.; Fitzgibbons, P.L.; Francis, G.; Goldstein, N.S.; Hayes, M.; et al. American Society of Clinical Oncology/College of American Pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer. J. Clin. Oncol. 2010, 28, 2784–2795.
  27. Wolff, A.C.; Hammond, M.E.; Schwartz, J.N.; Hagerty, K.L.; Allred, D.C.; Cote, R.J.; Dowsett, M.; Fitzgibbons, P.L.; Hanna, W.M.; Langer, A.; et al. American Society of Clinical Oncology/College of American Pathologists. American Society of Clinical Oncology/College of American Pathologists guideline recommendations for human epidermal growth factor receptor 2 testing in breast cancer. Arch. Pathol. Lab. Med. 2007, 131, 18–43.
  28. Goldhirsch, A.; Winer, E.P.; Coates, A.S.; Gelber, R.D.; Piccart-Gebhart, M.; Thürlimann, B.; Senn, H.J.; Panel Members. Personalizing the treatment of women with early breast cancer: Highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann. Oncol. 2013, 24, 2206–2223.
  29. Filice, S.; Castellucci, G.; Croce, D.; Basili, R. KeLP: A Kernel-based Learning Platform for Natural Language Processing. In Proceedings of the ACL-IJCNLP 2015 System Demonstrations, Beijing, China, 26–31 July 2015; pp. 19–24.
Figure 1. Kaplan–Meier curves of progression-free survival (PFS) of the 136 BC women included in the testing set. Comparison between patients at high (>1) or low-risk (≤1) of progression by the combined decision support system (DSS) model.
Table 1. Analytical performance of machine learning with random optimization in the training set.
| ML Predictor | AUC (SE) | 95% CI | Sensitivity (95% CI) | Specificity (95% CI) | +LR | −LR |
|---|---|---|---|---|---|---|
| ML-RO-4 | 0.778 (0.0290) | 0.728–0.822 | 67.1 (55.4–77.5) | 88.4 (83.7–92.2) | 5.80 | 0.37 |
| ML-RO-1 | 0.769 (0.0293) | 0.719–0.814 | 65.8 (54.0–76.3) | 88.0 (83.2–91.8) | 5.49 | 0.39 |
| ML-RO-7 | 0.767 (0.0293) | 0.717–0.813 | 67.1 (55.4–77.5) | 86.4 (81.4–90.4) | 4.92 | 0.38 |
| ML-RO-3 | 0.759 (0.0296) | 0.708–0.805 | 65.8 (54.0–76.3) | 86.0 (80.9–90.1) | 4.68 | 0.40 |
| ML-RO-6 | 0.759 (0.0296) | 0.708–0.805 | 65.8 (54.0–76.3) | 86.0 (80.9–90.1) | 4.68 | 0.40 |
| ML-RO-8 | 0.755 (0.0297) | 0.703–0.801 | 65.8 (54.0–76.3) | 85.1 (80.0–89.4) | 4.42 | 0.40 |
| ML-RO-0 | 0.753 (0.0297) | 0.701–0.799 | 65.8 (54.0–76.3) | 84.7 (79.5–89.0) | 4.30 | 0.40 |
| ML-RO-2 | 0.748 (0.0299) | 0.697–0.795 | 64.5 (52.7–75.1) | 85.1 (80.0–89.4) | 4.33 | 0.42 |
| ML-RO-9 | 0.739 (0.0302) | 0.687–0.786 | 61.8 (50.0–72.8) | 86.0 (80.9–90.1) | 4.40 | 0.44 |
| ML-RO-5 | 0.722 (0.0306) | 0.669–0.770 | 59.2 (47.3–70.4) | 85.1 (80.0–89.4) | 3.98 | 0.48 |
AUC: Area under the curve; CI: Confidence interval; LR: Likelihood ratio; ML: Machine learning; RO: Random optimization.
Table 2. Weights of attribute groups in the training set.
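| Method | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Sum of the Weights | Normalized Group 1 | Normalized Group 2 | Normalized Group 3 | Normalized Group 4 | Normalized Group 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ML+RO-4 | 0.41890 | 1.04551 | 0.60311 | 0.33909 | 0.58969 | 2.996321 | 0.13980 | 0.34893 | 0.20128 | 0.11316 | 0.19680 |
| ML+RO-0 | 0.77299 | 1.86062 | 1.39445 | 0.90456 | 1.00740 | 5.940053 | 0.13013 | 0.31323 | 0.23475 | 0.15228 | 0.16959 |
| ML+RO-6 | 0.42756 | 0.91373 | 1.16514 | 0.39297 | 0.58755 | 3.486968 | 0.12261 | 0.26204 | 0.33414 | 0.11269 | 0.16849 |
| ML+RO-8 | 0.44878 | 1.28224 | 0.63075 | 0.44350 | 0.53398 | 3.339267 | 0.13439 | 0.38399 | 0.18888 | 0.13281 | 0.15991 |
| ML+RO-1 | 0.46149 | 1.17742 | 0.55782 | 0.34141 | 0.47660 | 3.014770 | 0.15307 | 0.39055 | 0.18503 | 0.11324 | 0.15809 |
| ML+RO-7 | 0.54682 | 1.40025 | 0.79264 | 0.59119 | 0.61023 | 3.941154 | 0.13874 | 0.35529 | 0.20112 | 0.15000 | 0.15483 |
| ML+RO-3 | 0.64274 | 1.13249 | 0.36078 | 0.39482 | 0.45241 | 2.983255 | 0.21545 | 0.37961 | 0.12093 | 0.13234 | 0.15165 |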
Data are absolute numbers for group weights. ML: Machine Learning; RO: Random Optimization.
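As a worked check of how the normalized weights relate to the raw ones, each raw group weight can be divided by the sum of the five weights. The sketch below does this for the ML+RO-4 row of Table 2; the function name `normalize_weights` is hypothetical, and small differences in the last decimal place are expected because the published values are rounded.

```python
def normalize_weights(raw):
    """Return the sum of the raw weights and each weight divided by that sum."""
    total = sum(raw)
    return total, [w / total for w in raw]


# Raw and normalized group weights for ML+RO-4 (Groups 1-5), copied from Table 2.
raw_ro4 = [0.41890, 1.04551, 0.60311, 0.33909, 0.58969]
table_ro4 = [0.13980, 0.34893, 0.20128, 0.11316, 0.19680]

total, normalized = normalize_weights(raw_ro4)
print(round(total, 4))
for computed, published in zip(normalized, table_ro4):
    print(round(computed, 5), published)
```

The normalized weights sum to 1, so they can be compared across predictors even when the raw sums differ.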
Table 3. Analytical performance of machine learning with random optimization in the testing set.
| Performance Parameter | ML-RO-0 | ML-RO-4 | DSS Model a |
|---|---|---|---|
| F-measure b | 0.696 | 0.677 | 0.698 |
| Accuracy | 0.853 | 0.838 | 0.860 |
| Area under the curve (AUC) | 0.822 | 0.813 | 0.815 |
| (+)LR (95% CI) | 9.1 (4.3–20.8) | 8.5 (3.9–19.6) | 8.6 (4.2–18.0) |
| (−)LR (95% CI) | 0.4 (0.3–0.6) | 0.4 (0.3–0.6) | 0.4 (0.2–0.5) |
| HR (95% CI) | 10.7 (4.6–24.8) | 10.3 (4.5–23.7) | 10.9 (4.5–26.6) |
LR: Likelihood ratio; C.I.: Confidence interval; HR: Hazard ratio; a Analytical performance was evaluated after categorization 0/1 based on risk estimate achieved by both predictors; b F-measure represents a harmonic mean of precision [(P) positive predictive value in machine learning] and recall [(R) sensitivity in machine learning] and is calculated as: 2PR/(P+R).
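The F-measure formula given in the footnote, 2PR/(P+R), can be written as a small function. This is a sketch with a hypothetical function name and hypothetical inputs; the precision and recall underlying Table 3 are not reported in the text.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision (P) and recall (R): 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention when there are no true positives
    return 2 * precision * recall / (precision + recall)


# Hypothetical values, for illustration only.
example = f_measure(0.5, 1.0)
print(round(example, 3))  # 0.667
```

Because it is a harmonic mean, the F-measure is pulled toward the lower of the two values, so a predictor cannot score well by maximizing recall alone.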
Table 4. Clinical-pathological characteristics of breast cancer (BC) patients. Comparison between training and testing set.
| Clinical-Pathological Characteristics | Training Set (n = 318) | Testing Set (n = 136) |
|---|---|---|
| Age (years), Mean ± SD | 56 ± 13 | 57 ± 12 |
| Menopausal status, N (%) | | |
| Pre | 141 (44) | 51 (38) |
| Post | 177 (56) | 85 (63) |
| Body Mass Index, Mean ± SD | 25.2 ± 4.5 | 25.7 ± 5.2 |
| Histological diagnosis, N (%) | | |
| Ductal | 263 (83) | 121 (89) |
| Lobular | 37 (12) | 9 (7) |
| Others | 18 (5) | 6 (4) |
| Molecular Type a, N (%) | | |
| Triple-negative | 39 (12) | 17 (12) |
| Luminal-like A | 97 (31) | 37 (27) |
| Luminal-like B | 172 (54) | 77 (57) |
| HER2 pos | 10 (3) | 5 (4) |
| Grading, N (%) b | | |
| 1 | 20 (7) | 15 (13) |
| 2 | 108 (39) | 45 (38) |
| 3 | 151 (54) | 58 (49) |
| Tumor, N (%) b | | |
| T1 | 141 (50) | 59 (50) |
| T2 | 91 (33) | 42 (36) |
| T3 | 28 (10) | 5 (4) |
| T4 | 19 (7) | 12 (10) |
| Node, N (%) b | | |
| N0 | 134 (48) | 54 (46) |
| N+ | 145 (52) | 64 (54) |
| Prognostic stage, N (%) | | |
| I | 177 (56) | 70 (50) |
| II | 53 (17) | 20 (15) |
| III | 45 (14) | 26 (19) |
| IV | 4 (1) | 2 (1) |
| Metastatic | 39 (12) | 18 (13) |
| Receptor status, N (%) c | | |
| ER+/PR+ | 235 (74) | 94 (69) |
| ER+/PR− | 29 (9) | 19 (14) |
| ER−/PR+ | 5 (2) | 1 (1) |
| ER−/PR− | 49 (15) | 22 (16) |
| HER2/neu+, N (%) c | 66 (21) | 34 (25) |
| Ki67 proliferation index ≥20%, N (%) c | 204 (67) | 93 (71) |
| Type 2 Diabetes, N (%) | 39 (12) | 11 (8) |
| Glucose metabolic asset d | | |
| Fasting blood glucose (mg/dl), Mean ± SD | 105 ± 31 | 102 ± 32 |
| Fasting insulin (µIU/ml), Median (IQR) | 11.9 (6.4–27.0) | 10.6 (5.6–19.6) |
| HbA1c (%), Mean ± SD | 5.8 ± 0.8 | 5.8 ± 0.7 |
| HOMA Index, Mean ± SD | 3.0 (1.4–8.3) | 2.9 (1.2–6.3) |
| Follow-up (years), Mean (range) | 3.4 (0.29–10.5) | 3.5 (0.26–9.65) |
a According to St. Gallen Consensus Conference. b Evaluated at time of diagnosis. c Evaluated in a population of 397 primary breast cancer patients. d Evaluated at time of enrollment and prior to any treatment. ER/PR: estrogen/progesterone receptors; HER2: Human epidermal growth factor receptor 2; IQR: Interquartile range; HbA1c: Glycosylated hemoglobin; HOMA Index: Homeostasis model assessment index.
Table 5. Features included in the model.
| Category | Group | Features |
|---|---|---|
| Patient-Related | Group 1 | Age; Menopausal status; Body Mass Index |
| Tumor-Related | Group 2 | Molecular type; Histological diagnosis; Grading; TNM stage |
| Tumor-Related | Group 3 | Estrogen receptors; Progesterone receptors; HER2/NEU; Ki67 proliferation index |
| Biochemical | Group 4 | Total Bilirubin; Creatinine |
| Biochemical | Group 5 | Fasting glycemia; Fasting insulinemia; Glycosylated hemoglobin; HOMA index (insulin resistance); Type 2 diabetes |

Share and Cite

MDPI and ACS Style

Ferroni, P.; Zanzotto, F.M.; Riondino, S.; Scarpato, N.; Guadagni, F.; Roselli, M. Breast Cancer Prognosis Using a Machine Learning Approach. Cancers 2019, 11, 328. https://doi.org/10.3390/cancers11030328
