Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification
Cancer Bioinformatics, Cancer Centre at Guy’s Hospital, King’s College London, London SE1 9RT, UK
School of Cancer and Pharmaceutical Sciences, King’s College London, London SE1 1UL, UK
Breast Cancer Now Research Unit, Cancer Centre at Guy’s Hospital, King’s College London, London SE1 9RT, UK
Division of Oncology, Department of Clinical Sciences Lund, Lund University, Medicon Village, SE-223 81 Lund, Sweden
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Academic Editor: Maggie Chon U Cheang
Received: 28 December 2020
Revised: 16 February 2021
Accepted: 20 February 2021
Published: 27 February 2021
Breast cancer is a complex disease, and identifying its underlying molecular mechanisms is critical for developing treatment strategies. The purpose of this study was to implement a computational framework capable of combining many types of data into a meaningful classification. Although the approach is applicable to many data types and diseases, we applied it to breast cancer data and identified six triple-negative breast cancer subtypes with distinct underlying molecular mechanisms. The relevance of the approach is highlighted by the clinical outcome analysis, which identified a group of patients who responded poorly to standard-of-care adjuvant chemotherapy. This study serves as a starting point for the framework, which can be extended to other data types and other diseases.