Article

Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification

by Jelmar Quist 1,2,3,†, Lawson Taylor 1,2,†, Johan Staaf 4 and Anita Grigoriadis 1,2,3,*
1 Cancer Bioinformatics, Cancer Centre at Guy’s Hospital, King’s College London, London SE1 9RT, UK
2 School of Cancer and Pharmaceutical Sciences, King’s College London, London SE1 1UL, UK
3 Breast Cancer Now Research Unit, Cancer Centre at Guy’s Hospital, King’s College London, London SE1 9RT, UK
4 Division of Oncology, Department of Clinical Sciences Lund, Lund University, Medicon Village, SE-223 81 Lund, Sweden
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Academic Editor: Maggie Chon U Cheang
Cancers 2021, 13(5), 991; https://doi.org/10.3390/cancers13050991
Received: 28 December 2020 / Revised: 16 February 2021 / Accepted: 20 February 2021 / Published: 27 February 2021
Breast cancer is a complex disease, and the identification of its underlying molecular mechanisms is critical for the development of treatment strategies. The purpose of this study was to implement a computational framework that is capable of combining many types of data into a meaningful classification. While our approach can be used on many types of data and in many diseases, we applied this framework to breast cancer data and identified six triple-negative breast cancer subtypes with distinct underlying molecular mechanisms. The relevance of our approach is highlighted by the clinical outcome analysis in which a group of patients responding poorly to standard-of-care adjuvant chemotherapy was identified. This study serves as a starting point for our computational framework, which can be extended to different types of data from different diseases.
Advances in high-throughput technologies encourage the generation of large amounts of multiomics data to investigate complex diseases, including breast cancer. Given that the aetiologies of such diseases extend beyond a single biological entity, and that essential biological information can be carried by all data regardless of data type, integrative analyses are needed to identify clinically relevant patterns. To facilitate such analyses, we present a permutation-based framework for random forest methods which simultaneously allows the unbiased integration of mixed-type data and assessment of relative feature importance. The performance of the approach was evaluated through simulation studies and machine learning datasets. The results showed minimal multicollinearity and limited overfitting. To further assess the performance, the permutation-based framework was applied to high-dimensional mixed-type data from two independent breast cancer cohorts. Reproducibility and robustness of our approach were demonstrated by the concordance in relative feature importance between the cohorts, along with consistencies in clustering profiles. One of the identified clusters was shown to be prognostic for clinical outcome after standard-of-care adjuvant chemotherapy and outperformed current intrinsic molecular breast cancer classifications.
Keywords: breast cancer; random forest; machine learning; integrative analysis; DNA damage repair
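
The abstract refers to assessing relative feature importance by permutation on mixed-type data. As a rough illustration of the general idea only, and not the authors' framework, the sketch below computes a generic permutation importance: the drop in accuracy when one feature column is shuffled. The toy data, the `predict` stand-in (used in place of a fitted random forest) and all function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the labels.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical mixed-type toy data: one continuous and one categorical feature.
data_rng = random.Random(42)
rows = [(data_rng.gauss(0.0, 1.0), data_rng.choice(["A", "B", "C"]))
        for _ in range(200)]
# The label depends on the continuous feature only.
labels = [int(x > 0.0) for x, _ in rows]


def predict(row):
    """Stand-in for a fitted classifier (assumption: a trained forest goes here)."""
    return int(row[0] > 0.0)


importances = permutation_importance(predict, rows, labels)
print(importances)
```

In this toy setting, shuffling the categorical column leaves the predictions unchanged, so its importance is zero, while shuffling the continuous column degrades accuracy substantially. Because only the column order is altered, the same procedure applies to continuous and categorical features alike.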

MDPI and ACS Style

Quist, J.; Taylor, L.; Staaf, J.; Grigoriadis, A. Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification. Cancers 2021, 13, 991. https://doi.org/10.3390/cancers13050991

AMA Style

Quist J, Taylor L, Staaf J, Grigoriadis A. Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification. Cancers. 2021; 13(5):991. https://doi.org/10.3390/cancers13050991

Chicago/Turabian Style

Quist, Jelmar, Lawson Taylor, Johan Staaf, and Anita Grigoriadis. 2021. "Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification" Cancers 13, no. 5: 991. https://doi.org/10.3390/cancers13050991

