One of the major challenges in defining clinically-relevant and less heterogeneous tumor subtypes is assigning biological and/or clinical interpretations to etiological (intrinsic) subtypes. Conventional clustering/subtyping approaches often fail to define such subtypes, as they involve several discrete steps. Here we demonstrate a unique machine-learning method, phenotype mapping (PhenMap
), which jointly integrates single omics data with phenotypic information using three published breast cancer datasets (n = 2045). The PhenMap
framework uses a modified factor analysis method that is governed by a key assumption that, features from different omics data types are correlated due to specific “hidden/mapping” variables (context-specific mapping variables (CMV)). These variables can be simultaneously modeled with phenotypic data as covariates to yield functional subtypes and their associated features (e.g., genes) and phenotypes. In one example, we demonstrate the identification and validation of six novel “functional” (discrete) subtypes with differential responses to a cyclin-dependent kinase (CDK)4/6 inhibitor and etoposide by jointly integrating transcriptome profiles with four different drug response data from 37 breast cancer cell lines. These robust subtypes are also present in patient breast tumors with different prognosis. In another example, we modeled patient gene expression profiles and clinical covariates together to identify continuous subtypes with clinical/biological implications. Overall, this genome-phenome machine-learning integration tool, PhenMap
identifies functional and phenotype-integrated discrete or continuous subtypes with clinical translational potential.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited