Open Access Article
Molecules 2019, 24(3), 631;

Stratification of Breast Cancer by Integrating Gene Expression Data and Clinical Variables

School of Computer Science and Technology, Xidian University, Xi’an 710071, China
Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
School of Computer Engineering, Qingdao University of Technology, Qingdao 266033, China
Author to whom correspondence should be addressed.
Academic Editor: Timothy W. Corson
Received: 31 December 2018 / Revised: 26 January 2019 / Accepted: 3 February 2019 / Published: 11 February 2019


Breast cancer is a heterogeneous disease. Although gene expression profiling has led to the definition of several breast cancer subtypes, precisely identifying these subtypes remains a challenge. Clinical data are another promising source of information. In this study, clinical variables are integrated with gene expression data to stratify breast cancer. We adopt a two-phase approach, gene selection followed by clustering, in which the integration takes place during gene selection: only genes whose expression levels are most relevant to each clinical variable and least redundant among themselves are retained for clustering. In practice, we use maximum relevance minimum redundancy (mRMR) for gene selection and k-means for clustering. We compare our method with two commonly used breast cancer stratification methods that rely on expression data alone: prediction analysis of microarray 50 (PAM50) and highest variability (HV). Our method outperforms both in identifying subtypes significantly associated with five-year survival and recurrence time. Specifically, our method identified recurrence-associated breast cancer subtypes that PAM50 and HV did not. Additionally, our analysis discovered three survival-associated luminal-A subgroups and two survival-associated luminal-B subgroups. These results indicate that screening for clinically relevant gene expression improves breast cancer stratification.
Keywords: gene expression; clinical variables; stratification; mRMR; clustering; luminal-A; luminal-B
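The two-phase pipeline summarized in the abstract (mRMR gene selection per clinical variable, then k-means clustering) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: expression values are discretized into equal-frequency bins so that mutual information can be estimated from counts, mRMR is run in its greedy difference (MID) form separately for each clinical variable, the union of the selected genes is kept, and k-means is run on the raw expression values of those genes. The function names, the binning scheme, the toy data, and the parameters `bins`, `genes_per_variable`, and `n_clusters` are all hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # I(A;B) = sum p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def discretize(values, bins=3):
    """Equal-frequency binning: rank the values and map each rank to a bin in [0, bins).
    Ties are broken by sample order because sorted() is stable."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mrmr(genes, target, k):
    """Greedy mRMR in its difference (MID) form.

    genes: dict gene -> discretized expression per sample; target: discrete labels
    of one clinical variable. Each step adds the gene maximizing its relevance to the
    target minus its mean mutual information with the genes already selected."""
    relevance = {g: mutual_information(v, target) for g, v in genes.items()}
    selected, candidates = [], set(genes)
    while candidates and len(selected) < k:
        def score(g):
            if not selected:
                return relevance[g]
            redundancy = sum(mutual_information(genes[g], genes[s]) for s in selected)
            return relevance[g] - redundancy / len(selected)
        best = max(sorted(candidates), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on lists of numbers; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def stratify(expr, clinical, genes_per_variable, n_clusters, bins=3, seed=0):
    """Phase 1: mRMR per clinical variable, keeping the union of selected genes.
    Phase 2: k-means on the selected genes' expression values.

    expr: dict gene -> expression values; clinical: dict variable -> discrete labels
    (all lists share the same sample order)."""
    discretized = {g: discretize(v, bins) for g, v in expr.items()}
    chosen = []
    for var in sorted(clinical):
        for g in mrmr(discretized, clinical[var], genes_per_variable):
            if g not in chosen:
                chosen.append(g)
    n_samples = len(next(iter(expr.values())))
    points = [[expr[g][i] for g in chosen] for i in range(n_samples)]
    return chosen, kmeans(points, n_clusters, seed=seed)


if __name__ == "__main__":
    # Hypothetical toy data: g1 and g2 track the clinical label "er", g3 does not.
    expr = {
        "g1": [1, 2, 3, 4, 11, 12, 13, 14],
        "g2": [2, 1, 4, 3, 12, 11, 14, 13],
        "g3": [1, 9, 2, 10, 3, 11, 4, 12],
    }
    clinical = {"er": [0, 0, 0, 0, 1, 1, 1, 1]}
    chosen, labels = stratify(expr, clinical, genes_per_variable=1, n_clusters=2, bins=2)
    print(chosen, labels)
```

Discretizing before estimating mutual information keeps the sketch dependency-free; estimators for continuous features (for example, scikit-learn's `mutual_info_classif`) are an alternative. A real analysis would also need to choose the number of genes per variable and the number of clusters, and then test each cluster's association with survival or recurrence.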

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Cite This Article

MDPI and ACS Style

He, Z.; Zhang, J.; Yuan, X.; Xi, J.; Liu, Z.; Zhang, Y. Stratification of Breast Cancer by Integrating Gene Expression Data and Clinical Variables. Molecules 2019, 24, 631.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Molecules EISSN 1420-3049 Published by MDPI AG, Basel, Switzerland