We hypothesized that the candidate targets predicted for luminal A can be extrapolated to other breast cancer subtypes by integrating multiple filters. We then performed a multifaceted assessment of the biological and pathological importance of the 510 candidate genes for each breast cancer subtype, from the perspectives of dysregulation in the transcriptome, influence of transcriptional dysregulation on patient survival, influence of abnormal genomic methylation on gene expression, critical roles in the pathogenesis of other cancers, and potential as novel drug targets or for repositioning purposes. In this way, we further prioritized the 510 candidate genes to select the targets with the greatest potential for experimental validation.
3.3.1. Dysregulation in Transcriptome and Determinant Roles in Survival
We first evaluated the expression levels of the 510 genes in the transcriptomes of breast tumor samples from each breast cancer subtype. Genes with logFC above 1 and FDR less than 0.05 were considered significantly up-regulated. We illustrate the distribution of the 510 genes according to their expression levels in the transcriptomes and the proportion falling into each range for each subtype (
Supplementary Materials Figure S6). Notably, in the transcriptomes of tumor samples of luminal A, luminal B, HER2+ and TNBC, there were only 37, 50, 71, and 58 significantly up-regulated, and 59, 94, 85, and 74 significantly down-regulated candidate genes, respectively (
Supplementary Materials Table S7). Consistently, the majority of candidate genes (69% to 81%) fall into the intermediate expression range regardless of the cancer subtype.
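The thresholding step above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `classify_gene`, the example rows, and the symmetric cut-off for down-regulation (logFC below -1) are assumptions, since the text only states the up-regulation criterion. The second part recomputes the share of intermediate genes from the counts reported above.

```python
from collections import Counter

def classify_gene(log_fc, fdr, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Label a gene 'up', 'down', or 'intermediate'.

    The 'down' rule (logFC < -cutoff) is an assumed mirror image of the
    up-regulation rule stated in the text.
    """
    if fdr < fdr_cutoff and log_fc > lfc_cutoff:
        return "up"
    if fdr < fdr_cutoff and log_fc < -lfc_cutoff:
        return "down"
    return "intermediate"

# Hypothetical (gene, logFC, FDR) rows; these are not values from the study.
rows = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.7, 0.010),
    ("GENE_C", 0.4, 0.200),
    ("GENE_D", 1.5, 0.300),  # large fold change but not significant
]
labels = {gene: classify_gene(lfc, fdr) for gene, lfc, fdr in rows}
counts = Counter(labels.values())

# Reported (up, down) counts per subtype out of the 510 candidate genes.
reported = {
    "luminal A": (37, 59),
    "luminal B": (50, 94),
    "HER2+": (71, 85),
    "TNBC": (58, 74),
}
intermediate_pct = {
    subtype: round(100 * (510 - up - down) / 510)
    for subtype, (up, down) in reported.items()
}
```

The recomputed percentages span the same 69% to 81% range quoted in the text.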
Since our evaluation was based on genome-wide RNAi transcriptomes, it is important to understand the impact of cancer transcriptomes on clinical prognosis. Overall survival (OS) and recurrence-free survival (RFS), measured in days, were available in TCGA for 800 breast cancer samples. We therefore performed survival analysis to explore the relationship between the expression level of the 510 candidate genes and the survival time of breast cancer patients. Since gene expression is continuous, we applied three cut-off strategies to categorize the breast tumor samples into “high” and “low” expression groups and used the Kaplan-Meier method to analyze the censored data for each strategy. Genes with
p value less than 0.05 and hazard ratio (HR) larger than one were kept, resulting in 44 genes using the median cut-off strategy, 28 genes using the tertile cut-off strategy and 52 genes using the quartile cut-off strategy (
Supplementary Materials Table S8).
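As a rough sketch of the grouping and survival estimation described above, the following Python code splits samples by an expression cut-off and computes a Kaplan-Meier curve. It rests on assumptions not stated in the text: the tertile and quartile strategies are taken to compare the top and bottom third or quarter of samples (dropping the middle), and the helper names and toy data are hypothetical. The p value and hazard ratio used for filtering would come from a log-rank test or Cox model, which is not shown here.

```python
def split_by_quantile(expr, fraction):
    """Return (low_ids, high_ids): the bottom and top `fraction` of samples.

    fraction=0.5 is a median split, 1/3 a tertile split, 0.25 a quartile
    split (assumed interpretation; samples in between are left out).
    """
    ranked = sorted(expr, key=expr.get)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical expression values per sample (not study data).
expr = {"s1": 1.0, "s2": 5.0, "s3": 3.0, "s4": 9.0,
        "s5": 7.0, "s6": 2.0, "s7": 8.0, "s8": 4.0}
low_median, high_median = split_by_quantile(expr, 0.5)
low_quart, high_quart = split_by_quantile(expr, 0.25)

# Hypothetical survival times in days with censoring flags.
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

In practice each group returned by `split_by_quantile` would be passed to `kaplan_meier` separately, and the two curves compared.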
From these results, we identified a total of 70 genes whose dysregulation in the transcriptome is strongly associated with poor prognosis, especially low overall survival, for breast cancer patients of all subtypes. Detailed information is provided in
Supplementary Materials Table S8, and example Kaplan-Meier curves of the genes MUC1 and TIAM1 are provided in
Supplementary Materials Figure S7 to illustrate the distinct “high” and “low” groups. As shown in
Supplementary Materials Figure S6 (red dots), we marked the expression level of these 70 genes for each breast cancer subtype (
Supplementary Materials Table S8) and further evaluated their enrichment among genes with high transcriptome reverse potential, i.e., absolute connectivity score above 0.8. Only 15 poor-survival related genes had a high absolute connectivity score (out of the 79 high-scoring genes, hypergeometric test,
p = 0.10). It is noteworthy that about half of the 70 poor-survival related genes were down-regulated in the transcriptome (logFC < 0), which is contrary to the assumption that patients with higher gene expression have a poorer prognosis. Thus, we focused only on those genes in each breast cancer subtype whose up-regulation in the transcriptome is highly associated with short survival time. As a result, we found a total of 46 such genes, including 39 for luminal A, 29 for luminal B, 38 for HER2+ and 27 for TNBC. Among them, TRAPPC3, DHRS7, HYAL2, SRRT, PQBP1 and ADORA2A had high absolute connectivity scores, and notably the latter three were valid for all subtypes (
Supplementary Materials Figure S6).
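The enrichment test mentioned above can be illustrated with a one-sided hypergeometric tail probability. The sketch below assumes that the background is the 510 candidate genes, that the 70 poor-survival genes are the "successes", and that the 79 high-scoring genes are the draw; the text does not spell out this setup, so the choice of background is an assumption, and `hypergeom_sf` is a hypothetical helper name.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement
    from `population` items of which `successes` are marked."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Assumed setup: 510 candidates, 70 poor-survival genes,
# 79 high-scoring genes, 15 genes in the overlap.
p_overlap = hypergeom_sf(15, 510, 70, 79)
```

The resulting tail probability can be compared against the p value reported in the text to check whether the assumed background matches the one used in the study.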
3.3.2. Pathogenic Importance in Cancer and Methylation at Genomic Level
The Cancer Gene Census list is a manually curated set of genes that have mutations or other genomic abnormalities associated with cancer and are likely to be causative, as identified from genetic studies [
38]. This 602-gene data set exemplifies large gene lists that have been generated from initial experimental validation. From this list, we found that 40 of the 510 candidate genes play roles in specific cancers, and among them 5 genes, i.e., ALDH2, BTG1, TNFRSF14, MUC1 and STAT6, show high expression in the cancer transcriptome that is associated with poor survival. According to annotations from the Cancer Gene Census, 34 genes were related to breast cancer, and 4 of them, i.e., PBRM1, TP53, AKT1 and CDKN1B, were in our candidate gene list.
DNA methylation of tumor suppressor genes has been the focus of numerous studies that have aimed to identify DNA methylation biomarkers of cancer [
53]. Meanwhile, it is becoming clear that hypomethylation is an equally important driving force in breast cancer metastasis [
54,
55]. Thus, we further investigated the DNA methylation status of the 23,423 genes in tumor samples of each breast cancer subtype, with the aim of identifying genes that undergo significantly differential methylation in the cancer state. Using TCGA methylation data generated on the Illumina HumanMethylation450 BeadChip platform, we performed differential methylation analysis for 288 luminal A, 127 luminal B, 31 HER2+, and 59 TNBC tumor samples as well as 87 normal breast tissue samples. We observed a bimodal distribution of the calculated beta values, with two peaks around 0.1 and 0.9 and a relatively flat valley around 0.2–0.8 (
Supplementary Materials Figure S8). In differential methylation analysis, the Beta-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels [
56]. As recommended, we used M-values to identify differentially methylated positions (DMPs) with limma. The number of normal samples was 33 for luminal A, 18 for luminal B, 6 for HER2+ and 5 for TNBC, respectively. Probes with FDR less than 0.05 and an average Beta-value difference of at least 0.2 between cancer and normal samples were considered differentially methylated probes (
Supplementary Materials Table S9).
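For illustration, the relationship between Beta-values and M-values and the probe-level filter can be sketched as follows. The M-value is the standard logit transform, M = log2(Beta / (1 - Beta)). The FDR is assumed to come from an external limma fit (not reproduced here), the Beta-value threshold is taken as an absolute difference in group means, and the function names and example values are hypothetical.

```python
from math import log2

def beta_to_m(beta, eps=1e-6):
    """Convert a Beta-value in [0, 1] to an M-value (logit, base 2).

    Values are clipped by `eps` (an assumed safeguard) to avoid
    division by zero at exactly 0 or 1.
    """
    beta = min(max(beta, eps), 1 - eps)
    return log2(beta / (1 - beta))

def classify_probe(fdr, tumor_betas, normal_betas,
                   fdr_cutoff=0.05, delta_cutoff=0.2):
    """Return 'hyper', 'hypo', or None for a single probe.

    `fdr` is assumed to be the adjusted p value from a differential
    analysis on M-values; the effect-size filter uses mean Beta-values.
    """
    delta = (sum(tumor_betas) / len(tumor_betas)
             - sum(normal_betas) / len(normal_betas))
    if fdr >= fdr_cutoff or abs(delta) < delta_cutoff:
        return None
    return "hyper" if delta > 0 else "hypo"

# Hypothetical probes (not study data).
m_mid = beta_to_m(0.5)
m_high = beta_to_m(0.8)
call_hypo = classify_probe(0.01, [0.2, 0.3], [0.6, 0.7])
call_small = classify_probe(0.01, [0.50], [0.45])
call_nonsig = classify_probe(0.20, [0.9], [0.1])
```

Running the statistics on M-values while thresholding the effect size on Beta-values mirrors the division of labour described in the text: M-values for statistical validity, Beta-values for biological interpretability.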
As a result, the number of significantly differentially methylated probes (genes) was 18,747 (6909) for luminal A, 32,558 (9314) for luminal B, 20,494 (7197) for HER2+ and 6469 (3172) for TNBC. After annotation, we identified from the 510 preliminary candidate genes a set of hypermethylated and hypomethylated genes for each subtype, i.e., 104 and 70 for luminal A, 128 and 111 for luminal B, 96 and 73 for HER2+, and 30 and 52 for TNBC, respectively. We then focused on the hypomethylated genes that were significantly up-regulated in the cancer state and calculated the enrichment of these genes among those with high transcriptome reverse potential (i.e., absolute connectivity score above 0.8) and/or associated with poor survival. The number of hypomethylated genes that were significantly up-regulated in the cancer state was 35 for luminal A, 48 for luminal B, 41 for HER2+ and 34 for TNBC. Out of the 79 genes with high transcriptome reverse potential, the number of genes that satisfy the two conditions was 4 for luminal A (TAL2, OPN3, IL20, ARID3A, p = 0.82), 5 for luminal B (TAL2, UMODL1, OPN3, IL20, ARID3A, p = 0.90), 5 for HER2+ (TAL2, OPN3, IL20, CPT1A, ARID3A, hypergeometric test, p = 0.79) and 4 for TNBC (UMODL1, OPN3, LDHB, ARID3A, hypergeometric test, p = 0.80). Out of the 70 genes associated with poor survival, the number of genes that satisfy the two conditions was 5 for luminal A (MUC1, HLA-DRA, WNT7B, XBP1, EFCAB2, hypergeometric test, p = 0.54), 4 for luminal B (ATG16L2, MUC1, WNT7B, XBP1, p = 0.92), 5 for HER2+ (C1QTNF6, NDUFS6, MUC1, HLA-DRA, XBP1, hypergeometric test, p = 0.69) and 4 for TNBC (CHERP, MUC1, TIAM1, GABRP, hypergeometric test, p = 0.71).
3.3.4. Prioritized Therapeutic Targets for Four Breast Cancer Subtypes
Finally, combining all of the above assessments, we obtained a short list of 11 genes, i.e., MUC1, HLA-DRA, WNT7B, XBP1, EFCAB2, ATG16L2, C1QTNF6, NDUFS6, CHERP, TIAM1 and GABRP, as the most promising targets for the four breast cancer subtypes (
Table 1). The number of final candidate targets for each subtype is 5 for luminal A, 4 for luminal B, 5 for HER2+ and 4 for TNBC (
Figure 4,
Supplementary Materials Table S11). These genes show high transcriptome reverse potential on average (absolute connectivity score > 0.7) when knocked down by RNAi reagents, indicating feasibility as targets for developing RNAi therapeutics. Meanwhile, these genes were verified by genome and transcriptome data from tumor samples of clinical breast cancer patients as significantly hypomethylated at the genomic level and transcriptionally up-regulated, despite their distinct dysregulation levels in each subtype. Notably, we found MUC1 to be a commonly valid target for all four subtypes (
Supplementary Materials Figure S9), XBP1 for all subtypes except TNBC, WNT7B for the two luminal subtypes, and HLA-DRA for luminal A and HER2+. Most importantly, several genes distinguish the four breast cancer subtypes as subtype-specific therapeutic targets, including EFCAB2 for luminal A, ATG16L2 for luminal B, C1QTNF6 and NDUFS6 for HER2+, as well as CHERP, TIAM1, and GABRP for TNBC (
Table 1). Among them, only GABRP is targeted by known drugs, which are used for the treatment of diseases such as insomnia and epilepsy, leaving potential for drug repurposing.
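The shared and subtype-specific targets listed above follow directly from the per-subtype gene sets reported for the poor-survival filter. The short Python sketch below derives them with set operations; the gene lists are copied from the text, while the variable names are illustrative only.

```python
# Final candidate targets per subtype, as listed in the text.
targets = {
    "luminal A": {"MUC1", "HLA-DRA", "WNT7B", "XBP1", "EFCAB2"},
    "luminal B": {"ATG16L2", "MUC1", "WNT7B", "XBP1"},
    "HER2+": {"C1QTNF6", "NDUFS6", "MUC1", "HLA-DRA", "XBP1"},
    "TNBC": {"CHERP", "MUC1", "TIAM1", "GABRP"},
}

# Union of all subtype lists: the prioritized gene set.
all_targets = set().union(*targets.values())

# Genes valid for every subtype.
common = set.intersection(*targets.values())

# Genes that appear in exactly one subtype's list.
specific = {
    subtype: genes - set().union(
        *(other for name, other in targets.items() if name != subtype)
    )
    for subtype, genes in targets.items()
}
```

The union contains 11 genes, the intersection contains only MUC1, and the subtype-specific sets match the genes named in the paragraph above.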
Most of the common targets are closely related to breast cancer, and their pathogenic importance in breast cancer has been experimentally validated. For example, MUC1 encodes the glycoprotein Mucin 1, which has extensive O-linked glycosylation of its extracellular domain, and overexpression of MUC1 is often associated with colon, breast, ovarian, lung and pancreatic cancers [
57]. It is a multifaceted oncoprotein which promotes growth, metastasis, and resistance to drugs in cancer [
58]. Immune responses to MUC1 have been seen in breast and ovarian cancer patients and clinical studies have been initiated to evaluate the use of antibodies to MUC1 and of immunogens based on MUC1 for immunotherapy of breast cancer patients [
59,
60]. XBP1 functions as a transcription factor during endoplasmic reticulum (ER) stress by regulating the unfolded protein response (UPR). XBP1 is activated in TNBC and has a pivotal role in the tumorigenicity and progression of this human breast cancer subtype by controlling the HIF1α pathway. In breast cancer cell line models, depletion of XBP1 inhibited tumor growth and tumor relapse [
61]. WNT7B-encoded Wnt7b is a Wnt ligand that has been demonstrated to play critical roles in several developmental processes. Myeloid WNT7b mediates the angiogenic switch and metastasis in breast cancer, and therapeutic suppression of WNT7B signaling might be advantageous because it targets multiple aspects of tumor progression [
62]. HLA-DRA encodes the HLA class II histocompatibility antigen, DR alpha chain. It is part of the HLA class II molecule, which is expressed in antigen-presenting cells (APCs) and plays a central role in the immune system by presenting peptides derived from extracellular proteins. This protein is generally invariable, yet research showed that HLA-DRA was highly overexpressed in ovarian cancer, perhaps as a result of inflammatory events in the tumor microenvironment. The tumor cells may have compensatory mechanisms to reduce the production of functional MHC class II molecules, thus reducing immunogenicity and favoring tumor growth [
63].
With regard to the specific targets predicted for TNBC, TIAM1 encodes Tiam1 (T-lymphoma invasion and metastasis 1), one of the known guanine nucleotide (GDP/GTP) exchange factors (GEFs) for Rho GTPases (e.g., Rac1), and is expressed in breast tumor cells (e.g., SP-1 cell line). Research showed that the ankyrin-Tiam1 interaction plays a pivotal role in regulating Rac1 signaling and cytoskeleton function required for oncogenic signaling and metastatic breast tumor cell progression [
64]. GABRP encodes Gamma-aminobutyric acid receptor subunit pi, which is a component of a cell-surface receptor. Research showed that GABRP stimulates the migration of basal-like breast cancer (BLBC)/triple-negative cells through activation of extracellular regulated kinase 1/2 (ERK1/2). In addition, silencing GABRP in BLBC cells decreases migration, BLBC-associated cytokeratins and ERK1/2 activation [
65].