Molecular Insights into the Classification of Luminal Breast Cancers: The Genomic Heterogeneity of Progesterone-Negative Tumors

Estrogen receptor (ER)-positive progesterone receptor (PR)-negative breast cancers are infrequent but clinically challenging. Despite the volume of genomic data available on these tumors, their biology remains poorly understood. Here, we aimed to identify clinically relevant subclasses of ER+/PR− breast cancers based on their mutational landscape. The Cancer Genomics Data Server was interrogated for mutational and clinical data of all ER+ breast cancers with information on PR status from The Cancer Genome Atlas (TCGA), Memorial Sloan Kettering (MSK), and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) projects. Clustering analysis was performed using gplots, ggplot2, and ComplexHeatmap packages. Comparisons between groups were performed using the Student’s t-test and the test of Equal or Given Proportions. Survival curves were built according to the Kaplan–Meier method; differences in survival were assessed with the log-rank test. A total of 3570 ER+ breast cancers (PR− n = 959, 27%; PR+ n = 2611, 73%) were analyzed. Mutations in well-known cancer genes such as TP53, GATA3, CDH1, HER2, and BRAF were private to or enriched in PR− tumors. Mutual exclusivity analysis revealed the presence of four molecular clusters with significantly different prognosis on the basis of PIK3CA and TP53 status. ER+/PR− breast cancers are genetically heterogeneous and encompass a variety of distinct entities in terms of prognostic and predictive information.


Introduction
Estrogen receptor (ER)-positive progesterone receptor (PR)-negative (ER+/PR−) breast cancers are a subset of Luminal B tumors characterized by the strong and diffuse nuclear expression of ER-alpha but not of PR [1]. They account for 5% of all invasive breast cancers and show a relatively aggressive clinical course compared to ER+/PR+ neoplasms [1][2][3][4][5]. ER+/PR− invasive breast cancers are described as larger in size than PR+ carcinomas and are generally of no special histological type (i.e., ductal) [1,6]. Even though they preferentially affect postmenopausal women, these diagnoses are not exceptional in younger patients [1,2,7]. As confirmed by several prospectively randomized controlled neoadjuvant trials, ER+/PR− breast cancers are associated with a higher response but also worse long-term outcome after neoadjuvant therapy [5]. There are several lines of evidence to suggest that the worse prognosis of ER+/PR− tumors may be related to the phenomena of hormone therapy resistance [1][2][3][4][5]. However, a large adjuvant trial on the use of aromatase inhibitors in postmenopausal women with early breast cancer revealed that the PR status has no effect on the relative efficacy of this therapy [8]. For this reason, some authors have questioned the clinical utility of PR testing [9]. To date, hormonal therapy remains recommended in ER+ tumors regardless of PR status [10]. All these diverse correlations highlight the clinical challenges provided by ER+/PR− breast cancers.
A proportion of ER+/PR− neoplasms shows a remarkable degree of genomic instability, with almost twice the DNA copy number variations and tumor mutational load of both ER+/PR+ and ER− breast cancers [1,8]. Furthermore, many growth factor signaling components, such as the HER family, PI3K, Akt, and src, were observed to be overexpressed in these tumors [1,2,11,12,13]. These pathways, which can also be altered in ER+/PR+ tumors, are known to be involved in ER phosphorylation, which may lead to ligand-independent activation [14]. There is also evidence that the upregulation of Akt and HER1/2 is implicated in tamoxifen resistance [1,2,11,12,15,16,17,18]. Recently, PR has been proposed as a surrogate biomarker of altered growth factor signaling [5]. Due to these insights, and the substantial lack of distinct biological properties identified to date in ER+/PR− breast cancers, it is becoming increasingly clear that these tumors are clinically and biologically heterogeneous [19][20][21][22][23][24][25].
During the past few years, the Cancer Genome Atlas (TCGA) project has exposed the complexity of the genome-wide genetic alterations in breast cancer [26]. On the other hand, the proper clinical management of Luminal (i.e., ER+) breast cancers, particularly in intermediate-risk patients, remains a matter of controversy. However, there is a limited understanding of how the mutational landscape of these tumors, according to the PR status, can be exploited in the clinic to allow for more tailored management schemes. In this study, we sought: (i) to characterize the mutational signatures of ER+/PR− breast cancers; (ii) to compare the molecular landscapes of PR− and PR+ Luminal tumors; and (iii) to define the prognostic value of the type and pattern of somatic genetic alterations in these patients.

Results
A total of 3589 ER+ breast cancers from the publicly available datasets TCGA, Memorial Sloan Kettering (MSK), and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) were identified. Among them, 3570 (99.5%) cases (2815 invasive ductal carcinomas and 755 invasive carcinomas of any special type) had information on PR status (PR− n = 959, 27%; PR+ n = 2611, 73%) and were included in the current study. The median age at diagnosis of PR− tumors was 59 years old (range 24-92); for PR+ tumors, it was 57 years old (range 23-91). Taken together, 53,585 mutations targeting 13,402 genes were identified, including 57,448 (99%), 6642 (90%), and 8905 (89%) mutations that were private to only one sample in the TCGA, MSK, and METABRIC cohorts, respectively. The number of samples, mutated genes, and mutations of the tumors included in the analysis are summarized in Table 1 and Table S1.

The Molecular Landscape of ER+/PR− Breast Cancers
The average number of mutations displayed by ER+/PR− breast cancers was 16 per sample, whereas in PR+ tumors it was 14. The two groups shared 5668 mutated genes, while approximately 1319 (19%) genes were found to be privately altered in ER+/PR− breast cancers. Overall, the mutations in PR− tumors were missense in 12,583 (78%), nonsense in 1250 (8%), frameshift deletions in 896 (5%), frameshift insertions in 616 (4%), splicing in 516 (3%), and in-frame indels in 261 (2%) cases. Of note, fusion genes were detected in 69 ER+/PR− tumors. The mutational landscape and selected clinicopathologic features in ER+/PR− and ER+/PR+ breast cancers are depicted in Figure 1 and Figure S1, respectively.
The most frequently mutated gene in PR− tumors was phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), with lower prevalence than in PR+ tumors (n = 354, 37% vs. n = 1220, 47%; p < 0.01). In particular, the vast majority of PIK3CA mutations were missense and affected four hotspot regions of the gene, namely N345K, E542K, E545K, and H1047R (Figure 2). Notably, the H1047R and E545K mutations in PIK3CA were less frequent in PR− tumors (Table 2). The prevalence of samples showing mutations in TP53, which was the second most frequently mutated gene in both PR− and PR+ Luminal tumors, was higher in PR− breast cancers (n = 312, 33% vs. n = 496, 19%; p < 0.01). Furthermore, the nonsense mutation R342X and the missense mutations P728S, I195T, and H179R in TP53 were enriched in PR− tumors (p < 0.05), as shown in Table 2. Taken together, PIK3CA and TP53 status allowed for the definition of four molecular clusters (Figure 1). Specifically, Cluster 1 included all PIK3CA-mutant/TP53-mutant samples (n = 108, 11%), Cluster 2 all PIK3CA-mutant/TP53 wild-type samples (n = 246, 26%), Cluster 3 PIK3CA wild-type/TP53-mutant tumors (n = 204, 21%), and Cluster 4 encompassed all PIK3CA/TP53 wild-type cases (n = 401, 42%).
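The four-cluster definition reduces to a simple rule on two binary flags. The following is a minimal sketch of that rule, not the authors' code: the function name `assign_cluster` and the reconstruction of per-tumor flags from the reported cluster sizes are illustrative assumptions. It also checks that the reported cluster sizes add up to the PIK3CA-mutant (354), TP53-mutant (312), and total (959) counts given in the text.

```python
from collections import Counter


def assign_cluster(pik3ca_mutant: bool, tp53_mutant: bool) -> int:
    """Map PIK3CA/TP53 mutation status to Clusters 1-4 as defined in the text."""
    if pik3ca_mutant and tp53_mutant:
        return 1  # PIK3CA-mutant / TP53-mutant
    if pik3ca_mutant:
        return 2  # PIK3CA-mutant / TP53 wild-type
    if tp53_mutant:
        return 3  # PIK3CA wild-type / TP53-mutant
    return 4      # PIK3CA / TP53 wild-type


# Cluster sizes reported for the ER+/PR- tumors.
reported = {1: 108, 2: 246, 3: 204, 4: 401}

# Hypothetical reconstruction: one (PIK3CA, TP53) flag pair per tumor,
# built only from the reported sizes (no patient-level data is used here).
status = {1: (True, True), 2: (True, False), 3: (False, True), 4: (False, False)}
samples = [status[c] for c, n in reported.items() for _ in range(n)]

counts = Counter(assign_cluster(p, t) for p, t in samples)
pik3ca_total = sum(p for p, _ in samples)  # tumors with a PIK3CA mutation
tp53_total = sum(t for _, t in samples)    # tumors with a TP53 mutation

print(dict(counts), pik3ca_total, tp53_total, len(samples))
```

Because the clusters are defined by two flags only, every tumor falls into exactly one cluster, which is why the marginal totals can be recovered by summing pairs of cluster sizes.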
Among the other recurrent gene alterations, the hotspot mutation E17K in RAC-alpha serine/threonine-protein kinase (AKT1), which was present in 3% and 5% of PR− and PR+ cases, respectively, was mutually exclusive with mutations targeting PIK3CA, regardless of PR status (Figure S2). On the other hand, even if PIK3CA and AKT1 were observed to be recurrently mutated in both groups, the hotspot regions differed significantly on the basis of PR activation (p < 0.05). Of note, GATA3 showed a high number of frame-shift indels and nonsense mutations (Figure 2), consistent with its crucial role in the ER signaling pathway. One of the most recurrently mutated genes was E-cadherin (CDH1), with the hotspot truncating mutation in position 23 associated with lobular histology (Figure 2). The prevalence of human epidermal growth factor receptor (HER)2-mutant cases was higher in PR− breast cancers, albeit nonsignificant (n = 151, 16% vs. n = 389, 15%; p = 0.530). According to the Student's t-test, the mutational profile of PR− Luminal breast cancers was significantly different from that of PR+ tumors (p < 10^−5), with 16 mutations being restricted to the ER+/PR− group, including mutations in ARID1A, ATR, BCL6, BRAF, CARD11, CDH1, AXIN2, GATA3, MUC16, CCDC82, RUNX1, and TBX3 (Table 2). No significant correlations were observed between PR activation status and other clinicopathologic characteristics. The tumor mutational burden (median of five mutations per sample for both PR+/−; mean 15.2 per sample for PR+; mean 15.9 per sample for PR−; range 1-3474 in PR+; and range 1-2900 in PR−) of the cases included in this study is shown in Figure S3.
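The prevalence comparisons reported above (for example, PIK3CA in 354 of 959 PR− versus 1220 of 2611 PR+ tumors) are comparisons of two proportions. As an illustration only, the sketch below implements a pooled two-proportion z-test without continuity correction; this is an assumption, since the text does not specify the exact settings used (R's `prop.test`, whose title matches the test named in the Methods, applies a continuity correction by default). The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sided z-test for equality of two proportions.

    Returns (z, p_value). No continuity correction is applied.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the text: (mutant PR-, total PR-, mutant PR+, total PR+).
z_pik3ca, p_pik3ca = two_proportion_z_test(354, 959, 1220, 2611)
z_tp53, p_tp53 = two_proportion_z_test(312, 959, 496, 2611)
z_her2, p_her2 = two_proportion_z_test(151, 959, 389, 2611)

for gene, z, p in [("PIK3CA", z_pik3ca, p_pik3ca),
                   ("TP53", z_tp53, p_tp53),
                   ("HER2", z_her2, p_her2)]:
    print(f"{gene}: z = {z:.2f}, p = {p:.3g}")
```

Under these assumptions the direction of each difference matches the text: PIK3CA is less prevalent and TP53 more prevalent in PR− tumors, while the HER2 difference is not significant.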
Overall, the highest mortality was observed before 50 months from the diagnosis, regardless of PR status, with a median survival of 76.9 months in PR− and 61 months in PR+ tumors. The most recurrently mutated genes in ER+/PR− and ER+/PR+ breast cancers were used to define the survival probability curves shown in Figures S4 and S5, respectively. Even though the log-rank p-values were significant for TP53 and GATA3 mutations in both groups, survival analyses including tumors harboring alterations only in each of the most frequently mutated genes, but not in the others, revealed that in ER+/PR− breast cancers only TP53 mutations are related to a different prognosis (Figure 3). The hotspot regions of TP53 that were significantly different in PR− tumors were not related to a different outcome (Figure S6), similar to PIK3CA (Figure S7).

Discussion
The lack of precise risk stratification in Luminal breast cancer by means of immunohistochemistry and/or prognostic genomic tests is a major limitation in defining the most appropriate management scheme [27]. Patients with ER+ breast cancers are assumed to have a good prognosis, but the lack of PR expression may contribute to their poor outcomes. This may be a result of the de-differentiation of hormone-positive neoplasms and subsequent development of resistance phenomena to both anti-estrogen therapy and chemotherapy. Studies aiming to explore the genetic alterations in ER+/PR− breast cancers have been performed. However, the unique biology and challenging clinical course of these tumors, particularly in long-term survivors, suggest that they warrant further characterization. In this study, we analyzed a large cohort of PR− Luminal breast cancers with publicly available genomic data and compared their molecular landscape and prognosis to those of PR+ tumors. Altogether, we observed that several alterations in clinically actionable cancer genes are private to or enriched in PR− breast cancers, such as TP53 R342X, P728S, I195T, and H179R, GATA3, CDH1, HER2, and BRAF V600E. Furthermore, we identified four molecular clusters on the basis of PIK3CA and TP53 status with significantly different risk of death in PR− tumors.
Decreased expression and/or downregulation of PR in breast cancer leads to a subset of tumors that is phenotypically ER+/PR−. Even though several hypotheses to explain this phenomenon have been put forward, we are still far from fully understanding its biology. In a proportion of Luminal tumors, ER, although expressed, is biologically nonfunctional and is therefore unable to stimulate PR production, particularly in postmenopausal women [1]. Another mechanism for PR loss is the epigenetic inactivation of its promoter through hypermethylation [12]. Even though a genetic loss of the PR gene locus has previously been observed [12], in our analysis, all ER+/PR− tumors are PR wild-type, suggesting that PR downregulation may be determined by growth factor pathways, as previously observed [2,11]. In particular, HER2 activity may lead to the cytoplasmic sequestration of ER, which alters a set of genes that are normally regulated by ER, including PR-related genes, such as PIK3CA [11,28,29].
Taken together, we observed that the most frequently mutated genes in ER+/PR− breast cancers are PIK3CA, TP53, GATA3, CDH1, KMT2C, MUC16, MAP3K1, ARID1A, AHNAK2, and SYNE2. Interestingly, PIK3CA and TP53 show a mutational prevalence (37% and 33%, respectively) that differs significantly from that of ER+/PR+ tumors (with PIK3CA mutated in 47% of cases and TP53 in 19%). These aspects have already been described in the literature [30,31]. On the other hand, the identification of a mutational profile specific to ER+/PR− cases, with 16 mutations being restricted to this group, is a novel finding. In our study, we confirm the presence of highly recurrent molecular alterations of the PIK3CA gene in position 1047, which likely constitute the driving genetic event in the pathogenesis of a subset of ER+/PR− breast cancers. These data provide further credence to the notion that inhibitors of this pathway (e.g., XL147) could reverse PR downregulation and overcome resistance to anti-HER2 drugs [32]. In addition, the identification of BRAF V600E as a private mutation of PR− cases has possible therapeutic implications [33,34]. Recently, mutations in HER2 have been detected in breast cancer patient samples which lack HER2 gene amplification. Thirteen HER2 mutations were characterized from twenty-five patient samples which had HER2 mutations but lacked HER2 gene amplification. Among them, seven mutations were activating and resulted from point mutations and in-frame deletions. Some mutations (L755S) resulted in lapatinib resistance; however, this was not an activating mutation. All of the cells containing the HER2 mutations were sensitive to the irreversible HER2 kinase inhibitor, neratinib [35]. Our analysis corroborates the concept that mutations in GATA3 are associated with a better outcome in ER+ breast cancer patients [36]. After eliminating all cases with concurrent mutations in the other top recurrently mutated genes, however, we were able to confirm this notion only in PR+ tumors. These data suggest that GATA3 mutations are not an independent good prognostic factor in ER+/PR− tumors. Given that GATA3 is frequently altered in Luminal A breast cancers, our findings provide an additional molecular layer to the worse prognosis shown by ER+/PR− breast cancers [19,37]. Furthermore, we confirmed that TP53 mutations are associated with PR negativity and with a shorter overall survival time in breast cancers [38]. Interestingly, this behavior is unrelated to the specific regions of TP53 that are recurrently altered in this subset of patients, akin to patients with PIK3CA-mutant tumors.
The patterns of mutations in TP53 and PIK3CA allowed us to identify four molecular clusters in both PR− and PR+ Luminal breast cancers, namely PIK3CA/TP53-mutant (Cluster 1), PIK3CA-mutant/TP53 wild-type (Cluster 2), PIK3CA wild-type/TP53-mutant (Cluster 3), and PIK3CA/TP53 wild-type (Cluster 4). Notably, the prognostic distribution of these clusters differed substantially between ER+/PR− and ER+/PR+ breast cancers. Indeed, while in PR+ Luminal tumors Clusters 2 and 4 were related to better survival, with overlapping curves, in PR− Luminal tumors Cluster 2 fell into an intermediate-risk category for the first 16 years of follow-up, becoming worse after that time. All these diverse correlations highlight the importance of PIK3CA and TP53 analysis in PR− Luminal breast cancer prognostication.

Case Selection and Definitions
We used the CGDS R package to interrogate the Cancer Genomics Data Server [39,40] and download mutational and clinical data related to three breast cancer projects hosted at the Memorial-Sloan-Kettering Cancer Center: the METABRIC project [41], containing 2509 breast cancer samples; the MSK project [42], containing 1918 samples; and the TCGA project, containing 1105 samples. Each sample has both somatic mutational profiles for selected genes and clinical information. In particular, the TCGA project contains mutational profiles for 20,461 genes, the METABRIC project for 173 genes, and the MSK project for 474 genes. We used the gplots and ggplot2 packages [43,44] to perform the clustering analysis and visualize the data. We collected all somatic mutations related to the projects and integrated them with the clinical information and the treatment outcomes. Finally, we selected all the estrogen receptor positive (ER+) samples, reducing our dataset to 3589 samples and a total of 53,585 somatic mutations in 13,402 genes.
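The selection and integration steps described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used in the study; the record layout and the field names (`sample_id`, `er_status`, `pr_status`, `gene`, `type`) are hypothetical and do not reflect the actual Cancer Genomics Data Server schema.

```python
def select_cases(clinical, mutations):
    """Keep ER+ samples with known PR status and attach their mutations.

    `clinical` and `mutations` are lists of dicts keyed by a shared
    (hypothetical) `sample_id` field.
    """
    # Index mutation records by sample for constant-time lookup.
    by_sample = {}
    for m in mutations:
        by_sample.setdefault(m["sample_id"], []).append(m)

    selected = []
    for row in clinical:
        if row.get("er_status") != "Positive":
            continue  # only ER+ tumors are retained
        if row.get("pr_status") not in ("Positive", "Negative"):
            continue  # PR status must be available
        selected.append({**row, "mutations": by_sample.get(row["sample_id"], [])})
    return selected


# Toy records for illustration only (not real patient data).
clinical = [
    {"sample_id": "S1", "er_status": "Positive", "pr_status": "Negative"},
    {"sample_id": "S2", "er_status": "Positive", "pr_status": "Positive"},
    {"sample_id": "S3", "er_status": "Negative", "pr_status": "Negative"},
    {"sample_id": "S4", "er_status": "Positive", "pr_status": None},
]
mutations = [
    {"sample_id": "S1", "gene": "TP53", "type": "missense"},
    {"sample_id": "S2", "gene": "PIK3CA", "type": "missense"},
    {"sample_id": "S3", "gene": "TP53", "type": "nonsense"},
]

cohort = select_cases(clinical, mutations)
pr_negative = [c["sample_id"] for c in cohort if c["pr_status"] == "Negative"]
print([c["sample_id"] for c in cohort], pr_negative)
```

Samples without any recorded mutation are kept with an empty list, so cohort-level prevalence is computed over all eligible tumors rather than only over mutated ones.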

Statistical Analysis
Comparisons between groups were generally performed using the Student's t-test and test of Equal or Given Proportions. Event-free survival was expressed as the number of months from diagnosis to the occurrence of distant or local relapse or death (disease-related death). Cumulative survival probabilities were calculated using the Kaplan-Meier method. Differences between survival rates were tested with the log-rank test (SPSS version 20.0; IBM). Survival data were censored at five years. A p < 0.05 was considered statistically significant. Survival analysis and figures were developed using the R survival and survminer packages [45], and the Kaplan-Meier non-parametric statistic [46].
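For readers unfamiliar with the survival methods named above, the following is a minimal, dependency-free sketch of administrative censoring, the Kaplan-Meier estimator, and a two-group log-rank test. It is an illustration under stated assumptions, not the SPSS or R code used in the study: the function names are hypothetical, times are assumed to be in months (so five years corresponds to 60), and the toy data are invented.

```python
import math


def censor_at(times, events, horizon):
    """Administratively censor follow-up at `horizon` (same unit as `times`)."""
    new_times = [min(t, horizon) for t in times]
    new_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_times, new_events


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event flag 1 = event)."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n = n_a + sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = d_a + sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data (months, event flag) for illustration only.
km_curve = kaplan_meier([5, 10, 10, 20], [1, 1, 0, 1])
censored = censor_at([30, 80], [1, 1], 60)
chi2_same, p_same = log_rank([5, 10, 10, 20], [1, 1, 0, 1],
                             [5, 10, 10, 20], [1, 1, 0, 1])
chi2_diff, p_diff = log_rank([2, 3, 4, 5], [1, 1, 1, 1],
                             [10, 11, 12, 13], [1, 1, 1, 1])
print(km_curve, censored, p_same, p_diff)
```

The log-rank statistic compares, at every event time, the observed number of events in one group with the number expected if both groups shared the same hazard; identical groups therefore give a statistic of zero.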

Conclusions
We demonstrated that ER+/PR− breast cancers are characterized by molecular features that carry prognostic and predictive information, which could be integrated into the clinical setting to realize the potential of precision medicine in these clinically and pathologically challenging neoplasms.
Supplementary Materials: Supplementary materials can be found at http://www.mdpi.com/1422-0067/20/3/510/s1. Figure S1: Oncoprint visualization of highly recurrent somatic molecular alterations in ER+/PR+ breast cancers (2611 samples). Each row represents a gene, as reported on the right and was sorted by gene alterations frequency (bar plot on the right); types of alterations are color-coded on the basis of the legend on the bottom. Each column represents a sample and was sorted to appreciate the mutual exclusivity across genes. The bar plot on the top represents the number of samples showing alterations in the displayed genes. Cluster analysis, HER2 status, histological type, tumor stage, and menopause status are reported as rows at the bottom of the figure; age at diagnosis is depicted in the top at the bottom. Clustering was performed according to the mutual exclusivity and patterns of mutations. Figure S2: Recurrent somatic alterations in 959 ER+/PR− (A) and in 2611 ER+/PR+ (B) breast cancers (2611 samples). Each row represents an alteration, as reported on the right, each column a sample. Alterations were sorted by frequency, while the samples were sorted to appreciate the mutual exclusivity across alterations. Figures report the 50 most frequent gene alterations. Figure S3: Total number of mutations per samples in ER+/PR− (A) and ER+/PR+ (B) breast cancer patients. Each bar represents a sample; types of alterations are color-coded on the basis of the legend on the left. Figure S4: Overall survival of ER+/PR− breast cancer patients based on the most frequently altered genes. Survival curves (red, mutant; gray, wild-type) are built according to the Kaplan-Meier method. For each analysis, all samples harboring mutations in one of the other nine genes were excluded. Figure S5: Overall survival of ER+/PR+ breast cancer patients based on the most frequently altered genes. For each analysis, all samples harboring mutations in one of the other nine genes were excluded. Survival curves (red, mutant; gray, wild-type) are built according to the Kaplan-Meier method. Figure S6: Overall survival of ER+/PR− breast cancer patients based on the most frequently altered regions in the PIK3CA gene. Survival curves are built according to the Kaplan-Meier method. Figure S7: Overall survival of ER+/PR− breast cancer patients based on the most frequently altered regions in the TP53 gene. Survival curves are built according to the Kaplan-Meier method. Table S1: Mutations of the tumors included in the analysis.