Prognostic Capability of TNBC 3-Gene Score among Triple-Negative Breast Cancer Subtypes

Simple Summary
In this study, we evaluated the prognostic capability of the 3-gene score in the molecular subtypes of triple-negative breast cancer and found that the score was able to predict the risk of distant recurrence in the immunomodulatory and mesenchymal stem-like subtypes. Additionally, a low 3-gene score was related to a high level of tumor-infiltrating lymphocytes. Our findings suggest that the prognostic capability of the 3-gene score is associated with tumor-infiltrating components.

Abstract
Background: Triple-negative breast cancer (TNBC) is a complex and molecularly heterogeneous entity, with the poorest outcome among breast cancer subtypes. Previously, we developed a TNBC 3-gene score with a significant prognostic capability. This study aims to test the 3-gene score in the different TNBC subtypes.
Methods: Data from 204 TNBC patients treated with neoadjuvant chemotherapy were retrieved from public datasets and pooled (GSE25066, GSE58812, and GSE16446). After removing batch effects, cases were classified into Lehmann's TNBC subtypes, and the TNBC 3-gene score was then used to evaluate the risk of distant recurrence in each subgroup. In addition, the association with tumor-infiltrating lymphocyte (TIL) levels was evaluated in a retrospective group of 72 TNBC cases.
Results: The TNBC 3-gene score was able to discriminate patients with different risks within the pooled cohort (HR = 2.41 for high vs. low risk; 95% CI: 1.50–3.86). The score showed predictive capability in the immunomodulatory subtype (HR = 4.16; 95% CI: 1.63–10.60) and in the mesenchymal stem-like subtype (HR = 18.76; 95% CI: 1.68–208.97). In the basal-like 1, basal-like 2, and mesenchymal subtypes, the observed differential risk patterns showed no statistical significance. The score had poor predictive capability in the luminal androgen receptor subtype (p = 0.765). In addition, a low TNBC 3-gene score was related to a high level of TIL infiltration (p < 0.001).
Conclusions: The TNBC 3-gene score is able to predict the risk of distant recurrence in TNBC patients, specifically in the immunomodulatory and mesenchymal stem-like subtypes. Despite a small sample size in each subgroup, an improved prognostic capability was seen in TNBC subtypes with tumor-infiltrating components.


Introduction
Triple-negative breast cancer (TNBC) is a term coined to define a group of breast cancers lacking expression of the estrogen receptor (ER), the progesterone receptor (PR), and the human epidermal growth factor receptor 2 (HER2) [1]. From a pathological viewpoint, TNBC cases are characterized as being more aggressive than other subtypes due to their high histological grade and the presence of compromised lymph nodes at the time of diagnosis [2]. Furthermore, TNBC represents 10-20% of all breast cancers, with a higher prevalence in young and pre-menopausal patients than in older patients [1,3,4]. In addition, African American and Hispanic patients show a higher prevalence of TNBC than Caucasian and Asian patients [5].
At present, neoadjuvant chemotherapy (NAC) is the most effective treatment and standard of care for non-metastatic TNBC, with high rates of clinical and pathologic response and an improved outcome among responders [6,7]. Nevertheless, not all patients show the same responses or survival rates, which suggests that TNBC is molecularly heterogeneous [2,8,9].
In 2011, Lehmann et al. classified TNBC into six molecular subtypes. The basal-like 1 (BL1) subtype is characterized by rapid cell division, a high proliferation rate (Ki67 greater than 70%), lack of cell cycle control, and a high DNA damage response, especially in the ATR/BRCA gene pathways; nevertheless, this subtype was shown to have the best prognosis. The basal-like 2 (BL2) subtype shows altered growth factor signaling, activation of the glycolysis and gluconeogenesis pathways, and high expression of growth factor receptors. The immunomodulatory (IM) subtype has a strong molecular signature of immune cell processes, such as T cell, B cell, chemokine, and NF-kappa B signaling pathways. The mesenchymal (M) and mesenchymal stem-like (MSL) subtypes both present high expression of genes involved in cell motility, cellular differentiation, and growth pathways. However, the MSL subtype expresses a different group of growth factors (platelet-derived growth factor, epidermal growth factor receptor, G-protein coupled receptor signaling) and angiogenic factors (vascular endothelial growth factor 2, tyrosine kinase with immunoglobulin-like and EGF-like domains 1), low levels of proliferation genes and claudins, and high levels of stem cell factors, whereas the M subtype is enriched in pathways involved in the cell cycle, mismatch repair, and DNA damage, as well as osteocyte and adipocyte genes. Finally, the luminal androgen receptor (LAR) subtype is characterized by the expression of androgen receptors, low proliferation, elevated steroid hormone synthesis, and high androgen and estrogen metabolism despite being ER negative; this subtype is also the most chemoresistant but has a favorable prognosis [10][11][12][13][14][15].
Previously, by analyzing the expression levels of 449 genes related to TNBC aggressiveness, we developed a linear predictor for distant recurrence-free survival (DRFS) based on the expression of three genes (CCL5, DDIT4, and POLR1C) [16]. In addition, we reported that high DDIT4 expression was related to a poor outcome in different types of cancer and that dysregulation of POLR1C expression is involved in tumor aggressiveness in breast cancer. CCL5, although typically associated with a poor outcome, is associated in TNBC with a higher concentration of tumor-infiltrating lymphocytes (TILs) and with recruitment of CD8 T cells, activated CD4 T cells, activated NK cells, and M1 macrophages [17,18]. Furthermore, it has been demonstrated that high levels of TILs are associated with better disease-free survival, overall survival, and response to chemotherapy as well as immunotherapy [19,20].
Due to the molecular heterogeneity of TNBC, our aim was to evaluate the prognostic capability of the TNBC 3-gene score in the six molecular subtypes of TNBC, which may be useful in the development of a tailored therapeutic approach for TNBC. As a secondary objective, we analyzed the relation between the prognostic signature and TIL infiltration.

Patients
We included TNBC patients treated with neoadjuvant chemotherapy (NAC) to evaluate the prognostic capability of the TNBC 3-gene score. Patients were selected from three public datasets available at the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/, accessed on 15 June 2021) [21].
In addition, we included a retrospective cohort of 74 Peruvian TNBC patients with residual disease after NAC, with TIL count information. Since this dataset was profiled with Nanostring and TNBC subtype information was not available, it was not included in the metabase. The gene expression profiling and TIL assessment of this cohort have been previously described [18].

Subtype Identification
The online tool TNBCtype (https://cbc.mc.vanderbilt.edu/tnbc/, accessed on 15 June 2021) [25] was used to classify the samples of the public datasets according to the TNBC subtypes. In datasets with more than one probe for the same gene, values were collapsed to the highest level of gene expression.
Samples identified as possible ER positive were removed and the analysis was repeated. In total, 22 samples were excluded from GSE25066, 11 samples from GSE58812, and one sample from GSE16446.
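The probe-collapsing rule described above can be illustrated with a minimal sketch. It assumes one possible reading of "collapsed to the highest level of gene expression", namely that the maximum across probes is taken separately for each sample; the function name `collapse_to_max` and the toy values are hypothetical and are not taken from the datasets.

```python
def collapse_to_max(probe_rows):
    """Collapse probe-level rows to one row per gene.

    probe_rows: list of (gene_symbol, [expression value per sample]).
    Assumption: for each sample, the highest value among the probes
    mapping to the same gene is kept.
    """
    by_gene = {}
    for gene, values in probe_rows:
        if gene in by_gene:
            # Element-wise maximum with the values already stored for this gene.
            by_gene[gene] = [max(a, b) for a, b in zip(by_gene[gene], values)]
        else:
            by_gene[gene] = list(values)
    return by_gene


# Toy input: two probes for CCL5, one probe for DDIT4, three samples each.
rows = [
    ("CCL5", [7.1, 8.4, 6.0]),
    ("CCL5", [7.9, 8.0, 6.5]),
    ("DDIT4", [9.2, 9.0, 8.8]),
]
collapsed = collapse_to_max(rows)
print(collapsed)
```

An alternative reading would keep, for each gene, the single probe with the highest average expression; the text does not specify which variant was applied.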

Elaboration of the Metabase
Since the unstable subtype (UNS) was not considered one of Lehmann's TNBC molecular subtypes, these patients were removed and then the datasets were pooled. Eleven samples were removed from GSE25066, thirteen samples from GSE58812, and five samples from GSE16446.
The remaining samples in the three datasets, GSE25066 (n = 80), GSE58812 (n = 83), and GSE16446 (n = 41), were combined into a single metabase, transformed to base 2 logarithm (log2), and centered by the median. The online tool COMBAT v3, implemented in GenePattern, was used to eliminate the batch effect in the metabase [26].
To verify that there was no batch effect in the metabase, we used the F-test of the analysis of variance (ANOVA) and a graphical method based on linear discriminant analysis (LDA).
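The log2 transformation and median centering mentioned above can be sketched as follows. This is only an illustration under the assumption that centering is applied per gene across samples (the text does not state whether centering was per gene or per sample); the function name and the toy intensities are hypothetical, and the batch correction itself is not reproduced here.

```python
import math
import statistics


def log2_median_center(values):
    """Log2-transform positive expression values for one gene and
    subtract the median so the centered values have median zero."""
    logged = [math.log2(v) for v in values]
    med = statistics.median(logged)
    return [x - med for x in logged]


# Toy raw intensities for a single gene across five samples.
raw = [64.0, 128.0, 256.0, 1024.0, 32.0]
centered = log2_median_center(raw)
print(centered)
```

After centering, a value of 1 means the sample is twofold above the median expression of that gene, which puts the three genes of the score on a comparable scale before the coefficients are applied.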

Prognostic Capability of the TNBC 3-Gene Score According to TNBC Subtypes
The TNBC 3-gene score was calculated according to the following formula: −0.393 × CCL5 + 0.443 × DDIT4 + 0.490 × POLR1C, as reported in Pinto et al. (2016) [16]. The median was used as the cutoff to establish groups with a high risk (values higher than the median) or a low risk (values equal to or lower than the median) of recurrence.
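As a worked illustration of the formula and the median split, the following sketch computes the score for three hypothetical patients. The coefficients are those given above; the expression values, function names, and group labels are illustrative assumptions only.

```python
import statistics

# Coefficients of the TNBC 3-gene score as given in the formula above.
COEFFICIENTS = {"CCL5": -0.393, "DDIT4": 0.443, "POLR1C": 0.490}


def three_gene_score(expr):
    """Linear score from normalized expression values of the three genes."""
    return sum(coef * expr[gene] for gene, coef in COEFFICIENTS.items())


def risk_groups(scores):
    """Split at the median: strictly above is high risk, otherwise low risk."""
    cutoff = statistics.median(scores)
    return cutoff, ["high" if s > cutoff else "low" for s in scores]


# Hypothetical normalized expression values for three patients.
patients = [
    {"CCL5": 2.0, "DDIT4": 1.0, "POLR1C": 1.0},
    {"CCL5": 0.0, "DDIT4": 2.0, "POLR1C": 2.0},
    {"CCL5": 1.0, "DDIT4": 1.0, "POLR1C": 1.0},
]
scores = [three_gene_score(p) for p in patients]
cutoff, groups = risk_groups(scores)
print(scores, cutoff, groups)
```

Because the CCL5 coefficient is negative, the first patient, who differs from the third only by higher CCL5 expression, receives a lower score.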
Distant recurrence-free survival (DRFS) was estimated with the Kaplan-Meier method and Log-rank or Breslow tests were used to compare survival curves. Hazard ratios were estimated by the Cox proportional hazards model, evaluating the risk score as a categorical variable.
The risk score was compared between TNBC subtypes using the ANOVA test and Tukey's multiple comparison test.
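For readers less familiar with the Kaplan-Meier method used for DRFS above, a minimal product-limit estimator is sketched below. It is a didactic sketch, not the software used in the study; the follow-up times and event indicators are invented, and the log-rank test and Cox model are not reproduced.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time of each patient.
    events: 1 if distant recurrence was observed, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months for six patients.
times = [6, 12, 12, 20, 30, 48]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Patients censored at a given time are still counted as at risk for recurrences occurring at that same time, which is the usual convention.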

Evaluation of the Relation between TNBC 3-Gene Score and TILs
The TNBC 3-gene score was evaluated as a continuous variable, while TIL count was categorized into high and low using a cutoff value of 20%, since this threshold has been shown to be a prognostic biomarker of survival in TNBC [18,27,28]. A boxplot and the Student's t-test were used to analyze the differences between groups.
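The comparison described above can be sketched as follows, under two assumptions that the text does not spell out: TIL values of 20% or more are treated as high, and the equal-variance (pooled) form of the Student's t statistic is used. The data are invented, and the p value is not computed because that would require a t distribution from an external library.

```python
import math
import statistics


def til_group(til_percent, cutoff=20.0):
    """Assumption: values at or above the cutoff count as high TILs."""
    return "high" if til_percent >= cutoff else "low"


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.
    Returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


# Hypothetical (TIL percentage, 3-gene score) pairs.
cases = [(5, 1.6), (10, 1.4), (15, 1.5), (30, 0.6), (40, 0.4), (60, 0.5)]
low = [score for til, score in cases if til_group(til) == "low"]
high = [score for til, score in cases if til_group(til) == "high"]
t_stat, dof = student_t(low, high)
print(t_stat, dof)
```

A positive statistic here means that the mean score is higher in the low-TIL group than in the high-TIL group.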

Gene Expression of CCL5, DDIT4, and POLR1C in the Metabase
After using COMBAT v3 to eliminate the batch effect in the metabase, the expression of CCL5 (p = 0.869), DDIT4 (p = 0.830), and POLR1C (p = 0.991) did not differ significantly between the three datasets. Furthermore, the linear discriminant analysis (LDA) plot did not show grouping in the data (Figure S1).

Predictive Value of the TNBC 3-Gene Score in the Metabase
The median value of the risk score (0.9863) was used as a cutoff to establish two groups with different risk of recurrence. A statistically significant difference was observed, with a 5-year DRFS of 70.7% for the low-risk group and 46.0% for the high-risk group (p < 0.001). The CoxPH analysis showed an HR = 2.41 (95% CI: 1.50-3.86; p < 0.001) for recurrence in the high-risk group (Figure 2).
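As a quick consistency check on the reported hazard ratio, the sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox models but is not stated in the text. Under that assumption, the point estimate should equal the geometric mean of the confidence limits, and the standard error of the log hazard ratio can be recovered from the interval width. The function names are hypothetical.

```python
import math


def hr_from_ci(lower, upper):
    """Geometric mean of the limits, equal to the point estimate when
    the interval is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def log_hr_standard_error(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Confidence limits reported for the pooled cohort.
implied_hr = hr_from_ci(1.50, 3.86)
implied_se = log_hr_standard_error(1.50, 3.86)
print(round(implied_hr, 2), round(implied_se, 3))
```

The implied value agrees with the reported HR of 2.41 up to rounding. The same calculation applied to a much wider interval yields a larger standard error, which is one way to see how uncertain an estimate from a small subgroup is.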

Three-Gene Score Predictive Value in TNBC Subtypes and Relation with TILs
The risk score differed significantly according to the molecular subtype of TNBC (p < 0.001) and was lower in patients with IM and MSL subtypes. The risk score of IM was significantly lower than that of BL1 (p < 0.001), LAR (p = 0.010), and M (p < 0.001), while the risk score of MSL was lower than that of BL1 (p = 0.002) and M (p < 0.001) (Figure 3).
The TNBC 3-gene score was able to discriminate groups with different risk of recurrence only in the IM and MSL subtypes. The hazard ratios were 4.16 (95% CI: 1.63-10.60; p = 0.003) and 18.76 (95% CI: 1.68-208.97; p = 0.017) for the IM and MSL subtypes, respectively. Survival curves and p values are shown in Figure 4.
A high risk score was observed in patients with low TILs, and the difference between groups was statistically significant (p < 0.001) (Figure 5).

Figure 5. Comparison of the 3-gene risk score according to TIL group. Patients with low TILs had a higher score. Asterisks represent the statistical significance level (*** p < 0.001).

Discussion
In this study, the TNBC 3-gene prognostic signature was able to predict the risk of recurrence in TNBC, specifically in the immunomodulatory and mesenchymal stem-like subtypes. Moreover, among patients without complete pathologic response to NAC, a lower risk score was associated with high levels of tumor-infiltrating components.
TNBC is a group of molecularly heterogeneous breast tumors that can be grouped into six different subtypes (BL1, BL2, IM, M, MSL, and LAR) by the expression level of approximately 1500 genes [10]. To date, options for targeted therapy remain limited, despite advances in the understanding of this disease. TNBC patients with resistance to neoadjuvant chemotherapy have a worse prognosis than patients achieving complete response, who show similar outcomes to non-TNBC patients [15]. In addition, TNBC patients exhibit an increased risk of recurrence up to three years after surgery, after which the risk of recurrence decreases dramatically [9]. These features underscore the unmet need for biomarkers to stratify patients according to their risk and for new molecular targets to develop better therapeutic strategies.
Our previous research reported a TNBC 3-gene signature, based on the expression of DDIT4, POLR1C, and CCL5, which was able to discriminate TNBC patients at different risks of recurrence. This linear score was developed in tumors resistant to neoadjuvant chemotherapy and was tested in three independent datasets of TNBC cases, where untreated tumors were assessed with microarrays [16].
In this work, we pooled data from three independent TNBC datasets and determined the TNBC subtypes, with the goal of testing our TNBC 3-gene signature on each subtype. Lehmann et al. (2016) [29] corrected their subtype classification because stromal and immune cell contamination led to false classification of the MSL and IM subtypes, respectively. Nevertheless, we decided to include these subtypes as subgroups in the evaluation of the TNBC 3-gene signature, because the percentage of lymphocyte infiltration within the tumor may affect the prognosis of these patients and therefore lead to significant changes in the 3-gene signature results [14,30].
We observed clear differences between risk groups in the IM and MSL subtypes, while the BL1, BL2, and M subtypes presented statistical trends. Interestingly, the lowest 3-gene scores were seen in the IM and MSL subtypes, which present high infiltration of immune and stromal cells. Therefore, we conclude that the tumor microenvironment might have a strong influence on the predictive capability of the TNBC 3-gene score, which could explain the low risk score among these subtypes [31]. In fact, during the last decades, several genomic predictors have been developed for TNBC and estrogen receptor-negative tumors, and they share the inclusion of immune or microenvironment-related gene sets [32][33][34]. For instance, Loi et al. reported that in early node-negative TNBC, TILs of ≥30% improved invasive DFS compared with <30% at 5-year follow-up (88% vs. 81%). This finding was corroborated in a meta-analysis which showed that high levels of TILs were associated with better short-term and long-term prognoses, specifically for CD4+, CD8+, and FOXP3+ TILs [20,35].
We demonstrated that the 3-gene TNBC score is related to TIL infiltration, where a lower score was associated with a high infiltration of TILs. This observation might be explained by the influence of CCL5 [36,37]. The expression of this gene causes a lowering of the score and, biologically, participates in the recruitment of TILs in TNBC [16,18].
On the other hand, the TNBC 3-gene signature had a poor discriminative performance in LAR cases (p = 0.765). LAR tumors have a high expression of the androgen receptor and high levels of androgen receptor-mediated signaling. They are biologically characterized by a low Ki-67 index and better outcomes than androgen receptor-negative tumors [38]. Our prognostic signature was developed in TNBC with residual disease, that is, in tumors with some degree of resistance to chemotherapy and high replication rates. This could explain the lack of risk differentiation in the LAR subtype, given its biology and overall better prognosis compared with other TNBC subtypes [16,39]. Teschendorff et al. (2007) found that prognosis in estrogen receptor-positive cases is associated with the expression of cell cycle genes, while in estrogen receptor-negative cases prognosis is related to the expression of genes involved in immune response pathways [40]. Rody et al. (2011), in an unsupervised clustering of data from 579 TNBC cases, found that signatures related to high B-cell infiltration and low IL8 levels were related to better prognoses in terms of event-free survival [41]. Criscitiello et al. (2018) developed a signature based on the expression of four genes to predict lymphocyte infiltration and, consequently, to identify patients at different risks of death and distant recurrence. In addition, the expression of immune genes was also related to the response to neoadjuvant chemotherapy [42].
The main limitation of our study was the small sample size in each TNBC subtype (ranging from 18 to 54 cases), leading to an underpowered analysis. Despite this, clear patterns were shown in the DRFS analysis. Moreover, the retrospective design of our study can lead to bias in the interpretation of our results, although we controlled several variables in order to obtain a uniform cohort; for this reason, new randomized controlled trials assessing the TNBC 3-gene prognostic signature are needed. Furthermore, although our pooled cohort includes all TNBC patients who underwent NAC regardless of the pathologic response to chemotherapy, further studies are needed that include patients divided by complete pathological response, partial response, and no response.

Conclusions
In conclusion, the TNBC 3-gene score performed better in predicting the risk of recurrence in the IM and MSL TNBC subtypes. This score, based only on the expression of three genes, could be useful in clinical practice to stratify TNBC patients according to their risk, particularly in cases with TILs. Further randomized controlled trials are needed to validate this score in TNBC patients and their subtypes.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and involves a reanalysis of gene expression and clinical data available in anonymized public datasets and in a previous study that was approved by the IRB from the Instituto Nacional de Enfermedades Neoplasicas (INEN 10-018).

Conflicts of Interest:
The authors declare no conflict of interest.