According to national surveillance data for the United States (US), non-white minority populations suffer higher mortality rates for most cancers [1
]. This has largely been considered a consequence of poor health care equity and/or access [2
] related to the prevalence of lower socioeconomic status (SES) in minority populations. However, European Americans (EAs) have historically had a higher incidence of breast cancer than African Americans (AAs). Prior to the mid-1980s, breast cancer mortality rates for these self-reported race (SRR) groups were essentially the same, but they diverged in subsequent years. These persistent survival disparities are currently about 40% [1
] and occur independent of SES, which suggests there are additional factors, including biology, leading to race-group differences in mortality.
The onset of race-group mortality rate disparities coincides with the advent of hormone-targeted therapies [5
] that are now standard-of-care for hormone receptor-positive tumors. Compared to women of European descent, AA women [4
] and women of African descent world-wide [9
] have a higher incidence of triple-negative breast cancer (TNBC) [17
], which is characterized by the absence of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2). Therefore, in the context of standardizing ER/PR- and HER2-targeted therapies, the divergence of AA vs. EA mortality likely unmasked population-level differences in tumor biology, which we have previously shown to correlate with genetic ancestry [22
]. Several epidemiological studies suggest that genetic ancestry is a factor in the etiology of specific tumor phenotypes [13
], with disease outcomes based upon molecular phenotype (e.g., HR status) directly affecting treatment decisions, regardless of SES barriers to high-quality clinical care.
TNBC, one of the most aggressive forms of breast cancer, has limited treatment options that are ineffective when the cancer is diagnosed at later stages [25
]. Since AA women tend to be diagnosed at later stages [30
], at an early age [32
], and suffer higher rates of TNBC, these factors likely contribute to AAs having the highest breast cancer mortality rate among all race groups. Even within TNBC cases, AA women have a higher mortality compared to EA women [4
], and these race-/ethnicity-associated differences in TNBC survival suggest a difference in disease progression, which may be driven by differences in gene expression that are detectable by genomic investigations. Multiple lines of evidence support this hypothesis, including differences among SRR groups in the prevalence of the “Vanderbilt TNBC subtypes”, which are defined by gene expression signatures. Although this TNBC subtype classification was intended to assist with clinical management and identification of targetable genes in TNBCs [36
], these subtypes represent a myriad of heterogeneity [37
] that has yet to be fully defined for understudied/minority populations, who suffer most from TNBC.
To have better representation of phenotypic variation in TNBC tumors, we report here our investigation of differences in TNBC primary tumor gene expression using bulk RNAseq, comparing TNBCs of AAs to those of EAs. As opposed to use of traditional methods identifying differentially expressed genes (DEGs) between SRR groups, we quantified genetic ancestry (QGA) for individual patients across five human ancestry super groups, and identified the African ancestry-associated gene expression signatures of TNBCs. We then determined whether these racial/ethnic differences in gene expression reveal insights into biological pathways, and characterized the TNBC subtypes using a newly revised method for categorizing subtypes, building upon previously validated tools. We also characterized tumor-associated immune responses for each tumor. Furthermore, we determined whether phenotypic subtypes were associated with genetic ancestry, as well as whether there were biases of prevalence of TNBC phenotypes between patient SRR and ancestry groups.
TNBC, the most aggressive form of breast cancer, has limited treatment options. It is characterized by poor overall survival, with recurrent, distant metastatic disease common within the first three years after aggressive chemotherapy treatment. TNBC disproportionately affects young AA women, and there is increasing evidence that this disparity cannot be attributed solely to SES and lack of access to care. Our previous studies [14
] and others [47
] have demonstrated differences in gene expression based upon race. However, since SRR does not allow more than correlation with African ancestry, Quantified Genetic Ancestry (QGA) analyses are needed to understand the shared genetic drivers of TNBC observed across the modern African Diaspora [6
]. Furthermore, due to the heterogeneity of TNBC, additional tools are required to define TNBC subtypes and ancestry-related differences within TNBC subtypes. Herein, we used two newly developed tools to evaluate the heterogeneity of TNBC for patients with African ancestry.
Prior studies that compare SRR groups for differential gene expression in breast and other cancers have revealed significant differences between AA and EA race groups [14
]; however, many of these changes are confounded by genetic admixture and non-genetic factors that prevent clear interpretation of genetic contributions to SRR differences in tumor biology. By use of QGA, we identified genes with ancestry-related differential expression in treatment-naïve and residual TNBC tumors; these genes are involved in canonical cancer pathways but are predicted to have modified functional activity. This deconvolution of ancestry has also been employed in prostate cancers, where it revealed gene expression correlated with specific West African ancestry [49
]. We observed that specific pseudogenes that showed reduced expression associated with African ancestry are located in regions of the genome that are frequently deleted in sporadic breast and prostate tumors derived from AA patients [50
]. An example of this expression/deletion effect involves a pseudogene, RNU2-6p, which is downregulated in AA TNBCs (Figure S6
). According to GTEx data, this gene is not typically expressed in normal breast tissue; however, it is highly expressed in breast tumors within our cohort, but with significantly reduced expression in untreated TNBC tumors of patients with significant African ancestry. The functional relevance of this distinction, based on in silico analyses and previously published findings [51
], is that RNU2-6p appears to be a non-coding nuclear RNA that has a secondary structure resembling splicing machinery. This ancestry-associated pseudogene may affect exon usage and/or isoform splicing, which may contribute to unique molecular signatures in gene expression, translating into disease progression in African Americans with breast cancer, as has been previously shown in prostate cancer [53].
When we compared the differential genes identified by QGA with those identified by SRR, we found a 51% overlap of ancestry-associated genes in the race-associated category. This indicates that using SRR categories for differential gene expression can diminish ancestry-related expression, given the convolution of admixture in race groups, and SRR categories will incorporate additional factors that drive differences in gene expression that are independent of genetic ancestry. This also explains the relatively larger number of differentially expressed genes associated with SRR, as opposed to genetic ancestry, and provides additional opportunities to discern the multiple factors connected to race/ethnicity that contribute to differential gene expression among race groups.
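The overlap comparison described above can be illustrated with a minimal sketch. It assumes that the overlap is expressed as the share of ancestry-associated genes that also appear in the race-associated list; the denominator is our assumption, and the function name `overlap_fraction` and the gene identifiers are hypothetical placeholders rather than data from this study.

```python
def overlap_fraction(ancestry_genes, race_genes):
    """Return the share of ancestry-associated genes that are also
    race-associated, together with the sorted list of shared genes.

    Assumption: the denominator is the ancestry-associated (QGA) set.
    """
    ancestry = set(ancestry_genes)
    race = set(race_genes)
    if not ancestry:
        raise ValueError("ancestry_genes must not be empty")
    shared = ancestry & race
    return len(shared) / len(ancestry), sorted(shared)


# Hypothetical placeholder identifiers, not genes reported in this study.
qga_degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
srr_degs = ["GENE_B", "GENE_C", "GENE_E", "GENE_F", "GENE_G", "GENE_H"]

fraction, shared = overlap_fraction(qga_degs, srr_degs)
print(f"{fraction:.0%} of ancestry-associated genes are also race-associated: {shared}")
```

In this toy example the race-associated list is longer than the ancestry-associated list, mirroring the observation that SRR comparisons yield more differentially expressed genes than ancestry-based comparisons.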
Additionally, despite the limited number of residual tumors in our cohort, we also observed a robust 13-gene expression pattern upregulated in AAs but not in EAs with residual tumors. Of note, EGFR, which is upregulated in African American breast and prostate cancers [55
], appears to be a driver within this gene signature. Additionally, genes that are downregulated in AAs had strong expression in EA patients. These differences did not correlate with TNBC subtypes as determined by use of either the Vanderbilt or TNHF subtyping tools, suggesting that these expression differences are likely attributable to genetic ancestry. Furthermore, in the TNHF analysis, there were fewer unclassified AA patients. This has prognostic implications, since, for TNBCs, residual disease after neoadjuvant chemotherapy is associated with worse overall survival relative to that for non-TNBC patients, which is not the situation when patients achieve a complete pathologic response [57
]. Thus, identification of genes that are drivers in residual tumors can help in developing targeted adjuvant therapies that could improve survival in this patient population, for which there currently exists no effective standard of care.
The TNBC Vandy BL1/2 distribution between SRR groups was different from our previous findings in TCGA analyses [14
], likely because of the inclusion of all six TNBC subtype categories in that previous study. Specifically, reassignment of calls for the retired IM and MSL subtypes resulted in redistribution of tumors into sub-optimal categories, shifting the observed proportions of subtypes by SRR relative to our previous studies (Figure 2
A). This contradiction compelled us to ensure that the categorization of subtypes was an accurate interpretation of the biological variation across TNBC tumors, and not just a reflection of an improperly stratified training set. Our pilot utilization of the novel TNHF method, an augmented extension of the Vanderbilt tool, is distinctive in two ways. First, TNHF reports only the correlation scores for valid TNBC categories. Second, TNBC categories from TNHF are assigned as a semi-quantified ‘status’, which represents the presence/absence of a mixture of valid Vanderbilt TNBC subtypes within tumors and corresponds to the heterogeneity observed in TNBC tumors. Because the TNHF method accounts for subtype heterogeneity within a tumor (denoted as positive or negative annotations), its output gives a comprehensive account of the proportions of TNBC subtype signatures, which may be more informative for clinical management of breast tumors. This can be transformative for TNBC disease outcomes, as certain TNBC subtypes exhibit a higher risk of recurrence and/or drug resistance. Therefore, information on mixed tumor types may help predict adverse outcomes or limited treatment response and tumor evolution in the context of residual tumor behavior. In our cohort, African ancestry patients had a higher rate of basal-like 2 positive/basal-like 1 negative (BL2+/BL1−) TNBC subtypes, which is similar to previous findings for AA patients [14
]. This positive/negative integration of all potential TNBC categories, which have prognostic value, has added clinical utility, particularly for making treatment decisions. The capacity of gene expression profiles to predict treatment response is supported by clinical trial data showing differences in pathological complete responses based upon Vanderbilt TNBC subtypes [60
]. For example, in the GEICAM/2006-03 TNBC neoadjuvant chemotherapy clinical trial, the best responders were in the BL1 group, with 60% of patients achieving a pathologic complete response compared to 20% in the LAR and IM groups [60
]. Thus, use of the more refined TNHF subtyping tool, which can provide information such that a tumor is equally BL2+ and M+, can have a greater impact on neoadjuvant treatment decisions and can inform subsequent choices if standard treatments fail.
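To make the positive/negative status idea concrete, the following is a minimal sketch of how per-subtype correlation scores could be turned into a mixed-status label such as BL2+/M+. It is not the TNHF implementation. The list of valid subtypes (BL1, BL2, M, LAR), the cutoff of 0.1, the function names, and the example scores are all assumptions made for illustration.

```python
# Assumption: the valid categories are the four subtypes that remain after
# the IM and MSL calls are retired. This list is illustrative only.
VALID_SUBTYPES = ("BL1", "BL2", "M", "LAR")


def subtype_status(scores, threshold=0.1):
    """Assign '+' or '-' to each valid subtype from its correlation score.

    `threshold` is an arbitrary illustrative cutoff, not a published value.
    Scores for categories outside VALID_SUBTYPES are ignored, and a missing
    score is treated as negative.
    """
    status = {}
    for subtype in VALID_SUBTYPES:
        score = scores.get(subtype)
        is_positive = score is not None and score >= threshold
        status[subtype] = "+" if is_positive else "-"
    return status


def status_label(status):
    """Render the per-subtype calls as a compact label."""
    return "/".join(f"{name}{status[name]}" for name in VALID_SUBTYPES)


# Hypothetical correlation scores for a single tumor.
example_scores = {"BL1": -0.05, "BL2": 0.32, "M": 0.28, "LAR": -0.20, "IM": 0.40}
example_status = subtype_status(example_scores)
print(status_label(example_status))
```

Reporting one call per subtype, rather than a single best-matching subtype, is what lets a tumor be described as positive for more than one signature at once.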
Both BL2/BL1 subtypes are also associated with immune gene signatures for AA TNBC patients [14
], which appear to be driven by IL-6 and TP53 signaling as determined by IPA. Both IL-6 [65
] and p53 activation [66
] are associated with African American tumors, validating the robustness of our analysis tools. Although we found no significant associations with Tumor Associated Leukocyte (TAL) scores, most likely due to samples being isolated from macro-dissected regions enriched for tumor cells and depleted of stromal and/or highly infiltrated regions, tumors of patients with significant African ancestry had lower TAL scores than those of patients with predominantly European ancestry among treatment-naïve patients. In the tumor microenvironment, various genes, including immunological genes, are differentially expressed by race/ethnicity [46
]. However, some studies that utilize public datasets that have low representation of ethnic groups indicate that immunological differences in TNBCs are relatively small [72
]. Although we found a difference in lymphocytic infiltration at the individual level, it was not evident at the race/ethnic group level, possibly because of small sample numbers in each race/ethnic group. However, higher TAL scores for EAs and lower TAL scores for treated, residual tumors were noted. A TNBC study of South Asian patients has reported increased infiltration of T-lymphocytes [73
] and suggests that TNBCs with higher immunogenicity may be candidates for immunotherapy [74
]. Thus, higher TAL scores observed in our EA TNBC patient cohorts could be exploited to select the relevant immunotherapies.