The Association between Mutational Signatures and Clinical Outcomes among Patients with Early-Onset Breast Cancer

Early-onset breast cancer (EoBC), defined by a diagnosis <40 years of age, is associated with poor prognosis. This study investigated the mutational landscape of non-metastatic EoBC and the prognostic relevance of mutational signatures using 100 tumour samples from Alberta, Canada. The MutationalPatterns package in R/Bioconductor was used to extract de novo single-base substitution (SBS) and insertion–deletion (indel) mutational signatures and to fit COSMIC SBS and indel signatures. We assessed associations between these signatures and clinical characteristics of disease, as well as recurrence-free survival (RFS) and overall survival (OS). Five SBS and two indel signatures were extracted. The SBS13-like signature had higher relative contributions in the HER2-enriched subtype. Patients with a higher-than-median contribution of this signature tended to have better RFS after adjustment for other prognostic factors (HR = 0.29; 95% CI: 0.08–1.06). An unsupervised clustering algorithm based on absolute contribution revealed three clusters of fitted COSMIC SBS signatures, but cluster membership was not associated with clinical variables or survival outcomes. The results of this exploratory study suggest that various SBS and indel signatures may be associated with clinical features of disease and prognosis. Future studies with larger samples are required to better understand the mechanistic underpinnings of disease progression and treatment response in EoBC.


Introduction
Breast cancer remains the most diagnosed malignancy and most common cause of cancer death among women globally, with an estimated 2.3 million diagnoses and 666,000 deaths in 2022 [1]. Approximately 4.5% of cases in Canada are early-onset breast cancer (EoBC), defined by a diagnosis before 40 years of age, compared to an estimated 10% of cases worldwide [1,2]. In Canada, the incidence rate of EoBC has increased annually by 0.66% from 2000 to 2015 compared to 0.21% in the overall population during the same time period [3]. Further, survival outcomes have not improved in EoBC to the same extent as in the overall breast cancer population over time [4].
EoBC presents clinical challenges in part due to its rarity but also because there are few established risk factors for prevention. Organized routine mammography screening in Canada is indicated for women aged 50-74 years, with the harms of screening outweighing the benefits in women < 40 years [5]. Therefore, EoBC is often detected symptomatically and at later stages [6–9]. EoBC is also likely to present with more aggressive disease biology, including the human epidermal growth factor receptor-2 (HER2)-enriched and triple-negative (TNBC) subtypes, stressing the importance of the clinical management of EoBC [10,11]. It is well accepted that the risk of recurrence and mortality is higher in EoBC compared to the overall breast cancer population; however, the drivers remain poorly understood. Large epidemiological studies established age < 40 years as an independent risk factor for poor prognosis in breast cancer, even after adjustment for pathological features and treatments received [4,10,12–17]. This has driven clinical debate as to whether inferior outcomes in EoBC are due to an overrepresentation of aggressive disease features or a unique disease biology [18].
The strongest established risk factors in EoBC include inherited genetic mutations. However, less than 10% of breast cancer incidence among young women is attributable to heritable mutations in the BRCA1 or BRCA2 genes [19,20]. Further, Copson et al. found no evidence that germline mutations were related to mortality or tumour aggressiveness among breast cancer patients aged < 40 [19]. This suggests an important role for somatic mutations caused by lifestyle or environmental exposures in combination with intrinsic processes in tumour progression and survival in young women. Somatic mutations are found in all cancer genomes. A small proportion are drivers that confer clonal advantage, are causally implicated in oncogenesis, and have been positively selected during the evolution of the cancer [21–24]. Somatic driver mutations in over 30 cancer genes have been implicated in breast cancer development, including AKT1, BRCA1, CDH1, GATA3, PIK3C, PTEN, RB1, and TP53 [10,21,22]. Comparatively fewer studies have assessed driver mutations of recurrence and metastasis in breast cancer, and no such studies have been performed in early-onset populations.
The remaining somatic mutations are "passengers", which do not contribute to cancer development. However, passenger mutations bear the imprints of the DNA damage and repair processes operative during the development of the cancer, unmodified by selection [25]. Advancements in next-generation sequencing have permitted sequencing of whole cancer genomes and identified thousands of single nucleotide variants (SNVs) in breast cancer genomes [26,27]. There are six unique types of SNVs: C>A, C>G, C>T, T>A, T>C, and T>G. Each of the substitutions is examined by incorporating information on the bases immediately 5′ and 3′ to each mutated base, generating 96 possible mutation types (6 types of substitution × 4 types of 5′ base × 4 types of 3′ base). The array of mutation types is represented in a mutational spectrum, then decomposed into recurring patterns, referred to as mutational signatures. Sixty validated single-base substitution (SBS) mutational signatures are listed in the Catalogue Of Somatic Mutations In Cancer (COSMIC) version 3.3, in addition to 18 insertion-deletion (indel) signatures [28]. Mutational signatures can be used to decipher how patterns of somatic mutations collectively give rise to mutational processes of disease as well as give insight into the potential etiology of the processes underlying these signatures.
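The count of 96 mutation types can be checked by direct enumeration. The following is a minimal Python sketch; the function name `sbs96_contexts` and the bracketed label format (e.g., `A[C>T]G`) are illustrative choices and are not taken from the study's pipeline.

```python
from itertools import product

BASES = "ACGT"
# The six substitution classes listed in the text
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_contexts():
    """Return every substitution combined with each 5' and 3' neighbouring base."""
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


contexts = sbs96_contexts()
print(len(contexts))  # 96
print(contexts[:3])   # ['A[C>A]A', 'A[C>A]C', 'A[C>A]G']
```

Each sample's mutational spectrum is then a vector of 96 counts, one per label.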
Mealey et al. performed one of the most comprehensive analyses of the mutational landscape of breast cancer in patients ≤40 years [23]. They found that COSMIC signatures SBS1, 3, and 5 were the most common in the overall cohort and that SBS2 and SBS3 were more likely to be observed in HER2-enriched and triple-negative tumours, respectively. Compared to patients >60 years, early-onset patients were significantly more likely to have C>A mutations (17% vs. 16%) and less likely to have C>T mutations (32% vs. 38%). Finally, patients ≤40 years were more likely to have mutations in GATA3 compared to those >40 years and >60 years (22% vs. 12.9% vs. 10.8%) [23]. Studies like this provide insight into multiple genomic features related to tumour development in women < 40 years. To date, there have been no applications of mutational signatures to assess outcomes in EoBC and no studies have investigated indel signatures among these patients. As in the work of Mealey et al., genomic data can be leveraged to understand how various somatic mutations collectively drive tumour progression and survival in young women. These analyses may discover novel markers to inform targeted therapies or may improve the performance of existing prediction tools to better inform individualized prognosis. In this study, we examined whole exome sequences from 100 EoBC patients in Alberta, Canada to describe their somatic mutation landscape, including mutational load, SBS, and indels. We also extracted de novo SBS and indel signatures and fit mutational profiles to validated COSMIC SBS and indel signatures. Finally, we examined whether extracted and fitted COSMIC signatures were associated with clinicopathological tumour characteristics and survival outcomes.
Genes 2024, 15, 592

Study Sample and Data Collection
Somatic mutation and clinical data were obtained from 100 women between the ages of 18-39 years diagnosed with invasive non-metastatic breast cancer in Alberta, Canada, from 2001 to 2014. Mutational data were derived from tumour tissue and normal blood samples stored at the Alberta Cancer Registry Biobank. Tumour tissue was extracted at time of surgery or biopsy and stored as formalin-fixed paraffin-embedded blocks. Blood samples were also collected at time of surgery or biopsy and centrifuged for buffy coat extraction. Tumour and normal blood samples were sent to Genome Québec for DNA extraction and whole-exome sequencing. Extraction was performed with QIAsymphony DSP DNA Kits (QIAGEN, Hilden, Germany) and sequencing was performed with NovaSeq 6000 S4 PE100 (Illumina, San Diego, USA) and SureSelect Human All Exon exome probes (Agilent Technologies, Santa Clara, USA). Following sequencing, variant calling was performed using the Mutect2 workflow [29] from the Canadian Centre for Computational Genomics (C3G) and the results were obtained as variant call format (VCF) files. The corresponding reference genome was GRCh37/hg19. Clinical data were obtained through linkage with the Alberta Cancer Registry and included detailed information on baseline demographics, cancer diagnosis (stage and morphology), dates of referral to oncology, clinic visits at any of the cancer centers, surgical procedures, dates and types of therapy received for cancer (chemotherapy, radiation, and hormonal therapy), tumour size, grade, lymph node status, ER/PR status and HER2/neu status, and dates of last follow-up or death. Administrative end of follow-up was 25 February 2018.

Extraction of Mutational Signatures
Mutational signatures were investigated using the MutationalPatterns package in R (v4.3)/Bioconductor (v3.17) [30]. This package includes a comprehensive set of functions for extracting mutational signatures de novo and determining the contribution of previously identified mutational signatures on a single sample level. The package works with SNVs, indels, double-base substitutions (DBS) and larger multi-base substitutions (MBS). The VCF files for each participant were passed through "read_vcfs_as_granges" and "get_mut_type" commands to obtain counts of the six SNV types (C>A, C>G, C>T, T>A, T>C, T>G) and indel types.
De novo SBS and indel mutational signature extraction was achieved with nonnegative matrix factorization (NMF) using the "extract_signatures" command. The NMF algorithm is detailed by Gaujoux and Seoighe [31]. In brief, the algorithm factorizes a matrix X, which has n rows and m columns, into two smaller nonnegative matrices W and H, where the product of W and H approximates X. W has dimensions n × r and H has dimensions r × m, where r is the factorization rank, which is the number of extracted de novo signatures. We sampled ranks from 2 to 10. The optimal factorization rank was chosen as the smallest rank at which the cophenetic correlation coefficient started decreasing. For example, in the case of SBS mutations, the rows of matrix X were the 96 mutational contexts derived from combinations of 6 mutational types (i.e., C>A, C>G, C>T, T>A, T>C, and T>G) and their 5′- and 3′-adjacent bases, and the columns were the 100 EoBC samples. The optimal rank can be interpreted as the minimal set of mutational signatures that optimally explains the proportion of each mutation type and estimates the contribution of each signature to each sample [32]. The "fit_to_signatures" command determined which COSMIC SBS and indel signatures were present in our samples. This function finds the optimal linear combination of mutational signatures that most closely reconstructs the mutation matrix by solving a nonnegative least-squares problem.
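The factorization X ≈ WH described above can be illustrated with a small pure-Python sketch. It uses the classic multiplicative updates that minimize squared (Frobenius) error, which is an assumption made for brevity and is not necessarily the update rule used by the R packages in this study. The helper names, the toy 4 × 3 matrix, and the iteration count are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared reconstruction error between X and the product WH."""
    wh = matmul(w, h)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, wh) for xv, rv in zip(xr, rr))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize X (n x m) into nonnegative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy data: 4 mutation types (rows) x 3 samples (columns) built from 2 signatures
true_w = [[0.7, 0.0], [0.3, 0.1], [0.0, 0.5], [0.0, 0.4]]
true_h = [[100, 10, 50], [5, 80, 50]]
x = matmul(true_w, true_h)

w, h = nmf(x, rank=2)
print(frobenius_error(x, w, h))
```

In the study's setting X would be 96 × 100 for SBS, the columns of W would be the extracted signatures, and the rows of H would be each signature's contribution to each sample.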
As there are 60 and 18 validated COSMIC SBS and indel signatures, respectively, we employed hierarchical clustering algorithms to identify specific combinations of mutational signature contributions. This clustering analysis was only performed on COSMIC signatures present in >25% of samples. Absolute contribution values for each signature were standardized prior to clustering. Euclidean distance was then calculated to form a distance matrix and passed through a hierarchical clustering algorithm based on Ward's minimum variance method. The average silhouette method determined the optimal number of clusters. The unadjusted associations between cluster membership and demographic and clinical variables were assessed with Fisher's exact test, and multivariable logistic regression assessed mutually adjusted associations.
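Two of the steps above, standardization and the average silhouette criterion on Euclidean distances, can be sketched in a few lines of Python. This is an illustration only: Ward linkage itself is omitted and the cluster labels are supplied by hand, the population standard deviation is an arbitrary choice, and the contribution values are made up rather than taken from the study.

```python
import math
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each column (signature) across samples."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette(rows, labels):
    """Mean silhouette width for a given cluster assignment."""
    widths = []
    for i, (row, lab) in enumerate(zip(rows, labels)):
        own = [euclidean(row, r)
               for j, (r, l) in enumerate(zip(rows, labels)) if l == lab and j != i]
        if not own:  # singleton clusters get a width of 0 by convention
            widths.append(0.0)
            continue
        a = mean(own)  # mean distance to the sample's own cluster
        b = min(       # mean distance to the nearest other cluster
            mean(euclidean(row, r) for r, l in zip(rows, labels) if l == other)
            for other in set(labels) if other != lab
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


# Hypothetical absolute contributions: 6 samples x 2 signatures
contributions = [[100, 5], [110, 8], [95, 2], [10, 90], [12, 100], [8, 95]]
z = standardize_columns(contributions)
print(average_silhouette(z, [0, 0, 0, 1, 1, 1]))  # well-separated grouping
print(average_silhouette(z, [0, 1, 0, 1, 0, 1]))  # poorly separated grouping
```

Under the average silhouette method, the number of clusters whose assignment gives the largest mean width is preferred.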
Recurrence-free survival (RFS) and overall survival (OS) were the primary outcomes to evaluate the prognostic relevance of de novo signatures and COSMIC signature clusters. RFS was defined as time from primary surgery to local-regional or distant relapse, contralateral breast cancer, the appearance of a second (non-breast) primary tumour, or death from breast cancer. OS was defined as time from primary surgery to death from any cause. De novo signatures were converted into binary variables based on absolute contribution below the median (low expression), or equal to or greater than the median (high expression). The Kaplan-Meier method was used to estimate curves for RFS and OS, as well as median time-to-event and 95% confidence intervals (95% CI). Association measures were estimated with multivariable Cox proportional hazard models in the form of hazard ratios (HR) with 95% CI. Statistical significance was defined by p-value <0.05. All analyses were performed in RStudio (v2023.06.0+421).
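The median split and the Kaplan-Meier product-limit estimator described above can be written compactly. The Python sketch below is illustrative only: the study's analyses were run in R, the function names are hypothetical, and the follow-up times and event indicators are invented.

```python
from statistics import median


def dichotomize_at_median(values):
    """Label a value 'high' if it is >= the median, otherwise 'low'."""
    cut = median(values)
    return ["high" if v >= cut else "low" for v in values]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if censored.
    """
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - failed / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical follow-up times (months) and event indicators for five patients
times = [6, 12, 12, 20, 30]
events = [1, 1, 0, 1, 0]
print(kaplan_meier(times, events))

# Hypothetical absolute contributions of one signature in four samples
print(dichotomize_at_median([1, 2, 3, 4]))  # ['low', 'low', 'high', 'high']
```

Censored patients leave the risk set without lowering the curve, which is why the second patient followed to 12 months contributes to the denominator at 12 months but not afterwards.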

Mutational Load
The median mutational load (SNVs + indels) identified in EoBC tumours was 596.5 (IQR = 478.25-688.25). Mutational load was composed primarily of SNVs, with a median of 567 (IQR = 469.25-657.75). The median number of indels was 26 (IQR = 20.75-34.00). The distributions of mutational load, number of SNVs, and number of indels were positively skewed, so the data were log-transformed to examine differences across demographic and clinical variables. In general, the mean of the log-transformed mutational load and number of SNVs tended to be higher in the overweight BMI category versus normal/underweight, TNBC subtype versus luminal, lymph node-negative tumours versus lymph node-positive, and tumours ≤2 cm versus >2 cm, although statistical significance was not achieved (0.05 ≤ p-value < 0.20) (Table 2). Those without vascular invasion had significantly higher mean log-transformed mutational load (p = 0.007) and number of SNVs (p = 0.009) versus those with vascular invasion (Table 2). Regarding log-transformed indels, the mean was significantly higher in the TNBC subtype versus luminal (p = 0.029), lymph node-negative tumours (p = 0.035), and in those without vascular invasion (p = 0.001) (Table 2).

Extracted De Novo SBS and Indel Signatures
The NMF algorithm decomposed the mutational spectra of all breast tumours into five SBS and two indel signatures. These signatures were named after existing COSMIC signatures if they had a cosine similarity of more than 0.85. The SBS signatures were named as follows: SBSA, SBS13-like, SBS29-like, SBS6-like, and SBS42-like. Figure 1 illustrates the distribution of SNV types for the extracted SBS signatures. SBSA did not have a cosine similarity of more than 0.85 with any existing COSMIC signature and was characterized by a high contribution of T>G mutations. The SBS13-like signature had high relative contributions from C>T and C>G mutations. The SBS29-like signature was composed predominantly of C>A mutations. Low peaks of C>T and T>C mutations defined the SBS6-like signature. Finally, the SBS42-like signature had a high relative contribution of C>T mutations, followed by C>A and T>C mutations. The two de novo indel signatures were named ID6-like and ID12-like as they had a cosine similarity of more than 0.85 with existing COSMIC signatures (Figure 2). The ID6-like signature had a high frequency of microhomology deletions of ≥5 base pairs and the ID12-like signature had a high frequency of >1 base pair deletions at repeat sites.
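The naming rule based on cosine similarity can be expressed as a short Python sketch. The function names, the `unmatched` fallback label, and the four-element reference profiles (`REF1`, `REF2`) are hypothetical; real SBS profiles have 96 elements and would be compared against the COSMIC catalogue.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonnegative mutation profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def name_signature(profile, references, threshold=0.85, fallback="unmatched"):
    """Return '<best reference>-like' if the best similarity exceeds the threshold."""
    best_name, best_sim = max(
        ((name, cosine_similarity(profile, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return f"{best_name}-like" if best_sim > threshold else fallback


# Hypothetical reference profiles over four mutation types
references = {"REF1": [1.0, 0.0, 0.0, 0.0], "REF2": [0.0, 1.0, 0.0, 0.0]}

print(name_signature([0.9, 0.1, 0.0, 0.0], references))  # REF1-like
print(name_signature([0.0, 0.0, 0.5, 0.5], references))  # unmatched
```

Because cosine similarity ignores overall magnitude, a signature is matched on the shape of its profile rather than on the number of mutations it explains.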
Table 3 compares mean relative contribution of the de novo SBS and indel signatures across categories of clinical variables. For SBS13-like, mean relative contribution was significantly higher in those aged 30-34 (p = 0.026) and 35-39 (p = 0.015) relative to <30 years, higher in the HER2-enriched subtype relative to luminal (p = 0.034), and higher in T3 tumours relative to T1 (p = 0.011). The mean relative contribution of the SBS29-like signature was significantly lower in the TNBC subtype than luminal (p < 0.001). Relative to the normal/underweight BMI category, the overweight BMI category had significantly lower mean relative contribution of the SBS6-like (p = 0.003) and SBS42-like signatures (p = 0.045). As there were only two extracted indel signatures, the relative contributions of ID6-like and ID12-like were complementary. The mean relative contribution of the ID6-like signature was significantly higher in the obese BMI group versus normal/underweight and significantly lower in the HER2-enriched subtype versus luminal. Relative contribution plots for the de novo SBS and indel signatures are presented in Figures S1 and S2.
Figure 2. Extracted insertion-deletion signatures from 100 early-onset breast cancer patients in Alberta, Canada using non-negative matrix factorization. The x-axis represents the homopolymer length for single-base pair deletions and insertions, the number of repeat units for >1 base pair deletions and insertions at repeats, and microhomology length for microhomology deletions. The y-axis is the number of insertions-deletions.
Table 4 presents crude and mutually adjusted HR estimates for the associations between each de novo signature and RFS, as well as OS. In general, there was no evidence to conclude whether the hazard of recurrence differed between high-expression and low-expression groups for most signatures. However, the unadjusted HR for the SBS13-like signature demonstrated a significant reduction in recurrence hazard for those with high signature expression versus low expression (HR = 0.36; 95% CI: 0.13-0.98). The mutually adjusted estimate showed a 71% reduction (HR = 0.29; 95% CI: 0.08-1.06), although statistical significance was not achieved. Similar reductions in the hazard of death were estimated for the SBS13-like signature but with less precision.
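The arithmetic linking the reported hazard ratios to a percent reduction and to statistical significance can be made explicit. The Python sketch below uses only the HR and 95% CI values reported for the SBS13-like signature; the function names are hypothetical, and the standard-error calculation assumes a Wald-type interval that is symmetric on the log scale, which the source does not state.

```python
import math


def percent_reduction(hr):
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return (1 - hr) * 100


def ci_excludes_null(lower, upper, null=1.0):
    """True if the confidence interval does not contain the null value."""
    return not (lower <= null <= upper)


def log_se_from_ci(lower, upper, z=1.96):
    """Approximate SE of log(HR), assuming a symmetric interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


print(round(percent_reduction(0.29)))  # 71
print(ci_excludes_null(0.13, 0.98))    # True  (unadjusted estimate)
print(ci_excludes_null(0.08, 1.06))    # False (mutually adjusted estimate)
print(log_se_from_ci(0.13, 0.98) < log_se_from_ci(0.08, 1.06))  # True
```

The last line shows that the adjusted estimate has the wider interval on the log scale, which is why it loses significance despite a point estimate further from 1.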

COSMIC SBS and Indel Signatures and Clustering Analysis
The mean relative contribution and prevalence of all COSMIC SBS and indel signatures in our 100 EoBC cases are presented in Tables S1 and S2, and relative contribution plots are presented in Figures S3 and S4, respectively. Six SBS signatures were present in over 50% of the cohort: SBS15 (91%), SBS24 (89%), SBS87 (77%), SBS42 (76%), SBS13 (67%), and SBS18 (60%). Twenty-six SBS signatures were not present. All COSMIC indel signatures were present. ID12 was present in 98% of the cohort and had a mean relative contribution of 50% (SD = 20.4%), the highest of all COSMIC signatures. In total, eight COSMIC SBS signatures had a prevalence of >25% and were included in the hierarchical clustering algorithm: SBS15, SBS18, SBS24, SBS26, SBS37, SBS39, SBS42, SBS87. Three clusters were identified (Figure 3). Cluster 1 (n = 62) included contributions from all signatures except SBS26 and SBS39. Cluster 2 (n = 8) included substantial contributions from SBS18 and SBS24. Cluster 3 (n = 30) included contributions from all signatures except SBS37. We did not observe statistically significant univariable associations between cluster membership and clinical variables (Table 5). Evidence was insufficient to conclude whether the unadjusted and mutually adjusted hazards of recurrence and death of Clusters 2 and 3 differed from Cluster 1 (Table 6). We attempted to perform hierarchical clustering with the COSMIC indel signatures; however, due to the high relative contribution of ID12 in most samples, only one cluster was identified.


Discussion
In this study, we characterized the somatic mutation landscape of 100 EoBC tumours from Alberta, Canada and assessed their relationship with clinicopathological tumour features and survival outcomes. Our findings indicated higher numbers of SNVs and indels among patients without vascular invasion, in addition to a higher number of indels in lymph node-negative and TNBC tumours. We extracted five de novo SBS signatures, four of which resembled validated COSMIC SBS signatures, and two de novo indel signatures resembling ID6 and ID12. The mean relative contribution of these de novo signatures mainly differed between BMI categories and molecular subtypes. RFS tended to be better among individuals with high SBS13-like signature expression relative to low, and worse in those with high SBS29-like signature expression relative to low. The hierarchical clustering algorithm of validated COSMIC SBS signatures revealed three distinct clusters. However, evidence was insufficient to conclude whether cluster membership was associated with clinical variables and with survival outcomes.
This is the first study to examine the prognostic relevance of somatic mutational signatures and describe differences in signature distribution across clinicopathological tumour characteristics among patients with EoBC. We expanded upon previous work from Mealey et al., who investigated differences in mutational profiles between breast cancer patients < 40 years and ≥40 years with The Cancer Genome Atlas (TCGA) Breast Invasive Carcinoma project (TCGA-BRCA) data [23]. They also extracted five de novo SBS signatures in their <40 years subgroup, three of which had similar SNV mutation profiles to the signatures we extracted. Specifically, SBSA, SBS6-like, and SBS13-like signatures in our study resembled signatures S2, S3, and S1 in their study, respectively. SBSA had high relative contributions of T>G in the ATG, TTG, and GTT contexts. This profile most closely resembled COSMIC SBS55, a non-validated signature previously observed by Alexandrov et al. that may arise from a sequencing artifact. The SBS6-like signature was characterized by low peaks of C>T and T>C mutations. The peaks of C>T mutations in the ACG, CCG, and GCG contexts were similar to COSMIC SBS1 and SBS6, and the contribution of T>C mutations likely reflects a combination of signatures present at low levels. The SBS29-like and SBS42-like signatures were unique to this study and are generally not found in breast cancers [28]. COSMIC SBS29 is linked to chewing tobacco use and SBS42 is linked to haloalkane exposure. The role of smokeless tobacco in breast cancer is not well established. A hospital-based case-control study in Assam, India, found the odds of being diagnosed with breast cancer were 2.35 times higher in betel quid chewers vs. non-chewers [33]. Interestingly, SBS29 was also found among early-onset testicular cancer tumours, although there is no established link between chewing tobacco and testicular cancer. It is possible that SBS29 represents a process involved in early-onset cancers, but further research in other cancer sites is needed to confirm this speculation.
The SBS13-like signature resembled a combination of COSMIC SBS2 and SBS13, which often occur together in the same sample. These signatures are attributed to the activity of the AID/APOBEC family of cytidine deaminases, which substantially contribute to the mutation burden in many human cancers, especially in bladder and breast cancers [32]. We observed higher relative contributions of the SBS13-like signature in the HER2-enriched subtype and HER2-positive tumours, similar to Mealey et al. [23]. Further, our findings show RFS and OS tended to be better in patients with high SBS13-like expression, even after adjustment for the subtype. Among breast cancer subtypes, HER2+ breast tumours are reported to have the highest median levels of APOBEC signature enrichment [34]. APOBEC-related mutagenesis is thought to play an important role in tumour immunogenicity, namely in neoantigen presentation and recruitment of T-cells to the tumour microenvironment, implying its potential for cancer immunotherapy [35]. However, this likely depends on the molecular subtype. In a TCGA cohort, DiMarco et al. observed high correlation between APOBEC enrichment and immune signatures reflective of an antitumor adaptive immune response in the TNBC subtype, including Th1 cells, CD8+ T cells, cytotoxic cells, the interferon signaling pathway, and the major histocompatibility complex class II antigen presentation pathway [36]. Conversely, the APOBEC enrichment score was not correlated with immune cell signatures in HER2-enriched breast cancers. Instead, APOBEC enrichment was associated with a higher frequency of subclonal mutations and may suggest the evolution of immune-suppressive mechanisms that limit antitumor adaptive immune responses [36]. These findings suggest a subgroup of TNBC patients who may benefit from immunotherapy and equally a subgroup of HER2+ patients who may not benefit from immunotherapy beyond anti-HER2 therapy. Unfortunately, our prognostic findings for the SBS13-like signature could not be stratified by subtype due to limited sample size and we could not ascertain if these effects were mediated by treatment received. Nonetheless, there may be a role for APOBEC-related mutational signatures, like SBS2 and SBS13, as a biomarker for immunotherapy response in breast cancer, regardless of age. APOBEC signatures are associated with a greater likelihood of response to immune checkpoint inhibition in non-small cell lung cancer, head and neck cancer, and bladder cancer [32,36–38].
We also extracted de novo indel signatures that resembled COSMIC ID6 and ID12. The etiology of the ID12 signature is currently unknown. The ID6 signature arises from defective homologous recombination-based DNA damage repair, often due to inactivating BRCA1 or BRCA2 mutations, leading to non-homologous DNA end-joining activity [32]. Given that these mutations are associated with younger age and TNBC, it was not unexpected that this signature was extracted in our EoBC cohort, and that relative contribution was highest in the TNBC subtype. Further, we found that the number of indel mutations was higher in TNBCs. Although the ID6-like signature did not bear prognostic significance in our study, there is an important role for homologous recombination deficiency (HRD) in TNBC. Poly(ADP-ribose) polymerase (PARP) inhibitors have been successfully implemented in the treatment of metastatic breast cancer with germline mutations in BRCA1/2 [39,40]. The recent OlympiA trial also established the efficacy of PARP inhibitors for BRCA1/2 mutation carriers in the early-stage setting, where the median age of the trial population was 43 years, and 82% of participants had TNBC [41]. The application of these treatments is being explored in patients who display a "BRCAness" phenotype. BRCAness refers to malignancies that have not arisen from germline BRCA1 or BRCA2 mutations but share the phenotypic and molecular features of HRD [42]. These malignancies share the same therapeutic vulnerabilities as BRCA-associated tumours, including sensitivity to platinum chemotherapy [43–45]. However, there is no standardized biomarker of "BRCAness" currently available. Further characterization of this phenotype may aid in predicting response to PARP inhibitors in expanded patient populations.
Our analysis of fitting mutational profiles to COSMIC SBS signatures produced results that were not in line with previous literature. This is the first study to examine COSMIC v3.2 signatures in the EoBC setting; therefore, these analyses were exploratory in nature. We found high prevalence of newly added signatures, including SBS37, SBS39, SBS42, and SBS87. The most common COSMIC signatures previously observed in breast tumours are SBS1, SBS2, SBS3, SBS5, SBS13, and SBS18. Mealey et al. found that SBS1, SBS3, and SBS5 were the most prevalent COSMIC signatures and had the highest mean contributions in patients < 40 years. Conversely, we observed each of these signatures in five or fewer patients. We observed SBS13 in 15% of samples and SBS18 in 60% of samples. Given that our extracted de novo SBS signatures had profiles similar to those from Mealey et al. and Nik-Zainal et al. [23,46], these discrepancies may be explained by suboptimal fitting of known COSMIC signatures rather than biological differences between study samples. The MutationalPatterns package uses COSMIC v3.2 whereas Mealey et al. was based on COSMIC v2.0 [23]. It is possible that doubling the number of signatures led to overfitting and misattribution in our sample. That is, if samples contained various combinations of mutational signatures, the fitting algorithm may have erroneously attributed mutations to one signature. This may explain why we did not observe any associations between the COSMIC SBS cluster group and clinical variables. Therefore, we cannot confidently conclude that the high prevalence of recently added COSMIC SBS signatures is biologically or clinically relevant in EoBC.
This study had several strengths. To our knowledge, it is the first to investigate the prognostic relevance of SBS and indel signatures in EoBC. We examined multiple characterizations of somatic mutations including mutation load, SNVs, indels, and mutational signatures. Further, we provide information on their associations with important molecular and physical tumour characteristics, as well as with RFS and OS. We also extracted an APOBEC-like SBS signature in EoBC, consistent with previous findings, and elucidated extracted indel signatures. There are several limitations to note. First, our study had a small sample size, limiting the statistical power and generalizability of our results. Second, this study used WES data so we cannot draw conclusions related to mutations in the genome outside the exome. We also did not investigate germline mutations or signaling pathways, and so did not produce new evidence linking mutational signatures to germline mutations or cellular signaling. Third, the exploratory nature of the study meant the use of data-driven techniques. For example, we converted extracted SBS and indel signatures to binary variables based on a median cut-off for the survival analyses. We also used an unsupervised clustering algorithm for COSMIC SBS signatures. Although these methods have been used in previous research, we cannot confirm their clinical or biological relevance. Fourth, due to the limited sample size, we lacked sufficient power to examine the prognostic relevance of signatures within subgroups and we did not have data on patient race and ethnicity. Mutational profiles can vary between racial and ethnic groups and may explain disparities in therapeutic response and cancer outcomes.

Conclusions
Drivers of poor outcomes in EoBC are an active area of research. In addition to identifying cancer etiologies and the causes of driver mutations, analysis of mutational signatures can also lead to direct therapeutic and prognostic insights. An increasing number of bioinformatics studies show that mutational signatures may predict response to immunotherapy and may bear imprints of DNA damage from chemotherapy and radiation treatment that may accelerate disease progression [47]. The results of this exploratory study suggest that various SBS and indel signatures may be associated with clinical variables of disease and prognosis. Future studies with larger samples are required to better understand the mechanistic underpinnings of disease progression and treatment response in EoBC.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes15050592/s1, Table S1: The mean relative contribution and proportion of fitted COSMIC single-base substitution mutational signatures in the overall study sample, Table S2: The mean relative contribution and proportion of fitted COSMIC insertion-deletion mutational signatures in the overall study sample, Figure S1: Plot of the relative contribution of extracted de novo single-base substitution mutational signatures in each sample, Figure S2: Plot of the relative contribution of extracted de novo insertion-deletion mutational signatures in each sample.
Informed Consent Statement: Consent was waived by the ethics committee as the Alberta Cancer Research Biobank obtains written, informed consent from all participants who agree to donate biological samples and clinical information for use in future research studies. All data were deidentified prior to analysis and no additional contact with participants occurred.

Figure 1. Extracted single-base substitution signatures from 100 early-onset breast cancer patients in Alberta, Canada using non-negative matrix factorization. The x-axis represents trinucleotide context (5′ and 3′ nucleotides) for the six SNV types (C>A, C>G, C>T, T>A, T>C, T>G) and the y-axis represents relative contribution. SBS = single-base substitution; SNV = single-nucleotide variant.

Figure 2. Extracted insertion-deletion signatures from 100 early-onset breast cancer patients in Alberta, Canada using non-negative matrix factorization. The x-axis represents the homopolymer length for single-base pair deletions and insertions, the number of repeat units for >1 base pair deletions and insertions at repeats, and microhomology length for microhomology deletions. The y-axis is the number of insertions-deletions.

Figure 3. Relative contribution heatmap of the unsupervised hierarchical clustering analysis of eight COSMIC single-base substitution signatures into three distinct clusters, which are separated by red lines. The x-axis represents the COSMIC single-base substitution signatures and the y-axis represents samples. SBS = single-base substitution.

Figure S3: Plot of the relative contribution of fitted COSMIC single-base substitution mutational signatures in each sample, Figure S4: Plot of the relative contribution of fitted COSMIC insertion-deletion mutational signatures in each sample.
Author Contributions: Conception and design were by D.R.B. and R.B.B. R.B.B. and D.E.O. selected and developed the methodology. D.R.B. and M.L.Q. acquired the clinical and genomic data. R.B.B. extracted mutational load, indel, single nucleotide variant, and mutational signature data as well as conducted the statistical analysis and interpreted the results. R.B.B. wrote the manuscript and D.E.O., M.L.Q., S.L., Y.X., W.Y.C. and D.R.B. provided critical review. The study was supervised by M.L.Q., S.L., Y.X., W.Y.C. and D.R.B. All authors have read and agreed to the published version of the manuscript.
Funding: D.R.B. was supported by a Canadian Institutes of Health Research Grant (#397332). R.B.B. was supported by the Carole May Yates Memorial Endowment for Cancer Research. D.E.O. was supported by a Canadian Institutes of Health Research Post-Doctoral Fellowship. These funders had no role in the design and conduct of the study or preparation and submission of the manuscript for publication.
Institutional Review Board Statement: This study was approved by the Health Research Ethics Board of Alberta Cancer Committee (HREBA-CC) (reference ID: HREBA.CC-17-0156).

Table 1. Patient characteristics of the study sample, which included 100 patients diagnosed with invasive breast cancer at 18-39 years of age in Alberta from 2001 to 2014.

Table 2. Comparing the mean of the log of mutational load, number of single nucleotide variants, and number of insertion-deletion mutations across categories of patient characteristics.

Table 3. Comparing the mean relative contribution of extracted de novo single-base substitution and insertion-deletion mutational signatures across categories of patient characteristics. Relative contribution is a proportion between 0 and 1.

Table 4. Estimated hazard ratios for the relationships between extracted de novo single-base substitution mutational signatures, insertion-deletion mutational signatures, and both recurrence-free and overall survival.
* High expression means absolute contribution was equal to or above the median. Low expression means absolute contribution was below the median. ª Adjusted for age category, BMI category, molecular subtype, tumour size category, lymph node count, and grade. º Adjusted for age category, BMI category, lymph node count, ER status, and grade. Abbreviations: CI = confidence interval; ID = insertion-deletion; HR = hazard ratio; SBS = single-base substitution.

Table 5. Comparing the frequency of patients in cluster groups resulting from the unsupervised hierarchical clustering algorithm of COSMIC single-base substitution signatures across categories of patient characteristics.

Table 6. Estimated hazard ratios for the relationships between cluster groups resulting from the unsupervised hierarchical clustering algorithm of COSMIC single-base substitution signatures, and both recurrence-free and overall survival.
ª Adjusted for age category, BMI category, molecular subtype, tumour size category, lymph node count, and grade. º Adjusted for age category, BMI category, lymph node count, ER status, and grade. Abbreviations: CI = confidence interval; HR = hazard ratio; SBS = single-base substitution.