Identification of Shared Neoantigens in BRCA1-Related Breast Cancer

Personalized neoantigen-based cancer vaccines have been shown to be safe and immunogenic in cancer patients; however, the manufacturing process can be costly and can delay treatment. Off-the-shelf cancer vaccines targeting shared neoantigens may circumvent these problems. The unique mutational signatures and similar phenotypes found among BRCA1-mutated breast cancers make this group an ideal candidate for discovering shared neoantigens. We obtained genome sequencing data of breast cancer samples with or without somatic BRCA1 mutations (BRCA1-positive and BRCA1-negative, respectively) from three public cancer databases: The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Catalogue of Somatic Mutations in Cancer (COSMIC), and from three studies with whole genome/exome sequencing data of samples with germline BRCA1 mutations. Data were analyzed separately within each database or cohort. We found PIK3CA H1047R, E545K, E542K, and N345K recurrently in BRCA1-negative groups across all databases, whereas recurrent somatic mutations in BRCA1-positive groups were discordant among databases. For germline BRCA1-mutated breast cancer, TP53 R175H was consistently the most frequent mutation in all three germline cohorts. Our study provides lists of potential shared neoantigens among BRCA1-related breast cancer, which may be used in developing off-the-shelf neoantigen-based vaccines.


Introduction
The cancer vaccine is an approach to cancer immunotherapies involving the activation of T cells against tumor antigens and can be used as a therapeutic or preventive measure for cancer treatment [1]. Antigen targets for cancer vaccines are (1) over-expressed self-antigens known as tumor-associated antigens (TAAs); (2) cancer/testis antigens, which are exclusively expressed in reproductive tissues; and (3) neoantigens, which are exclusively expressed in tumors [2,3]. Therefore, the ideal target for cancer vaccines is neoantigens because their absence in normal tissue mitigates the chance of autoimmune attack [4]. Neoantigen-based cancer vaccine manufacture begins with the identification of somatic mutations by comparing the tumor genome to normal tissues within the same individuals. Somatic mutations that are found uniquely in tumors are then assessed for antigenicity by computerized algorithms and are validated in vitro/in vivo before being administered to patients [5][6][7]. Although the personalized cancer vaccine approach has been shown to elicit immune responses in various cancers, complex and completely individualized manufacturing processes result in high treatment cost and delayed treatment availability, preventing treatment access for all patients [8]. To overcome this problem, off-the-shelf cancer vaccines targeting neoantigens that are "common" or "shared" among groups of cancer patients may help reduce the cost and time of access to the cancer vaccine. In this study, we aimed to identify shared (or common or public) neoantigens found recurrently in BRCA1-related breast cancer patients that may be used as target neoantigens for shared vaccine development.
Breast cancer is the leading cause of death by cancer among women (approximately 700,000 deaths; 15.5% of all cancer deaths in women) and the most commonly diagnosed cancer among all cancers and sexes, with an estimated 2.3 million new cases in 2020 (11.7% of total cases) [9]. Breast cancer affects women worldwide with similar incidence and mortality rates [9]. Approximately 1-4% of all breast cancer cases are BRCA1-related [10,11]. We chose to identify shared neoantigens in BRCA1-related breast cancer because its characteristic mutational signatures suggest a pattern in mutational events within the group [12,13]. Additionally, BRCA1-mutated breast cancers share phenotypic similarities such as morphology, molecular subtype, and responsiveness to poly-(adenosine diphosphate-ribose) polymerase inhibitor (PARPi) treatment. To illustrate, studies have found that approximately 70% of BRCA1-mutated breast cancers exhibit a basal-like molecular subtype compared to 20% of BRCA1-wild-type breast cancers; that 57-68% of BRCA1-mutated breast cancers are triple-negative breast cancer (TNBC) by surrogate subtype compared to 13% of BRCA1-wild-type breast cancers; and that 50-79% respond to PARPi compared to 10-33% of BRCA1-wild-type cases [14][15][16][17][18]. Because of these similarities within BRCA1-mutated breast cancer, we hypothesized that some neoantigens may be found recurrently across individuals with BRCA1 mutations and may be used as neoantigens for off-the-shelf cancer vaccines, both for therapeutic purposes in cases with somatic BRCA1 mutations and for preventive purposes in those with germline BRCA1 mutations.
The concept of shared-antigen cancer vaccines has been investigated, especially during the past decade [19,20]. Recent studies have shown successful treatments using neoantigen-based shared cancer vaccines in some types of cancers, such as IDH1 R132H for glioblastoma and KRAS G12D for colon cancer [21][22][23]. Other common neoantigens that have been identified and proposed as targets for cancer vaccines are TP53 R175H and PIK3CA H1047R for gastric cancer, RET M918T for thyroid cancer, and common frameshift mutations for microsatellite instability-high (MSI-H) tumors [24][25][26][27]. Nevertheless, shared target antigens reported in breast cancer, such as HER2 and MUC1, are TAAs, not neoantigens [28][29][30][31]. A recent study that reported shared neoantigen targets in breast cancer, such as PIK3CA H1047R, E545K, and N345K, and AKT1 E17K, was conducted on unspecified breast cancer samples [32]. To the best of our knowledge, shared neoantigen targets for BRCA1-related breast cancer have not been reported.
In this study, we proposed potential neoantigen targets that are found commonly in BRCA1-positive, -negative, and germline BRCA1-mutated samples. We included samples from large open-access public cancer genome databases: TCGA, ICGC, and COSMIC to identify top recurrent mutations from which we also predicted antigenicity of encoded proteins. Our study provided lists of potential shared neoantigens among BRCA1-related breast cancer, which may be used for developing off-the-shelf neoantigen-based vaccines, and may reflect different mutational consequences among somatic, germline BRCA1-mutated, and BRCA1-wild-type breast cancer that should be further investigated.

Sample Identification
We searched three cancer genome databases, The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Catalogue of Somatic Mutations in Cancer (COSMIC), for sequencing data of breast cancer samples with and without BRCA1 mutations. Data related to samples from these three databases were collected with original informed consent. For the TCGA and ICGC databases, the search term "primary site is breast and gene is BRCA1" was used to locate samples with sequencing data on the BRCA1 gene. We found 154 samples in TCGA (data release version: 3 May 2022) and 1852 samples in ICGC (data release version: 27 March 2019). Among these samples, 27/154 (17.53%) from TCGA and 106/1852 (5.72%) from ICGC harbored BRCA1 mutations. For the COSMIC database (data version: 28 May 2021), we found 245 samples using the search terms "gene: BRCA1, tissue: breast". Among the samples with BRCA1 mutations, we included only those with "pathogenic" or "likely pathogenic" somatic BRCA1 mutations according to the 2015 criteria of the American College of Medical Genetics and Genomics (ACMG) [33]. Exclusion criteria were (1) samples with single-gene targeted sequencing, and (2) samples with known germline BRCA1 mutations reported by previous studies [12,34,35] (Supplementary Table S1). Samples identified by previous studies as carrying germline BRCA1 mutations were analyzed separately. With these criteria, we included 12, 15, and 66 samples from the TCGA, ICGC, and COSMIC databases, respectively. These samples with "pathogenic" or "likely pathogenic" somatic BRCA1 mutations are referred to in this paper as "BRCA1-positive". Samples known to have a wild-type BRCA1 sequence were classified as "BRCA1-negative". All sample IDs included in this study can be found in Supplementary Table S1 for germline BRCA1-mutated samples and Supplementary Table S2 for BRCA1-positive and -negative samples.

Data Analyses
For TCGA and ICGC databases, BRCA1-positive and -negative samples were grouped into separate folders using web interfaces. Then, data on all mutations of samples in each group were downloaded onto a local computer for analysis. For the COSMIC database, all mutation data were downloaded from the "All mutations in census gene" and "Non-coding variant" sections. All mutations were assessed for variant type, single nucleotide substitution classification, coding-region variant classification, and variant count per sample by simple counting. Data of top recurrent mutated genes and top recurrent somatic mutations were obtained by counting and were also available via the TCGA and ICGC web interfaces. Top recurrent somatic mutations on coding regions from all databases were also assessed by simple counting.
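The "simple counting" of recurrent mutations described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (sample ID, gene, protein change), the function name, and the toy data are assumptions made for illustration, and a mutation is assumed to count once per sample.

```python
from collections import Counter


def top_recurrent_mutations(records, n_samples, top_n=20):
    """Rank mutations by the number of distinct samples that carry them.

    records   : iterable of (sample_id, gene, protein_change) tuples
                (hypothetical layout, not a real database export format)
    n_samples : total number of samples in the group (the denominator)
    """
    # De-duplicate so that a mutation is counted at most once per sample.
    unique_calls = {(sample, gene, change) for sample, gene, change in records}
    counts = Counter((gene, change) for _, gene, change in unique_calls)
    # Sort by count (descending), then by name so ties have a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (gene, change, n, 100.0 * n / n_samples)
        for (gene, change), n in ranked[:top_n]
    ]


# Toy records, invented for illustration only (not study data).
toy_records = [
    ("S1", "PIK3CA", "H1047R"),
    ("S2", "PIK3CA", "H1047R"),
    ("S2", "PIK3CA", "H1047R"),  # duplicate call within the same sample
    ("S3", "PIK3CA", "E545K"),
    ("S4", "TP53", "R175H"),
]
ranking = top_recurrent_mutations(toy_records, n_samples=4)
for gene, change, n, pct in ranking:
    print(f"{gene} {change}: {n}/4 ({pct:.2f}%)")
```

Counting distinct samples rather than raw rows keeps the prevalence interpretable as "fraction of samples carrying the mutation", which is the form in which prevalence is reported in the Results.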

Antigenicity Prediction of Recurrent Somatic Mutations
We assessed the antigenic potential of the top recurrent somatic mutations by calculating the binding affinity between mutated epitopes and Major Histocompatibility Complex (MHC) class I. Binding affinities were calculated using NetMHCpan and The Immune Epitope Database (IEDB) algorithms [36][37][38]. We used two prediction methods, NetMHCpan BA 4.1 and NetMHCpan EL 4.1. MHC class I/peptide pairs with a predicted binding affinity of IC50 < 500 nM were considered possibly antigenic. Such mutations were considered antigenic only when neoepitopes were available. We included 145 MHC class I alleles in this study to cover 99% of all MHC class I in the worldwide population [38].
Binding affinities between neoepitopes and MHC class II were calculated using the method recommended by the Immune Epitope Database (IEDB). If the recommended method was not available for an allele, NetMHCIIpan 4.0 was used. Twenty-seven MHC class II alleles were used in this study to cover 99% of all MHC class II in the worldwide population [39]. We selected 12-mer to 18-mer peptides. Following the IEDB recommendation, peptides were selected based on a consensus percentile rank within the top 10%. Alternatively, peptides with a binding affinity to MHC class II of less than 1000 nM were classified as binders. Antigenic epitopes were generated exclusively from mutated proteins and cannot be found in wild-type proteins.
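The selection rules in this section (an IC50 cutoff for class I, a percentile-rank or IC50 cutoff for class II, and the requirement that the epitope be absent from the wild-type protein) can be summarized in a short Python sketch. It only post-processes prediction values that are assumed to have been obtained already; it does not call NetMHCpan or the IEDB tools. The function names, the inclusive treatment of the 10% rank cutoff, and the toy sequences are assumptions for illustration.

```python
CLASS_I_IC50_NM = 500.0     # class I cutoff stated in the text
CLASS_II_IC50_NM = 1000.0   # alternative class II cutoff stated in the text
CLASS_II_PERCENTILE = 10.0  # consensus percentile-rank cutoff stated in the text


def is_class_i_binder(ic50_nm):
    """Class I binder if the predicted IC50 is below 500 nM."""
    return ic50_nm < CLASS_I_IC50_NM


def is_class_ii_binder(ic50_nm=None, percentile_rank=None):
    """Class II binder by percentile rank or, alternatively, by IC50.

    Treating a rank of exactly 10 as a binder is an assumption.
    """
    by_rank = percentile_rank is not None and percentile_rank <= CLASS_II_PERCENTILE
    by_affinity = ic50_nm is not None and ic50_nm < CLASS_II_IC50_NM
    return by_rank or by_affinity


def is_neoepitope(peptide, wild_type_protein):
    """A neoepitope must not occur anywhere in the wild-type sequence."""
    return peptide not in wild_type_protein


def candidate_pairs(predictions, wild_type_protein):
    """Keep (peptide, allele) pairs that bind class I and are mutation-specific.

    predictions: iterable of (peptide, allele, ic50_nm) tuples (hypothetical layout).
    """
    return [
        (peptide, allele)
        for peptide, allele, ic50_nm in predictions
        if is_class_i_binder(ic50_nm) and is_neoepitope(peptide, wild_type_protein)
    ]


# Invented toy sequence and values; these are not real proteins or predictions.
toy_wild_type = "MKTAYIAKQRQISFVK"
toy_predictions = [
    ("AYIAKHRQI", "HLA-A*02:01", 120.0),   # mutant peptide, strong enough
    ("AYIAKQRQI", "HLA-A*02:01", 80.0),    # wild-type peptide, excluded
    ("AYIAKHRQI", "HLA-B*07:02", 2500.0),  # mutant peptide, too weak
]
selected = candidate_pairs(toy_predictions, toy_wild_type)
print(selected)
```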

Allele and Haplotype Frequency Calculations
The average allele frequency of each MHC class I allele was calculated using data from the Allele Frequency Net Database (https://www.allelefrequencies.net) (access date: 1 February 2022) [40]. The data inclusion criteria for allele frequency calculation were set at "all population, all countries, all sources of the dataset, all regions, all ethnicities, all type of study, gold population standard only", with the gold standard defined as (1) allele frequencies summing to or close to 1, (2) a sample size of more than 50, and (3) frequencies reported at four-digit resolution. Average haplotype frequencies were also calculated using data from the Allele Frequency Net Database. The data inclusion criteria for haplotype frequency calculation were set at "all population, all countries, all sources of the dataset, all regions, all ethnicities, all type of study, sort by haplotype, 2 loci test".
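As an illustration of the averaging step, the sketch below filters population records by two of the gold-standard criteria (sample size above 50 and four-digit allele resolution) and takes an unweighted mean. Whether the original averages were unweighted or weighted by sample size is not stated, so the unweighted mean is an assumption, as are the record layout and the toy values; criterion (1), which applies to a whole population rather than to a single record, is assumed to have been applied upstream.

```python
from statistics import mean


def is_four_digit(allele):
    """True for names such as 'A*02:01' (exactly two numeric fields)."""
    _, _, fields = allele.partition("*")
    parts = fields.split(":")
    return len(parts) == 2 and all(part.isdigit() for part in parts)


def average_allele_frequency(records, allele):
    """Unweighted mean frequency of one allele across qualifying populations.

    records: list of dicts with keys 'allele', 'frequency', 'sample_size'
             (hypothetical layout, not the database's export format).
    Returns None when no population qualifies.
    """
    freqs = [
        r["frequency"]
        for r in records
        if r["allele"] == allele
        and r["sample_size"] > 50
        and is_four_digit(r["allele"])
    ]
    return mean(freqs) if freqs else None


# Invented toy records for illustration only.
toy_populations = [
    {"allele": "A*02:01", "frequency": 0.2, "sample_size": 120},
    {"allele": "A*02:01", "frequency": 0.3, "sample_size": 75},
    {"allele": "A*02:01", "frequency": 0.9, "sample_size": 30},  # too few samples
    {"allele": "A*02", "frequency": 0.5, "sample_size": 200},    # two-digit only
]
avg_a0201 = average_allele_frequency(toy_populations, "A*02:01")
print(avg_a0201)
```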

Statistical Analysis
The Mann-Whitney U test was performed to compare variant counts between BRCA1-positive and -negative samples within the same database using IBM SPSS Statistics for Windows, Version 22.0 (IBM Corp., Armonk, NY, USA). A p-value < 0.05 was considered statistically significant.
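For readers without access to SPSS, the comparison can be approximated with a small pure-Python implementation of the two-sided Mann-Whitney U test. This is a sketch under stated assumptions, not a reproduction of the SPSS procedure: it uses the normal approximation with a tie correction and no continuity correction, so p-values may differ slightly from those of other software, especially for small groups. The function name and the toy counts are invented for illustration.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j (1-based).
        midrank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return min(u1, u2), 1.0  # all values identical
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the normal tail
    return min(u1, u2), p


# Invented variant counts per sample, for illustration only.
toy_positive = [84, 95, 140, 60, 110, 77]
toy_negative = [39, 22, 65, 36, 23, 41, 30]
u_stat, p_value = mann_whitney_u(toy_positive, toy_negative)
print(f"U = {u_stat}, p = {p_value:.4f}, significant = {p_value < 0.05}")
```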

Characteristics of BRCA1-Positive and BRCA1-Negative Samples
Characteristics of breast cancer samples with pathogenic or likely pathogenic BRCA1 mutations (referred to as BRCA1-positive) are shown in

Mutational Landscapes of BRCA1-Positive and -Negative Samples
To compare variant types between BRCA1-positive and -negative samples, we classified all somatic mutations, from both coding and non-coding regions, into single nucleotide variation (SNV), deletion (DEL), insertion (INS), and others. "Others" comprises mutations that could not be classified into any of the previous categories. Insertion and deletion sizes from ICGC were less than 200 bp. We found that SNV was the majority of BRCA1-positive ( (Figure 2A,B) (Figure 2A). However, the second most abundant SNV in BRCA1-negative samples varied among databases, with C > G, C > A, and T > C being the second most abundant SNVs in TCGA (3807/12,894, 29.52%), ICGC (543,067/2,569,039, 21.13%), and COSMIC (1725/9445, 18.26%), respectively (Figure 2B). This showed that SNV was the major variant type, with C > T mutations being the most abundant SNV in both BRCA1-positive and -negative samples.
Next, we classified the mutations in coding regions into seven types: missense, synonymous, nonsense, frameshift deletion, in-frame deletion, frameshift insertion, and in-frame insertion. We found that missense mutation was the most abundant mutation found at coding regions in both BRCA1-positive ( Figure 3A,B). However, the COSMIC database showed frameshift deletions to be the second most abundant mutation in both BRCA1-positive and BRCA1-negative samples (Figure 3A,B). Taken together, missense mutation is the most abundant variant type in coding regions in both BRCA1-positive and -negative samples.
To assess the tumor mutational burden of BRCA1-positive and -negative samples, we compared the coding-region non-synonymous variant count per sample between the groups within the same database. We excluded samples that were analyzed by targeted sequencing in the COSMIC database. Therefore, 20 BRCA1-positive and 79 BRCA1-negative samples from COSMIC were analyzed.
We found that the median of total variant counts in BRCA1-positive samples was 83.50 (Q1 59.75-Q3 140.25), 95.00 (Q1 66.00-Q3 140.00), and 6.50 (Q1 4.00-Q3 8.00) in the TCGA, ICGC, and COSMIC databases, respectively (Figure 4A). For BRCA1-negative samples, the median of variant counts in the TCGA, ICGC, and COSMIC databases was 39.00 (Q1 22.00-Q3 65.00), 36.00 (Q1 23.00-Q3 62.00), and 2.00 (Q1 1.00-Q3 2.00), respectively (Figure 4B). We found significant differences in total non-synonymous variant counts between BRCA1-positive and BRCA1-negative samples in all databases (2-tailed p-value: <0.001 in TCGA, <0.001 in ICGC, and 0.001 in COSMIC) (Figure 4C). We also found significant differences in both SNV and indel counts between BRCA1-positive and -negative groups in all databases (p-value < 0.001, <0.001, and <0.001 in TCGA, ICGC, and COSMIC for the SNV count and 0.019, 0.032, and 0.031 in TCGA, ICGC, and COSMIC for the indel count) (Supplementary Table S3).

Recurrent Somatic Mutations in BRCA1-Positive and BRCA1-Negative Breast Cancer Samples
To propose candidate neoantigens for generalized breast cancer vaccine development, in both BRCA1-positive and BRCA1-negative samples, we looked at the top recurrent somatic mutations exclusively in coding regions (Figure 6A,B). We found missense PIK3CA H1047R, E545K, N345K, and E542K consistently across all databases in BRCA1-negative samples. PIK3CA H1047R was reported in BRCA1-negative samples in all databases (TCGA 15/123, 12.19%; ICGC 198/1714, 11.55%; and COSMIC 488/3158, 15.45%) but was reported at varying prevalence in BRCA1-positive samples (TCGA: 2/12, 16.67%; ICGC: 0/15, 0%; and COSMIC: 3/66, 4.54%). PIK3CA E545K and PIK3CA N345K were identified in BRCA1-negative samples with comparable prevalence in all databases (E545K 5.69-7.63%; N345K 1.69-3.25%) but were rarely found in BRCA1-positive samples (E545K 0-1.51%; N345K 0%), whereas PIK3CA E542K was identified at comparable prevalence in both BRCA1-positive (4.54-6.67%) and -negative (2.43-4.78%) samples in all databases. We found no dominant recurrent mutations among BRCA1-positive samples; however, PIK3CA H1047R, PIK3CA E542K, TP53 Y220C, and TP53 R196* were found among the top recurrent mutations in more than two databases (Figure 6A). This showed that while the recurrent coding-region mutations are still inconclusive for BRCA1-positive samples, PIK3CA H1047R, E545K, N345K, and E542K were consistently identified across all databases for BRCA1-negative samples and, therefore, may be used as common target neoantigens for a BRCA1-negative breast cancer vaccine.
To investigate the overlap of the top 20 recurrent mutations between BRCA1-positive and -negative groups, we used a Venn diagram to visualize the overlap of somatic mutations between the two groups within the same database (Supplementary Figure S1). The data showed that PIK3CA H1047R, TP53 R196*, and GATA3 D336Gfs*17 overlapped between the two subgroups in TCGA; PIK3CA E542K in ICGC; and PIK3CA H1047R, E542K, E545K, and GATA3 D336Gfs*17 in COSMIC. Although these mutations overlapped in the Venn diagram, their recurrence rates can differ between the subgroups (Supplementary Figure S1). These findings also confirm that different neoantigen sets may be necessary to provide good coverage for subgroups with different BRCA1 statuses.
Targeting multiple neoantigens at a time may provide more coverage for a generalized breast cancer vaccine. Therefore, we also calculated the cumulative coverage of recurrent mutations in both types of samples (Figure 6C,D). In BRCA1-positive samples, the top 5 somatic mutations displayed in the graph covered 41.66% of all samples and the top 12 covered 83.33% in TCGA; the top 6 covered 26.67% and the top 17 covered 66.67% in ICGC; and the top 5 covered 19.69% and the top 20 covered 24.24% in COSMIC (Figure 6C). In BRCA1-negative samples, the top 5 somatic mutations displayed in the graph covered 26.01% of all samples and the top 20 covered 33.33% in TCGA; the top 5 covered 25.37% and the top 20 covered 36.34% in ICGC; and the top 5 covered 33.91% and the top 20 covered 45.18% in COSMIC (Figure 6D).
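Cumulative coverage of this kind can be computed as the fraction of samples that carry at least one of the top k mutations, for increasing k. The Python sketch below illustrates that definition; it is an assumption that the curves in Figure 6C,D were built this way, and the function name, data layout, and toy data are hypothetical.

```python
def cumulative_coverage(sample_mutations, ranked_mutations):
    """Percentage of samples carrying at least one of the top-k mutations.

    sample_mutations : dict mapping sample ID -> set of mutation labels
    ranked_mutations : mutation labels ordered from most to least recurrent
    Returns a list whose k-th entry (1-based) is the coverage of the top k.
    """
    n_samples = len(sample_mutations)
    covered = set()
    curve = []
    for mutation in ranked_mutations:
        # Add every sample that carries this mutation; a sample already
        # covered by a higher-ranked mutation is not counted twice.
        covered |= {
            sample
            for sample, mutations in sample_mutations.items()
            if mutation in mutations
        }
        curve.append(100.0 * len(covered) / n_samples)
    return curve


# Invented toy data for illustration only.
toy_samples = {
    "S1": {"PIK3CA H1047R"},
    "S2": {"PIK3CA H1047R", "PIK3CA E545K"},
    "S3": {"PIK3CA E545K"},
    "S4": {"PIK3CA E542K"},
    "S5": set(),
}
toy_ranking = ["PIK3CA H1047R", "PIK3CA E545K", "PIK3CA E542K"]
coverage_curve = cumulative_coverage(toy_samples, toy_ranking)
print(coverage_curve)
```

Because samples carrying several of the ranked mutations are counted only once, the curve rises more slowly than the sum of individual prevalences, which is why adding further mutations yields diminishing gains in coverage.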

Recurrent Somatic Mutations in Germline BRCA1-Mutated Breast Cancer Samples
Next, we aimed to identify candidate neoantigens to be used as a preventive cancer vaccine for germline BRCA1 carriers. We identified three studies that reported next-generation sequencing data on samples confirmed to be germline BRCA1 mutations.  [35]. The summary of sample characteristics is shown in Supplementary Table S1. Seventy-eight total samples were obtained and analyzed separately. We found that missense TP53 R175H was consistently the most frequent somatic mutation in all studies and accounted for 6.45%, 11.53%, and 9.   (Figure 7). This information may indicate unique mutational consequences among samples with germline BRCA1 mutations, non-specific BRCA1-positive mutations, and no BRCA1 mutations.

Predicted Antigenicity of Top Recurrent Mutations
We investigated the antigenic potential of the peptides from the top recurrent somatic mutations by calculating their epitopes' binding affinities with MHC class I. The calculated binding affinities of the top recurrent mutations were obtained from NetMHCpan BA 4.1 and NetMHCpan EL 4.1. Candidate recurrent somatic mutations were predicted to be antigenic when their binding affinity was lower than 500 nM (Table 3). We found that most recurrent mutations were predicted to be antigenic, except for TP53 R175H and TP53 R196*. For binding affinities with MHC class II, we found that most recurrent mutations were predicted to be antigenic, except for PIK3CA E542K and TP53 R196*, the latter of which results in a premature stop codon (Supplementary Table S4). A combination of PIK3CA H1047R, E542K, E545K, and N345K can cover 10.75% (10/93) of BRCA1-positive samples and 27.50% (1374/4995) of BRCA1-negative samples. It is noteworthy that the combination of these recurrent neoantigens covers few to no samples with germline BRCA1 mutations.

Discussion
Personalized neoantigen-based cancer vaccines have revolutionized personalized medicine; however, their highly individualized manufacturing process may limit their accessibility. Shared neoantigen vaccines have certain advantages over personalized approaches in cases where time and resources are limited and patients have such aggressive diseases that they cannot afford delayed treatment. In this study, we identified recurrent somatic mutations and potential shared neoantigen candidates in BRCA1-positive, -negative, and germline BRCA1-mutated breast cancer. We analyzed mutation data of breast cancer samples with and without "pathogenic" or "likely pathogenic" BRCA1 mutations as determined by the ACMG 2015 criteria (referred to as BRCA1-positive and -negative, respectively) that were available in three public cancer databases: TCGA, ICGC, and COSMIC. We also reported mutational landscapes and frequently mutated genes in BRCA1-positive and -negative samples.
Our findings on mutational landscapes correspond with previous studies. Our results showed that SNV is the most abundant variant type, with C > T being the most abundant SNV type in both BRCA1-positive and -negative groups. This corresponds with the results of Zhou and colleagues in 5,991 unspecified breast cancer samples [32]. Mutational burden represented by total variant counts differed between BRCA1-positive and -negative samples in all databases, with the BRCA1-positive group harboring more SNVs and indels than the BRCA1-negative group. These results also correspond with findings by Nolan and colleagues, who analyzed the tumor mutational burden by WES and found a marked enrichment of missense and indel mutations in BRCA1-mutated triple-negative breast cancer compared to the non-BRCA1-mutated group [41].
TP53 and PIK3CA were found mutated in 30.24-37.39% and 30.08-39.48% of all BRCA1-negative samples, respectively. These mutational frequencies are similar to the findings in unspecified breast cancer samples (TP53 37%; PIK3CA 38%) [32,42]. In the BRCA1-positive group, however, TP53 was found to be the top mutated gene with much higher frequencies of 59.09-75.00% of all BRCA1-positive samples. This corresponds with a previous finding that loss-of-function TP53 is required for efficient tumor development in targeted BRCA-null mice [43]. Interestingly, PIK3CA mutations, which were the top mutation in BRCA1-negative and unspecified breast cancer, were found in the lower frequencies of 13.33-16.67% and were not among the top mutated genes in the BRCA1-positive group. This also correlates with the frequency of PIK3CA mutations in TNBC (16%), which is lower than in the hormonal positive subgroups (HR+/HER2 (42%) and HER2+ (31%)) [42]. Taken together with the fact that most BRCA1-mutated samples are TNBC (57-68%) [15,16], and TP53 and PIK3CA mutations are often mutually exclusive [32], the relationship between hormonal status (TNBC), TP53, PIK3CA, and BRCA1 mutation status may need to be further explored.
We reported PIK3CA H1047R, PIK3CA E545K, PIK3CA E542K, and PIK3CA N345K to be the top recurrent somatic mutations in BRCA1-negative samples. The mutation list is highly similar to the finding in breast cancer of unspecified genotypes (PIK3CA H1047R, E545K, and N345K, and AKT1 E17K) [32]. On the other hand, the recurrent somatic mutations in BRCA1-positive samples (PIK3CA H1047R, TP53 Y220C, TP53 R196*, and PIK3CA E542K) may not be reproducible across all databases, possibly due to low sample counts.
However, the germline BRCA1-mutated samples showed TP53 R175H to be the only top recurrent mutation shared across all three cohorts, and it was not found among the top mutations in BRCA1-positive or -negative samples. Therefore, the same set of neoantigens that covers one BRCA1 status may not cover the others. The different sets of recurrent somatic mutations among BRCA1-positive, -negative, and germline BRCA1 samples reported in this study may reflect different mutational events among these groups.
Most recurrent somatic mutations were predicted by NetMHCpan algorithms to be antigenic and their peptides can be presented by both MHC-class I and class II, except for TP53 R175H; however, it was shown to be antigenic by the in vitro studies [44,45]. Other predicted neoantigens identified in this study such as PIK3CA H1047R were reported to elicit both CD4+ and CD8+ responses in vitro [46], whereas PIK3CA E545K and E542K were not found to be presented by MHC class I in vitro [46,47]. Therefore, in vitro validation of the predicted neoantigens reported by this study is necessary. The lack of in vitro testing is another limitation of this study in addition to the small sample size in the BRCA1-positive group.
This work provides a foundation for developing off-the-shelf neoantigen-based vaccines for BRCA1-related breast cancer, yet there are several critical steps prior to achieving the vaccine development goal. In our future study, we plan to validate the antigenicity of the candidate neoantigens in vitro by testing their ability to bind MHC molecules and their ability to elicit a T cell response. We plan to proceed to in vivo studies with the validated neoantigen candidates by using those shared neoantigens to immunize mouse models with BRCA1-mutated breast cancer. The vaccinated mice will be evaluated for antibody levels and T-cell activities against vaccinated neoantigens, and the clinical outcomes such as tumor size, progression, metastasis, and survival, will be compared to the unvaccinated control group.

Conclusions
Our study identified common somatic mutations that are predicted to be immunogenic in BRCA1-related breast cancer (BRCA1-positive, -negative, and germline BRCA1 mutations). We reported PIK3CA H1047R, PIK3CA E545K, PIK3CA E542K, and PIK3CA N345K to be the top recurrent mutations in BRCA1-negative samples across all databases. On the other hand, PIK3CA H1047R, TP53 Y220C, TP53 R196*, and PIK3CA E542K, which were found recurrently in BRCA1-positive samples, were not consistent among the databases. TP53 R175H was the top recurrent somatic mutation found uniquely in the germline BRCA1-mutated group. Collectively, our study provides lists of candidate neoantigens that may be used to develop off-the-shelf cancer vaccines for BRCA1-related breast cancer patients or as a preventive cancer vaccine in BRCA1 mutation carriers. However, in vitro validation of the candidates' antigenicity and assessment of their ability to immunize and regulate cancer progression in vivo are the critical next steps, which we plan to include in our future study.

Supplementary Materials:
The following supporting information can be downloaded at: www.mdpi.com/article/10.3390/vaccines10101597/s1, Figure S1: Overlapping top 20 recurrent somatic mutations identified in BRCA1-positive and BRCA1-negative groups; Table S1: Sample characteristics of germline BRCA1-mutated breast cancer studies and sample IDs of the samples included in this study; Table S2: BRCA1-positive and -negative sample IDs from TCGA, ICGC, and COSMIC databases; Table S3: Median and p-value of non-synonymous variant from Mann-Whitney U test of all databases; Table S4: Binding prediction results between epitopes of recurrent somatic mutation and MHC class II.
Funding: This research and the APC were funded by the grants for Development of New Faculty Staff, Ratchadaphiseksomphot Endowment Fund, Chulalongkorn University.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was provided in the original studies of which the data are deposited in public databases.

Data Availability Statement:
The datasets generated during and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors have no relevant financial or non-financial interests to disclose.