Pathogenic Variant Spectrum in Breast Cancer Risk Genes in Finnish Patients

Simple Summary
The Finnish population has gone through multiple reductions in population size, which have decreased its genetic diversity. This may affect the spectrum of risk variants in diseases such as breast cancer (BC), so that a few variants may account for most of the pathogenic variation found in the risk genes. A dozen recurrent pathogenic variants have been identified in the moderate-risk BC susceptibility genes in Finnish BC patients. To evaluate the spectrum and frequency of the risk variants more comprehensively, we studied all variants in 1769 patients and copy number changes in 1511 patients, in both the moderate-risk genes and the high-risk BRCA1 and BRCA2 genes. While the overall pathogenic variant frequency was comparable to that in other populations, just a few variants accounted for most of the pathogenic burden in the risk genes. These results could be utilized in population screening strategies in Finland.

Abstract
Recurrent pathogenic variants have been detected in several breast and ovarian cancer (BC/OC) risk genes in the Finnish population. We conducted gene-panel sequencing and copy number variant (CNV) analysis to define a more comprehensive spectrum of pathogenic variants in the BRCA1, BRCA2, PALB2, CHEK2, ATM, BARD1, RAD51C, RAD51D, BRIP1, and FANCM genes in Finnish BC patients. The combined frequency of pathogenic variants in the BRCA1/2 genes was 1.8% in 1356 unselected patients, whereas variants in the other genes were detected altogether in 8.3% of 1356 unselected patients and in 12.9% of 699 familial patients. CNVs were detected in 0.3% of both 1137 unselected and 612 familial patients. A few variants covered most of the pathogenic burden in the studied genes. Of the BRCA1/2 carriers, 70.8% had 1 of 10 recurrent variants. In the other genes combined, 92.1% of the carrier patients had at least 1 of 11 recurrent variants. In particular, PALB2 c.1592delT and CHEK2 c.1100delC accounted for 88.9% and 82.9%, respectively, of the pathogenic variation in each gene. Our results highlight the importance of founder variants in the BC risk genes in the Finnish population and could be used in designing population screening for the risk variants.


Introduction
With a lifetime risk of 13%, breast cancer (BC) is the most frequently diagnosed cancer in Finnish women [1]. Several high- or moderate-penetrance genes have been established as clinically valid for the prediction of BC risk [2,3]. Pathogenic variants in BRCA1 lead to high cumulative lifetime risks of BC and ovarian cancer (OC) (72% and 44%, respectively) [...] and analyzed the SNPs and short indels along with the CNVs detected in these genes in our patient and control series. In this study, we analyzed PALB2 among the moderate-risk genes.

Gene-Panel Sequencing
The DNA samples were sequenced as part of the Breast Cancer Association Consortium (BCAC) panel analysis of 34 confirmed and suspected BC susceptibility genes. The genotyping and variant annotation process has been described by Dorling et al. [3].

Single-Nucleotide Variants and Short Indels in Moderate-Risk Genes
To cover a comprehensive spectrum of damaging variants in the moderate-risk genes in the studied Finnish BC patients, we examined all pathogenic variants as well as variants of unknown significance identified in the gene-panel sequencing. We focused on variants with a carrier frequency of ≤2% in the population controls.
As pathogenic variants, we selected all putative LoF (pLoF) variants, defined as stopgain, frameshift, and essential splice site variants. Additionally, we selected missense and in-frame indel variants that were interpreted as pathogenic or likely-pathogenic in ClinVar [30]. We evaluated the evidence available for these variants in ClinVar and included the variants that were likely to cause a moderately elevated cancer risk.
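The selection rules above can be sketched as a filter over annotated variant records. This is a minimal illustration under stated assumptions, not the study's pipeline: the record fields (`consequence`, `clinvar`, `control_carrier_freq`), the consequence labels, and the variant names are hypothetical, and the manual review of ClinVar evidence described above is reduced to a simple label check.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels; pLoF = stopgain, frameshift, essential splice site.
PLOF_CONSEQUENCES = {"stopgain", "frameshift", "essential_splice"}
# Missense and in-frame indel variants qualify only with ClinVar support.
CLINVAR_DEPENDENT = {"missense", "inframe_indel"}
PATHOGENIC_LABELS = {"pathogenic", "likely_pathogenic"}
MAX_CONTROL_CARRIER_FREQ = 0.02  # carrier frequency <= 2% in population controls


@dataclass
class Variant:
    name: str                    # placeholder variant identifier
    consequence: str             # hypothetical annotation label
    clinvar: Optional[str]       # hypothetical ClinVar summary label, or None
    control_carrier_freq: float  # fraction of controls carrying the variant


def is_selected_pathogenic(v: Variant) -> bool:
    """Apply the control-frequency cut-off, then the pLoF or ClinVar rule."""
    if v.control_carrier_freq > MAX_CONTROL_CARRIER_FREQ:
        return False
    if v.consequence in PLOF_CONSEQUENCES:
        return True
    return v.consequence in CLINVAR_DEPENDENT and v.clinvar in PATHOGENIC_LABELS


# Hypothetical records, for illustration only.
candidates = [
    Variant("var_A", "frameshift", None, 0.001),
    Variant("var_B", "missense", "likely_pathogenic", 0.0),
    Variant("var_C", "missense", "uncertain_significance", 0.004),
    Variant("var_D", "stopgain", None, 0.03),
]
print([v.name for v in candidates if is_selected_pathogenic(v)])  # ['var_A', 'var_B']
```

Ordering the frequency check first means a common pLoF variant is dropped before its consequence is considered, matching the ≤2% restriction stated above.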
We examined the missense and in-frame indel variants of unknown significance in search of other potentially pathogenic variants. We tested the association between the variants and cancer risk with Fisher's exact test using the R environment for statistical computing (version 4.0.3) [31] and two-sided p values. We excluded the variants with benign or likely-benign interpretations (including conflicting interpretations) in ClinVar. We selected the variants that were predicted to be deleterious either by Helix [32] or by CADD [33] (phred ≥ 25) and further annotated them with protein domain information from UniProt [34].
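The association test was run in R; as a dependency-free illustration, a two-sided Fisher's exact test for a 2×2 carrier table can be written in plain Python by summing the hypergeometric probabilities of all tables (with the same margins) that are no more likely than the observed one, which is the convention used by R's `fisher.test`. The function name is hypothetical, and the example counts are those reported in the Results for ATM c.146C>G (17/1769 patients, 3/1112 controls). One difference to keep in mind: this sketch returns the sample (cross-product) odds ratio, whereas `fisher.test` reports a conditional maximum-likelihood estimate, so the two can differ slightly; no confidence interval is computed here.

```python
from math import comb


def fisher_exact_2x2(case_carriers, case_total, control_carriers, control_total):
    """Two-sided Fisher's exact test for carriers among cases vs controls.

    Returns (sample odds ratio, two-sided p value).
    """
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    n = case_total + control_total
    carriers = a + c

    def prob(x):
        # Hypergeometric probability of x carriers among cases, margins fixed.
        return comb(case_total, x) * comb(control_total, carriers - x) / comb(n, carriers)

    support = range(max(0, carriers - control_total), min(carriers, case_total) + 1)
    probs = [prob(x) for x in support]
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties (R uses 1 + 1e-7).
    p_value = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    # Sample odds ratio; reported as infinite when b * c is zero.
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(p_value, 1.0)


or_, p = fisher_exact_2x2(17, 1769, 3, 1112)
print(f"OR = {or_:.2f}, p = {p:.3f}")
```

For these counts the sample odds ratio is about 3.59, and the p value should come out close to the reported p = 0.036.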

Single-Nucleotide Variants and Short Indels in BRCA1/2 Genes
The previous estimate of the BRCA1/2 carrier frequency in the unselected BC patients was derived from recurrent founder variants [24]. To examine all pathogenic variants found in the unselected BC patients, we selected the pLoF, missense, and in-frame indel variants from the gene-panel sequencing data of the 1356 unselected patients in this study. Of these, we selected the variants that were interpreted as pathogenic or likely-pathogenic in ClinVar as well as previously unreported, likely-pathogenic pLoF variants.

Copy Number Variant Analysis
The CNV data were collected as part of the BCAC CNV analysis [19], which used the Illumina iCOGS genotyping array with 211,155 probes and the OncoArray with 533,631 probes [11,12]. CNV calling was carried out using the CamCNV pipeline, as described in detail by Dennis et al. [19,35]. That analysis included CNV segments covered by 3 to 200 probes.
We used Ensembl data (release 104) [36] through BioMart [37] and Bedtools (version 2.30.0) [38] to map the CNV segments to genes and transcripts. As we did not confirm the exact breakpoints of the CNVs, we describe the CNVs at the exon level in this study. CNVs leading to the same exonic change were treated as one CNV. The analysis included data from 1137 of the unselected and 612 of the familial patients, with an overlap of 238 patients between the groups, as well as 1025 of the controls.
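The mapping step was done with Ensembl annotations and Bedtools; the sketch below only illustrates the underlying logic in plain Python, under simplifying assumptions. Segments outside the 3 to 200 probe range are dropped, each remaining segment is intersected with exon intervals, and segments producing the same exonic change (same gene, CNV type, and set of exons) are collapsed into one CNV even when their breakpoints differ. The gene name, coordinates, and record fields are hypothetical placeholders, not real Ensembl or CamCNV output, and transcript selection is ignored.

```python
from dataclasses import dataclass

MIN_PROBES, MAX_PROBES = 3, 200  # segment size range used in the CNV calling


@dataclass(frozen=True)
class Exon:
    gene: str
    number: int
    chrom: str
    start: int  # 0-based, half-open [start, end), as in BED files
    end: int


@dataclass(frozen=True)
class Segment:
    sample: str
    chrom: str
    start: int
    end: int
    cnv_type: str  # "DEL" or "DUP"
    n_probes: int


def overlaps(seg, exon):
    # Half-open intervals overlap when each starts before the other ends.
    return seg.chrom == exon.chrom and seg.start < exon.end and exon.start < seg.end


def exonic_changes(segments, exons):
    """Map (gene, cnv_type, exon numbers) -> set of carrier samples."""
    changes = {}
    for seg in segments:
        if not MIN_PROBES <= seg.n_probes <= MAX_PROBES:
            continue  # outside the probe-count range, not analyzed
        by_gene = {}
        for e in exons:
            if overlaps(seg, e):
                by_gene.setdefault(e.gene, set()).add(e.number)
        for gene, numbers in by_gene.items():
            key = (gene, seg.cnv_type, tuple(sorted(numbers)))
            changes.setdefault(key, set()).add(seg.sample)
    return changes


# Hypothetical toy coordinates, for illustration only.
exons = [Exon("GENE_X", i, "chr1", 1000 * i, 1000 * i + 200) for i in range(1, 6)]
segments = [
    Segment("s1", "chr1", 2900, 4300, "DEL", 12),  # spans exons 3-4
    Segment("s2", "chr1", 2950, 4500, "DEL", 15),  # other breakpoints, same exons
    Segment("s3", "chr1", 900, 1300, "DUP", 2),    # too few probes, dropped
]
for key, samples in exonic_changes(segments, exons).items():
    print(key, sorted(samples))  # ('GENE_X', 'DEL', (3, 4)) ['s1', 's2']
```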

Multiplex Ligation-Dependent Probe Amplification
All CNVs were subjected to validation with the multiplex ligation-dependent probe amplification (MLPA) technique [39]. Details of the MLPA assays used (MRC Holland, Amsterdam, The Netherlands) are given in Supplementary Table S1. The results were analyzed with the Coffalyser.Net software, version 140721.1958 (MRC Holland).

Pathogenic Variants in Moderate-Risk Genes
We identified pathogenic or likely-pathogenic variants, including CNVs, in 112/1356 (8.3%) unselected BC patients, in 90/699 (12.9%) familial BC patients, and in 42/1112 (3.8%) population controls. All of these variants are presented in Table 1. We observed recurrent variants, defined here as variants that have been found in more than one Finnish BC patient, [...]

Table 1 footnotes (partially recovered): (3) In all BC patients. (4) Total number of patients after removing the overlap of 286 patients between the familial and the unselected BC series. (5) Individuals with two or more pathogenic variants were counted once in the total frequencies.
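Because some patients belong to both series and some carry more than one pathogenic variant, frequencies are computed over unique individuals. A minimal sketch of that bookkeeping, using hypothetical patient IDs and function names: carriers are collected into a set, so a patient with two variants is counted once, and a combined denominator is the union of both series (for the sequenced patients, 1356 + 699 - 286 = 1769).

```python
def carrier_frequency(carrier_records, series_ids):
    """Return (n_carriers, n_total, percent), counting each individual once.

    carrier_records: iterable of (individual_id, variant) pairs; an individual
    with several pathogenic variants appears in several pairs.
    series_ids: set of all individual IDs in the series.
    """
    carriers = {pid for pid, _ in carrier_records if pid in series_ids}
    n_total = len(series_ids)
    return len(carriers), n_total, 100 * len(carriers) / n_total


def fmt(n, total, pct):
    return f"{n}/{total} ({pct:.1f}%)"


# Hypothetical toy data: P2 carries two variants, P3 is in both series.
unselected = {"P1", "P2", "P3", "P4"}
familial = {"P3", "P5", "P6"}
records = [("P2", "CHEK2 c.1100delC"), ("P2", "CHEK2 c.319+2T>A"), ("P3", "PALB2 c.1592delT")]

print(fmt(*carrier_frequency(records, unselected)))             # 2/4 (50.0%)
print(fmt(*carrier_frequency(records, familial)))               # 1/3 (33.3%)
print(fmt(*carrier_frequency(records, unselected | familial)))  # 2/6 (33.3%)
```

The same formatting applied to the reported unselected-series count gives `fmt(112, 1356, 100 * 112 / 1356)`, that is "112/1356 (8.3%)".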
Besides the major founder variant c.1592delT, we identified four rare pLoF variants in PALB2: three were each carried by a single patient, and one was detected in a population control. In CHEK2, the previously reported recurrent variants c.1100delC, c.319+2T>A, and c.444+1G>A covered 80/82 (97.6%) of the pathogenic variation identified in the patients, including one patient who was heterozygous for both c.1100delC and c.319+2T>A. Here, we found three other pathogenic or likely-pathogenic CHEK2 variants. A deletion of exons 3 and 4 and a functionally damaging missense variant, c.433C>T p.(Arg145Trp) [40][41][42], were each detected in a single patient. Additionally, another CHEK2 pLoF variant was observed in two population controls.
Similarly to CHEK2, the previously known recurrent FANCM variants c.5101C>T, c.5791C>T p.(Arg1931Ter), and c.4025_4026del p.(Ser1342Ter) covered 56/58 (96.6%) of the pathogenic variation in this gene among the patients. In addition, we detected one other recurrent FANCM variant, c.1491dup p.(Gln498ThrfsTer7), in two patients. In contrast, we observed no major variants in the ATM gene. The only variants detected in more than one individual were the previously reported c.6908dup p.(Glu2304GlyfsTer69) and the functionally defective missense variant c.7570G>C p.(Ala2524Pro) [43,44], which together covered 4/9 (44.4%) of the pathogenic variation. Five other ATM variants were found in a single patient each.
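The coverage figures above (for example, 80/82 for CHEK2 and 4/9 for ATM) express the share of pathogenic variant observations in a gene that belong to recurrent variants. A minimal sketch, assuming one list entry per carrier-variant observation: a variant counts as recurrent if it occurs at least twice in the input or if it is passed in explicitly as previously reported, since the study's definition also draws on earlier Finnish series. The function name and the singleton variant names are hypothetical placeholders.

```python
from collections import Counter


def recurrent_share(variants, known_recurrent=frozenset()):
    """Share of pathogenic variation in one gene covered by recurrent variants.

    variants: one entry per carrier-variant observation in the gene.
    known_recurrent: variants treated as recurrent on the basis of prior studies.
    Returns (covered, total, percent).
    """
    counts = Counter(variants)
    recurrent = {v for v, n in counts.items() if n >= 2} | set(known_recurrent)
    covered = sum(n for v, n in counts.items() if v in recurrent)
    total = sum(counts.values())
    return covered, total, 100 * covered / total


# ATM-like toy example: two variants seen twice each, five hypothetical singletons.
atm = ["c.6908dup"] * 2 + ["c.7570G>C"] * 2 + [f"singleton_{i}" for i in range(1, 6)]
covered, total, pct = recurrent_share(atm)
print(f"{covered}/{total} ({pct:.1f}%)")  # 4/9 (44.4%)
```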
In RAD51C, we detected the previously known CNV duplication covering the first seven exons of the gene in three patients, and one novel variant in a single patient. To our knowledge, no pathogenic BARD1 variant has previously been identified in Finnish BC patients; here, we found two BARD1 pLoF variants, each in a single patient. Additionally, we observed two variants in the OC risk gene BRIP1 in one patient and one control each. No pathogenic variants were identified in RAD51D among the individuals included in this study.
Eight patients carried two or more pathogenic variants in the moderate-risk genes (Supplementary Table S2). With an overlap of three patients between the series, 5/1356 (0.4%) unselected and 6/699 (0.9%) familial BC patients had more than one pathogenic variant. Two of the patients were homozygous for CHEK2 c.1100delC.

Missense Variants of Uncertain Significance
We tested the missense and in-frame indel variants for association with BC and evaluated them on the basis of the pathogenicity interpretations submitted to ClinVar and of prediction tools. We detected a nominally significant association for ATM c.146C>G p.(Ser49Cys), found in 17/1769 (1.0%) patients compared with 3/1112 (0.3%) controls (OR = 3.59 [95% confidence interval 1.03-19.14], p = 0.036); however, this variant is interpreted as benign in ClinVar. Thirty-three missense variants, identified in either patients or controls, passed the selection criteria for potentially pathogenic variants (Supplementary Table S3). All of these variants were rare in our BC series, and none of them was significantly associated with BC risk (significance threshold p < 0.05).
Additionally, we found a duplication of exons 62-63 in ATM with a frequency of 12/1511 (0.8%) in patients and 9/1025 (0.9%) in controls; given these similar frequencies, it is likely benign. In BRCA1, one population control had a large duplication covering exons 1-20 (legacy exon numbering) as well as a large region upstream of the gene. A third CNV in BRCA1 and two CNVs detected in BRCA2 could not be validated with MLPA and were excluded.

Pathogenic Variant Frequencies in Different Diagnosis Age Groups
We evaluated the frequencies of the pathogenic variants in the unselected series by age at BC diagnosis. We observed variants in the moderate-risk genes in 37/362 (10.2%) patients diagnosed at <50 years of age and in 75/994 (7.5%) patients diagnosed at ≥50 years of age (Supplementary Table S5A).
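Age-stratified frequencies like these come from splitting per-patient records at a diagnosis-age cut-off. A minimal sketch with hypothetical records of (age at diagnosis, carrier status) and a hypothetical function name; the `cutoff` parameter also covers the ≥60-year grouping mentioned in the Discussion. As a consistency check on the reported counts, 37/362 and 75/994 sum to the 112/1356 carriers in the unselected series.

```python
def frequency_by_age(records, cutoff=50):
    """Split (age_at_diagnosis, is_carrier) records at `cutoff` years.

    Returns {"<cutoff": (carriers, n), ">=cutoff": (carriers, n)}.
    """
    below, above = f"<{cutoff}", f">={cutoff}"
    groups = {below: [0, 0], above: [0, 0]}
    for age, is_carrier in records:
        key = below if age < cutoff else above
        groups[key][0] += int(is_carrier)
        groups[key][1] += 1
    return {k: tuple(v) for k, v in groups.items()}


# Hypothetical toy records, for illustration only.
toy = [(42, True), (47, False), (50, True), (63, False), (71, False)]
for label, (k, n) in frequency_by_age(toy).items():
    print(label, f"{k}/{n} ({100 * k / n:.1f}%)")
# <50 1/2 (50.0%)
# >=50 1/3 (33.3%)
```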

Discussion
We have estimated the prevalence of all pathogenic and likely-pathogenic variants in high- and moderate-risk BC and OC susceptibility genes in Finnish BC patients and controls from the Helsinki region. We observed variants in the PALB2, CHEK2, ATM, BARD1, RAD51C, BRIP1, and FANCM genes in 8.3% of the unselected BC patients and in 12.9% of the familial BC patients. Excluding the variants found in the putative moderate-risk gene FANCM and the OC risk gene BRIP1, the carrier frequency was 5.1% in the unselected BC patients and 10.2% in the familial BC patients. In the BRCA1/2 genes, we identified pathogenic or likely-pathogenic variants in 1.8% of the unselected BC patients.
The overall carrier frequency of pathogenic variants in the validated BC risk genes among the unselected patients was 6.7%, which is comparable to the frequencies found in other reports. In the BCAC gene-panel sequencing study reported by Dorling et al., about 6.8% of European BC patients had a protein-truncating variant in a BC risk-associated gene, including the BRCA1/2 and TP53 genes [3]. Another large population-based study from the United States reported the frequencies of pathogenic variants identified in BC patients of different ethnicities [5]; in that study, approximately 5.0% of the patients carried a variant in a risk gene. It is also worth noting that we detected pathogenic variants in the validated BC risk genes in 5.3% of the patients diagnosed with BC at ≥50 years of age and in 5.0% of those diagnosed at ≥60 years of age; such carriers might be missed by strict age-based genetic testing.
While the pathogenic variant spectrum in mixed populations is usually wide, our study highlights the strong founder effects in the moderate-risk genes in the studied Finnish BC patients. PALB2 c.1592delT, CHEK2 c.1100delC, and FANCM c.5101C>T accounted for 88.9%, 82.9%, and 86.2%, respectively, of the pathogenic variation in each gene. Furthermore, the three most common variants in the established risk genes, PALB2 c.1592delT, CHEK2 c.1100delC, and CHEK2 c.319+2T>A, were carried by a notable portion of all patients: 4.1% of the unselected patients and 9.0% of the familial patients. Due to the major recurrent variants, the total frequency of all pathogenic variants in the moderate-risk genes was very similar to the previous estimates analyzing just twelve recurrent variants [25]. We detected new pathogenic variants that were, to our knowledge, previously unreported in the Finnish BC patients, in 1.0% of the unselected patients and in 0.4% of the familial patients. All of these variants were rare and only found in one or two patients each.
Not all previously known recurrent moderate-risk variants were detected in the current study. We have previously observed two other recurrent RAD51C variants, c.93delG and c.837+1G>A, each in 0.1-0.2% of familial BC patients [25,49]. Additionally, RAD51D c.576+1G>A has been found in 0.1% of unselected BC patients and in 0.3% of familial BC patients [25,50], whereas no pathogenic RAD51D variants were identified in this study. These variants may have been missed by the genotyping and variant-calling pipeline, or their earlier detection may reflect the larger patient series analyzed previously.
The BRCA1/2 variant frequency was low among the unselected BC patients, with 0.6% and 1.2% carrying a pathogenic or likely-pathogenic BRCA1 and BRCA2 variant, respectively. We have previously identified pathogenic BRCA1 variants in 1.9% and BRCA2 variants in 1.1% of 370 additional unselected BC patients who were not included here in the gene-panel sequencing (Supplementary Table S4) [24,25]. For these groups combined, a total of 0.9% and 1.2% of the patients had a BRCA1 and BRCA2 variant, respectively. The frequencies are in line with other population-based studies [3,5]. Unlike in the moderate-risk genes, the pathogenic variant spectrum detected in the high-penetrance BRCA1/2 genes in the Finnish BC families is wide, with multiple unique variants [25,48]. Nevertheless, strong founder variants have been identified in the BRCA1/2 genes, especially in BRCA2 [23,25,48,51,52]. Ten recurrent BRCA1/2 variants were detected in the unselected patients in the current study. Haplotype analyses have indicated common ancestors for most of these variants in Finland, with two distinct haplotypes detected in the BRCA2 c.771_775del (previously known as 999del5) carrier families [51,52].
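The combined BRCA1/2 estimates follow from weighting each series' frequency by its size. A short worked check using only the reported figures (1356 sequenced and 370 additional unselected patients); the function name is hypothetical, and because the inputs are rounded percentages, the results are approximate.

```python
def pooled_percent(parts):
    """Size-weighted mean of percentages; parts is a list of (percent, n)."""
    total_n = sum(n for _, n in parts)
    return sum(pct * n for pct, n in parts) / total_n


brca1 = pooled_percent([(0.6, 1356), (1.9, 370)])  # about 0.88
brca2 = pooled_percent([(1.2, 1356), (1.1, 370)])  # about 1.18
print(f"BRCA1 {brca1:.1f}%, BRCA2 {brca2:.1f}%")   # BRCA1 0.9%, BRCA2 1.2%
```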
The prevalence of pathogenic CNVs in the BC risk genes has not been explored as extensively as that of SNPs and short indels; in this study, we investigated CNVs alongside the other variants. We discovered three likely pathogenic variants (a BRCA1 duplication of exon 13, a CHEK2 deletion of exons 3-4, and a RAD51C duplication of exons 1-7), which altogether were found in 0.3% of both the unselected and the familial patients. In comparison, Dennis et al. reported CNV deletions in the BC risk genes in 0.5% of a large series of over 86,000 BC patients [19]. However, these frequencies are likely underestimates, as conclusive CNV calling from array data requires a higher probe density than that offered by OncoArray and iCOGS [19]. Hence, the spectrum and frequency of pathogenic CNVs warrant further study, including in Finnish patients. Current CNV detection methods are expensive and time-consuming, and CNVs are often omitted from gene panels in both clinical testing and research. The ongoing development of algorithms and tools for calling CNVs from next-generation sequencing data could enable the routine inclusion of CNVs in gene-panel testing for a comprehensive analysis.
Our results suggest that most carriers among the studied Finnish BC patients could be detected by genotyping the recurrent variants. Of the carriers of a BRCA1/2 or a moderate-risk variant, 70.8% and 92.1%, respectively, had a recurrent variant in the present study. While gene-panel sequencing is utilized in clinical testing, our results could be used in designing population screening for the BC risk variants in Finland. When moderate-risk variants are combined with common low-risk variants in a polygenic risk score (PRS), carriers could be provided with improved personalized risk estimates. Recent studies have indicated that moderate-risk variant carriers with a high PRS may have a BC risk comparable to that of carriers of a high-risk variant, whereas, with a low PRS, the carriers may have their risk reduced to the level of the general population [13][14][15]. These estimates could guide cancer prevention strategies for the risk-variant carriers.
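The screening argument rests on the fraction of carriers who would be found by genotyping only a fixed set of recurrent variants. A minimal sketch with hypothetical carriers and a hypothetical function name: each carrier is counted once and is detected if at least one of their pathogenic variants is on the panel. The panel contents below are illustrative only and are not the study's list of 10 or 11 recurrent variants.

```python
def panel_detection_rate(carriers, panel):
    """Fraction of carriers with at least one variant on a targeted panel.

    carriers: dict mapping individual ID -> set of pathogenic variants.
    panel: set of variants genotyped by the (hypothetical) screening assay.
    Returns (detected, total, percent).
    """
    detected = [pid for pid, variants in carriers.items() if variants & panel]
    return len(detected), len(carriers), 100 * len(detected) / len(carriers)


# Hypothetical panel and carriers, for illustration only.
panel = {"PALB2 c.1592delT", "CHEK2 c.1100delC", "CHEK2 c.319+2T>A"}
carriers = {
    "P1": {"PALB2 c.1592delT"},
    "P2": {"CHEK2 c.1100delC", "CHEK2 c.319+2T>A"},  # two variants, counted once
    "P3": {"ATM singleton_1"},
    "P4": {"CHEK2 c.1100delC"},
}
found, total, pct = panel_detection_rate(carriers, panel)
print(f"{found}/{total} carriers detected ({pct:.1f}%)")  # 3/4 carriers detected (75.0%)
```

Counting individuals rather than variant observations matches the way the 70.8% and 92.1% figures are phrased ("had a recurrent variant").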

Conclusions
We have estimated the overall prevalence of pathogenic variants in the high- and moderate-risk genes in Finnish BC patients, as well as the contribution of recurrent variants to the pathogenic burden detected in these genes. The combined frequency of the variants was similar to that in other populations; however, our study highlights the importance of the major recurrent variants in Finnish BC patients, with most of the pathogenic variation resulting from a few variants. Our results are descriptive of the Finnish population and could be utilized in designing population screening for the BC risk variants.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions. Access to the BCAC data can be requested by contacting the BCAC Coordinator (https://bcac.ccge.medschl.cam.ac.uk).