Genetic variants in more than 10 genes are known to confer moderate to high risks of breast and/or ovarian cancer (BC/OC) and explain 5% to 10% of all breast cancers and approximately 20% of all ovarian cancers [1]. Most of these genes encode tumor suppressor proteins involved in the repair of DNA double-strand breaks (DSBs) by homologous recombination (HR). In addition to the main breast cancer genes, BRCA1 [MIM #113705] [3] and BRCA2 [MIM #600185] [4], inactivating mutations in ATM [MIM #607585], BARD1 [MIM #604373], PALB2 [MIM #610355], RAD51C [MIM #602774], and RAD51D [MIM #602954], among others, confer risk of breast and/or ovarian cancer [1].
Loss-of-function variants in RAD51C increase the risk of breast and ovarian cancer, but the same has not been demonstrated for the other RAD51 paralogs, or for RAD51 itself, which plays a major role in HR repair [7]. Moreover, bi-allelic RAD51C deleterious variants have been found in Fanconi Anemia patients [12]. RAD51C participates in the recruitment of RAD51 to DNA damage sites and in the stabilization of RAD51 nucleofilaments as part of the BCDX2 complex (RAD51B, RAD51C, RAD51D, and XRCC2). It is also involved in the resolution of Holliday junctions through its interaction with XRCC3 in the CX3 complex, and it was recently demonstrated that RAD51C interacts directly with PALB2, a key HR protein [13]. Furthermore, RAD51C has been reported to facilitate ATM-dependent CHEK2 phosphorylation, allowing the activation of CHEK2, another important regulator of the cellular response to DNA damage [18].
The detection of germline pathogenic variants in these cancer susceptibility genes can help improve the prevention, therapy, and surveillance of breast/ovarian cancer patients, as well as our knowledge of BC/OC genetics. Unfortunately, a large fraction of variants are classified as variants of uncertain clinical significance (VUS). Since the association of these variants with cancer risk is unknown, they complicate genetic counseling and the clinical management of patients. Multifactorial likelihood approaches, together with functional studies of variants, can facilitate their interpretation [20]. Variants in disease genes are typically assessed according to their predicted impact on protein translation, so protein-truncating variants (frameshift and nonsense) are usually classified as damaging. However, variants might also have an impact on RNA expression and, e.g., disrupt transcription initiation, miRNA regulation, or splicing [23].
Pre-mRNA splicing is an essential gene expression mechanism whereby introns are excised and exons are consecutively joined to produce the mature mRNA. The splicing motifs include the core consensus sequences (the 5′ and 3′ splice sites, 5′SS and 3′SS; the polypyrimidine tract; and the branchpoint) and exonic or intronic splicing enhancers and silencers [28]. Variants in these cis-motifs may lead to abnormal events such as exon skipping, intron retention, inclusion of pseudoexons, or the use of alternative splice sites [29]. These generate aberrant transcripts that may be associated with a genetic disorder [21]. According to the Human Gene Mutation Database (accessed on 27 November 2019), around 9% (23,354/269,419) of reported disease-causing mutations impair splicing, although some authors have suggested that up to 50% of all human disease mutations impair splicing [33].
Given the low precision of in silico tools that predict the impact of candidate variants on RNA splicing, the exact consequences of these genetic changes must be verified in functional assays [35]. The most suitable method to determine whether a particular variant affects splicing is the direct analysis of blood RNA from heterozygous carriers (either patients or healthy relatives), although blood RNA samples are not always accessible in the diagnostic routine [37]. Even when available, the assessment of transcripts derived from the variant allele is hampered by the presence of the wild type allele. An alternative strategy is to use minigene assays, which have proven to be a robust tool for assessing the pathogenicity of potential spliceogenic variants [40].
Multigene panel testing is a cost- and time-effective option to evaluate genes and genetic variants that may be associated with cancer risk, and it is becoming widely used in clinical practice. Our study was conducted in the context of the BRIDGES project (Breast Cancer After Diagnostic Gene Sequencing; https://bridges-research.eu/), in which a panel of 34 known or suspected breast cancer susceptibility genes was sequenced in 60,466 cases and 53,461 controls [44]. Here, we bioinformatically analyze 40 variants from the intron/exon boundaries of the RAD51C gene identified in BRIDGES subjects, of which 20 are selected and functionally tested by minigene assays.
Massively parallel sequencing of breast and/or ovarian cancer genes has allowed the genetic testing of thousands of patients with a high-throughput and cost-effective strategy. The goal of the BRIDGES initiative was to firmly establish the breast cancer association of genes tested by commercial multigene panels, with the narrowest confidence intervals of risk estimates currently available. BRIDGES analyzed 34 known or suspected BC genes that were sequenced in 60,466 patients and 53,461 controls [44]. Nonsense, frameshift, and ±1,2 splice site variants (sometimes collectively referred to as protein-truncating variants or PTVs) are usually assumed to be pathogenic or likely pathogenic. This assumption might work well for certain epidemiological studies but cannot be taken for granted in the clinic (e.g., spliceogenic variants, including ±1,2 splice site variants, are not necessarily pathogenic, as they may cause in-frame alterations that preserve function). Many other variants (e.g., rare missense changes) are considered VUS, due to their unknown impact on gene function and disease risk [48]. In fact, the clinical management of VUS carriers (and non-carrier relatives) is complex, since risk evaluation is based solely on family history [49].
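Whether a spliceogenic variant can preserve function depends first on whether its aberrant transcripts keep the reading frame. The minimal Python sketch below checks this for partial-exon events, using transcript labels that appear in this study (e.g., ▼(E2q27), Δ(E5p10)). The parser is hypothetical and assumes that ▼/Δ denote gained/lost nucleotides, that p/q mark the acceptor/donor side, and that the trailing number is the size in nucleotides. An in-frame result does not by itself imply preserved function, because the inserted or deleted residues may still create a stop codon or remove a critical domain.

```python
import re
from typing import Optional

# Hypothetical parser. Assumed label grammar (not defined in this section):
#   "▼" = nucleotides gained, "Δ" = nucleotides lost,
#   "E<n>" = exon, "p"/"q" = acceptor/donor side, trailing digits = size in nt.
_PARTIAL_EXON_LABEL = re.compile(r"^(?P<sign>[▼Δ])\(E\d+[pq](?P<nt>\d+)\)")


def length_change_from_label(label: str) -> Optional[int]:
    """Return the signed length change in nt, or None if the label carries no size."""
    match = _PARTIAL_EXON_LABEL.match(label)
    if match is None:
        # Whole-exon events such as "Δ(E5)" need the exon length, which the label lacks.
        return None
    size = int(match.group("nt"))
    return size if match.group("sign") == "▼" else -size


def frame_effect(delta_nt: int) -> str:
    """Classify a transcript length change as in-frame or frameshift."""
    if delta_nt == 0:
        return "no length change"
    # Python's % is non-negative for a positive divisor, so -10 % 3 == 2.
    return "in-frame" if delta_nt % 3 == 0 else "frameshift"


for label in ["▼(E2q27)", "Δ(E5p10)", "▼(E3p4)", "▼(E8p3)-a", "Δ(E5)"]:
    delta = length_change_from_label(label)
    print(label, delta, "unknown" if delta is None else frame_effect(delta))
```

Under these assumptions, a 27-nt or 3-nt gain stays in frame, whereas a 10-nt loss or a 4-nt gain shifts the frame.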
The RAD51C gene was one of the 34 genes analyzed by BRIDGES, given its role in breast and ovarian cancer [6]. A statistically significant association of PTVs has been found for ER-negative breast cancer and for breast and ovarian cancer [44]. In this work, we have carried out the most comprehensive splicing study of germline RAD51C variants to date. Forty variants located within the intron/exon boundaries were selected and analyzed with MES and NNSplice. In keeping with the criteria indicated in Materials and Methods, 20 candidate variants were chosen (Table 1) for subsequent RNA assays.
In the absence of patient RNA, splicing reporter minigenes provide a straightforward and robust method for the initial characterization of putative spliceogenic variants, for several reasons. The assay (i) allows a simple and clean analysis of a single mutant allele; (ii) is performed in a cell type relevant for the disease; (iii) circumvents NMD interference through the use of an inhibitor; and (iv) uses a single construct to test multiple variants, among other benefits of this technology. Here, we designed a construct containing a synthesized insert with seven (exons 2–8) of the nine exons of the RAD51C gene, so that all the selected variants (Figure 2A) could be evaluated in a single minigene.
Remarkably, all but one variant disrupted splicing, underlining the specificity of our selection criteria. MES or NNSplice correctly predicted an effect on RNA splicing (either splice-site disruption or a significant score reduction) for 19 variants (Table 1). Only one variant, c.146-3C > T, did not alter splicing; indeed, its MES score was only slightly reduced (−8.5%) because the most frequent nucleotide at position −3 (C) is replaced by the second most frequent one (T). However, other non-conservative −3 changes involving different substitutions, such as c.905-3C > G and c.966-3C > A, caused total or almost total splicing disruption. Likewise, MES precisely predicted a double effect for c.405-6T > A: disruption of the 3′SS and generation of a strong de novo 3′SS 4 nt upstream, which was in fact the site mainly used by the splicing machinery (▼(E3p4), 95.2%). MES did not identify the exon 8 donor site, although NNSplice did. In this case, both +5 variants (c.1026 + 5_1026 + 7del and c.1026 + 5G > T) totally disrupted splicing, without any trace of the full-length isoform. Conversely, another +5 variant (c.705 + 5G > C) yielded 51.6% of the full-length isoform, with a relatively small MES decrease (−20.8%). It is also worth mentioning that c.571 + 4A > G only slightly reduced the MES score (−22.5%), and the resulting mutant donor site was still strong (MES = 8.1). However, this change induced almost completely aberrant splicing, with a residual amount of full-length transcript (5.4%). Finally, the different splicing outcomes of the two changes at the same position, c.706-2A > C and c.706-2A > G, should be highlighted (Table 1). Variant c.706-2A > C mainly caused the use of a cryptic 3′SS 10 nt downstream (Δ(E5p10); 91.4%), whereas c.706-2A > G mainly generated Δ(E5) (65.4%) but also Δ(E5p10) (33.5%). However, the MES scores of the cryptic 3′SS for both changes (3.3 vs. 3.2) were low and not significantly different. One possible explanation is that c.706-2A > C is a purine-to-pyrimidine change that would strengthen the polypyrimidine tract of the internal cryptic acceptor site 10 nt downstream (used in Δ(E5p10)), whereas c.706-2A > G (purine to purine) would not.
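The MES changes quoted above are relative differences between mutant and wild type scores. The short Python sketch below assumes that MES refers to MaxEntScan and that each change is computed as 100 × (mutant − wild type)/wild type; the exact convention is set out in Materials and Methods, not in this section. Under that assumption, the reported −22.5% change and mutant score of 8.1 for c.571 + 4A > G imply a wild type donor score of about 10.45.

```python
def mes_percent_change(wild_type: float, mutant: float) -> float:
    """Relative MES change in percent, assuming 100 * (mutant - wild type) / wild type."""
    if wild_type <= 0:
        # MaxEntScan scores can be zero or negative, where a relative change is not meaningful.
        raise ValueError("relative change needs a positive wild type score")
    return 100.0 * (mutant - wild_type) / wild_type


def implied_wild_type(mutant: float, percent_change: float) -> float:
    """Invert mes_percent_change: recover the wild type score from the mutant score."""
    return mutant / (1.0 + percent_change / 100.0)


# c.571 + 4A > G: mutant donor MES = 8.1 after a -22.5% change (values from the text).
wt = implied_wild_type(8.1, -22.5)
print(round(wt, 2), round(mes_percent_change(wt, 8.1), 1))
```

The example reinforces the point made above: a moderate relative drop that leaves a strong site can still coincide with only 5.4% residual full-length transcript, so the score change alone is a poor proxy for the splicing outcome.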
Given this, and the unpredictability of splicing outcomes (35 different transcripts), RNA assays are strongly recommended to investigate the impact of genetic variants on splicing. Fluorescent capillary electrophoresis of the RT-PCR products also offered high resolution and sensitivity, being capable of distinguishing isoforms that differ by only a few nucleotides [53], such as the full-length transcript and the ▼(E8p3)-a,b,c transcripts, which carry only a 3-nt insertion.
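The isoform percentages reported in this section are relative abundances. The Python sketch below assumes that each fluorescent peak is assigned to the isoform with the closest expected product size and that an isoform's abundance is its share of the total assigned peak area. The expected sizes, peak areas, and tolerance are hypothetical. Keeping the tolerance below half of a 3-nt spacing means the size windows of full-length and ▼(E8p3) products never overlap.

```python
from typing import Dict, List, Tuple


def quantify_isoforms(
    peaks: List[Tuple[float, float]],
    expected_sizes: Dict[str, int],
    tolerance_nt: float = 1.0,
) -> Dict[str, float]:
    """Assign fragment peaks to isoforms by size and return each isoform's share (%).

    peaks: (observed size in nt, peak area) pairs from capillary electrophoresis.
    expected_sizes: isoform label -> expected RT-PCR product size in nt.
    Peaks farther than tolerance_nt from every expected size are left unassigned.
    """
    areas = {name: 0.0 for name in expected_sizes}
    for size, area in peaks:
        # Closest expected isoform for this peak.
        name, expected = min(expected_sizes.items(), key=lambda item: abs(item[1] - size))
        if abs(expected - size) <= tolerance_nt:
            areas[name] += area
    total = sum(areas.values())
    if total == 0:
        return {name: 0.0 for name in areas}
    return {name: 100.0 * area / total for name, area in areas.items()}


# Hypothetical sizes and peak areas, for illustration only.
expected = {"full-length": 520, "▼(E8p3)": 523}
peaks = [(520.2, 900.0), (523.1, 100.0), (601.0, 5.0)]  # the 601-nt peak stays unassigned
print(quantify_isoforms(peaks, expected))
```

Unassigned peaks are excluded from the denominator here; whether they should be is a design choice that the real protocol would have to fix.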
Interestingly, 12 transcripts (▼(E2q27), Δ(E2q175), Δ(E2q22), Δ(E2), Δ(E3), Δ(E4), Δ(E4_5), Δ(E5), Δ(E7), Δ(E7_8), Δ(E8), and ▼(E8p3)) had been previously characterized as naturally occurring isoforms of RAD51C, suggesting that physiological alternative splicing events may somehow predict variant splicing profiles [55]. Moreover, minigene assays are capable of mimicking the pathological patterns of variants. Thus, minigene experiments reproduced previous results of patient RNA assays for several variants, with very similar or even identical outcomes: c.571 + 4A > G (Δ(E3)) [58], c.706-2A > G (Δ(E5)) [59], c.905-2_905-1del (Δ(E7)) [60], and c.1026 + 5_1026 + 7del (Δ(E8)) [61]. In addition, variants c.837 + 2T > C and c.905-3C > G/c.905-2A > C mimicked previous results for c.837 + 1G > A and c.905-2A > G at the same splice sites, respectively [62]. Finally, variants c.404G > C/G > T, at the same position as c.404G > A, promoted the use of the same cryptic splice site 27 nt downstream (▼(E2q27)) of the canonical donor site [64]. Altogether, these results support the reproducibility of the minigene approach.

However, while the major and apparently only aberrant transcript found in patient samples for each of the variants c.571 + 4A > G, c.706-2A > G, and c.1026 + 5_1026 + 7del was also the main outcome in minigene assays (Δ(E3), 76.5%; Δ(E5), 65.4%; and Δ(E8), 78.0%, respectively), the minor minigene transcripts were not detected in patient RNAs (Table 1). These slight discrepancies may be due to several reasons, including: (i) tissue-specific alternative splicing, since patient RT-PCRs are usually performed on blood RNA; (ii) the high sensitivity of fluorescent fragment analysis, which allows the identification of rare isoforms; (iii) the use of NMD inhibitors in minigene experiments (patient samples are not usually NMD-inhibited), which improves the detection of low-abundance PTC-containing transcripts; (iv) the interference of the wild type allele in patient samples; and (v) the high transcription rate driven by the strong SV40 promoter of the minigene [38]. Likewise, the wild type construct did not exactly replicate the splicing profile of MCF-7 cells or control breast samples, which showed minor alternative transcripts (Figure 1C). Hence, other factors should be considered, such as the absence of the natural genomic context in the minigene, which contains shortened introns 2, 3, 4, 5, 6, 7, and 8 (Supplementary Figure S1). Therefore, we might speculate that the absence of putative intronic regulatory elements and of the natural exon/intron architecture could somewhat influence the splicing outcomes of the wild type and mutant minigenes [65].
Clinical Interpretation of Variants
The clinical interpretation of variants cannot be based solely on the functional data presented in this manuscript. From a clinical perspective, the data presented here are intended to assist in classifying genetic variants. Yet, the analysis of spliceogenic variants is an especially challenging and laborious task. The large number of aberrant RAD51C transcripts, with many variants producing several transcripts each, attests to this arduous undertaking. From a simple functional viewpoint, the biological indicators of pathogenicity of a particular variant are a strong reduction in the expression of the wild type transcript and the presence of severe splicing anomalies that are predicted to result in protein truncation or loss of critical protein domains. On this basis, 18 variants with severe splicing anomalies (Table 1) should be classified as deleterious or likely deleterious.
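The two biological indicators named above can be written as a simple rule. The Python sketch below is hypothetical in several respects. The 20% residual full-length cutoff is an arbitrary placeholder, since no numerical threshold is given in this section. The severity of each aberrant transcript is an input supplied by the analyst rather than computed. The variant names are invented. The sketch also takes the conservative reading that every aberrant transcript must be severe before a deleterious call is made.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cutoff, NOT taken from the study: residual full-length transcript
# below this percentage is treated as a "strong reduction".
RESIDUAL_FULL_LENGTH_CUTOFF = 20.0


@dataclass
class MinigeneOutcome:
    variant: str
    full_length_pct: float  # share of full-length transcript (%)
    # One flag per aberrant transcript: True if the analyst predicts protein
    # truncation or loss of a critical domain for that transcript.
    severe_aberrant: List[bool] = field(default_factory=list)


def functional_call(outcome: MinigeneOutcome,
                    cutoff: float = RESIDUAL_FULL_LENGTH_CUTOFF) -> str:
    """Apply the two functional indicators: strong loss of full length + severe anomalies."""
    if not outcome.severe_aberrant and outcome.full_length_pct >= 100.0:
        return "no splicing effect"
    strong_reduction = outcome.full_length_pct < cutoff
    all_severe = bool(outcome.severe_aberrant) and all(outcome.severe_aberrant)
    if strong_reduction and all_severe:
        return "deleterious or likely deleterious"
    return "further studies needed"


print(functional_call(MinigeneOutcome("variant_A", 5.4, [True, True])))
print(functional_call(MinigeneOutcome("variant_B", 51.6, [True])))
```

A substantial residual full-length fraction, or any aberrant transcript judged non-severe, sends the variant to "further studies needed" instead of a deleterious call.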
However, more complex and comprehensive guidelines have been developed for the clinical interpretation of variants, such as those of the ACMG/AMP [66]. Here, we propose a clinical classification of our findings on the basis of these guidelines. Overall, we think that our ACMG/AMP-like classification of 20 pre-selected RAD51C variants based on minigene data is rigorous, placing most variants in the pathogenic/likely pathogenic category while also highlighting up to four variants (c.705 + 5G > C, c.966-3C > A, c.966-2A > G, and c.966-2A > T) that, despite being spliceogenic, require further studies to be definitively classified.
We would also like to stress that our classification is partly based on decisions not necessarily shared by other experts in the field (e.g., replacing in silico predictions with functional evidence rather than combining both; see the rationale below and in the Supplementary Methods). For that reason, others may propose a different clinical classification. In turn, this highlights a relevant issue in variant classification, namely, the lack of standardization.
Accordingly, our minigene-based ACMG/AMP-like classification approach (Table 2) was not intended to produce a definitive (i.e., authoritative) clinical classification of these variants (a prerequisite for that will be the completion of the ClinGen expert panel adaptation of the ACMG/AMP rules to RAD51C), but rather to highlight the complexity of determining the appropriate aggregate strength when combining predictive and functional splicing evidence within the ACMG/AMP classification framework without introducing inconsistencies into the system [67].
The internal inconsistencies that we have identified in the ACMG/AMP framework are: (i) GT-AG ±1,2 variants producing PTC-NMD transcripts are more easily classified as pathogenic (PVS1 + PS3 = Pathogenic) than nonsense/indel variants introducing equivalent PTC-NMD alterations (PVS1 + ? = Pathogenic); and (ii) GT-AG ±1,2 variants are more easily classified as pathogenic than other spliceogenic variants producing identical RNA outputs (PVS1 + PS3 = Pathogenic vs. PS3 + PP3 = Uncertain Significance). Further, we think that a system granting a likely pathogenic classification to rare GT-AG ±1,2 variants (PM2 + PVS1 = Likely Pathogenic) fails by discouraging RNA analyses.
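The combinations quoted in this paragraph follow from the evidence-combining rules of the 2015 ACMG/AMP guidelines. The Python sketch below encodes only the pathogenic side of those rules; benign codes and conflicting evidence are ignored. It treats a suffix such as _Strong or _moderate as overriding a code's default strength, in the style of ClinGen adaptations. It is an illustration of the arithmetic, not a validated classifier.

```python
from collections import Counter
from typing import Iterable

# Default strength implied by the code prefix (2015 ACMG/AMP guidelines).
_PREFIX_STRENGTH = {"PVS": "very_strong", "PS": "strong", "PM": "moderate", "PP": "supporting"}
# Strength modifiers in the style of ClinGen adaptations, e.g. "PVS1_Strong", "PS3_moderate".
_SUFFIX_STRENGTH = {"verystrong": "very_strong", "strong": "strong",
                    "moderate": "moderate", "supporting": "supporting"}


def strength(code: str) -> str:
    """Return the evidence strength of a pathogenic code, honoring any suffix modifier."""
    base, _, modifier = code.partition("_")
    prefix = next((p for p in ("PVS", "PS", "PM", "PP") if base.startswith(p)), None)
    if prefix is None:
        raise ValueError(f"not a pathogenic evidence code: {code}")
    if not modifier:
        return _PREFIX_STRENGTH[prefix]
    key = modifier.lower().replace("_", "")
    if key not in _SUFFIX_STRENGTH:
        raise ValueError(f"unknown strength modifier: {code}")
    return _SUFFIX_STRENGTH[key]


def classify(codes: Iterable[str]) -> str:
    """Combine pathogenic evidence codes with the ACMG/AMP rule table (benign side omitted)."""
    n = Counter(strength(c) for c in codes)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    if pathogenic:
        return "Pathogenic"
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    return "Likely Pathogenic" if likely_pathogenic else "Uncertain Significance"


for combo in (["PVS1", "PS3"], ["PS3", "PP3"], ["PM2", "PVS1"], ["PVS1_Strong", "PS3_moderate"]):
    print(" + ".join(combo), "=", classify(combo))
```

Running the combinations discussed in the text reproduces the asymmetry described above: identical RNA evidence (PS3) yields Pathogenic when paired with PVS1 but Uncertain Significance when paired with PP3.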
In the present study, we propose addressing these issues with a somewhat radical approach: replacing in silico predictions with functional evidence (rather than combining both). We think that this approach (i) avoids the internal inconsistencies already mentioned and (ii) recognizes that predictive and functional splicing evidence are not truly independent of each other. Implicitly, the ACMG/AMP classification framework assumes that each piece of evidence is independent [68]. The predictive and functional criteria hardly meet this assumption, because most functional analyses, including those in the present study, are performed on variants pre-selected on the basis of bioinformatic predictions.
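The independence assumption is easiest to see in the Bayesian adaptation of the ACMG/AMP framework (Tavtigian et al., 2018). In that model, each evidence strength corresponds to a fixed odds of pathogenicity, and the odds of independent criteria are multiplied. The Python sketch below uses that model's reported parameters: a prior of 0.10 and very strong odds of 350, with each lower strength level taking the square root of the level above. It is offered only to show why correlated predictive (PP3) and functional (PS3) evidence would be double-counted. It does not reproduce the rule table exactly; for example, under these parameters one very strong plus one moderate criterion already exceeds a posterior of 0.99.

```python
import math
from typing import List

PRIOR = 0.10              # prior probability of pathogenicity in the Bayesian model
ODDS_VERY_STRONG = 350.0  # odds of pathogenicity for one very strong criterion
# Each strength level is a fixed power of the very-strong odds (1, 1/2, 1/4, 1/8).
EXPONENT = {"very_strong": 1.0, "strong": 0.5, "moderate": 0.25, "supporting": 0.125}


def combined_odds(strengths: List[str]) -> float:
    """Multiply per-criterion odds; valid only if the criteria are independent."""
    return math.prod(ODDS_VERY_STRONG ** EXPONENT[s] for s in strengths)


def posterior(strengths: List[str], prior: float = PRIOR) -> float:
    """Posterior probability of pathogenicity from the combined odds and the prior."""
    odds = combined_odds(strengths)
    return odds * prior / ((odds - 1.0) * prior + 1.0)


# PS3 alone vs. PS3 + PP3: adding PP3 multiplies the odds by about 2.08,
# which is only justified if PP3 carries information independent of PS3.
print(round(posterior(["strong"]), 3), round(posterior(["strong", "supporting"]), 3))
```

If PP3 was used to pre-select the variants that later reached PS3, its contribution is already partly reflected in PS3, and multiplying the two overstates the evidence.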
The ClinGen CDH1 expert panel has proposed using PVS1_Strong (rather than PVS1) for GT-AG ±1,2 variants and combining these with RNA (PS3) or association (PS4) data to reach a pathogenic classification [69]. In a second iteration of the rules (www.clinicalgenome.org/affiliation/50014/), the authors refine the approach by stating that PS3_moderate (rather than PS3) should be applied to PVS1_Strong variants (GT-AG ±1,2).
While the suggestion of “downgrading” the loss-of-function prediction for GT-AG ±1,2 variants (and encouraging RNA analyses) is appealing to us, the approach does not eliminate internal inconsistencies between GT-AG ±1,2 and other PTC-NMD variants (PVS1_Strong + PS3_moderate = Likely Pathogenic vs. PVS1 only = Uncertain Significance), and it does not even address the issue for spliceogenic variants other than GT-AG ±1,2. Further, nothing is said about the appropriate strength of combining computational and functional splicing data when the evidence codes point in opposite directions.
In our approach, the computational evidence does not contribute to the final clinical classification of functionally validated spliceogenic variants, but we do acknowledge a fundamental role for these predictions in selecting and prioritizing variants for subsequent splicing analyses. Indeed, we recommend running bioinformatic splicing predictions for all genetic variants regardless of their nature and/or location (i.e., nonsense variants; in-frame and frameshift indels; and synonymous, non-synonymous, and intronic variants). Further, once a variant is selected for splicing analysis, the predictions have a role in designing and/or validating the corresponding assays. For instance, a negative experimental result (no splicing effect) for a variant with strong computational evidence might point towards a sub-optimal experimental design (e.g., multi-exon skipping missed due to a wrong choice of primers). Conversely, a positive result (splicing alteration) for a variant without strong computational evidence may suggest that the splicing alteration is caused not by the presumed variant under investigation but by another variant in cis (e.g., a deep intronic variant).
The “quality control” role of computational evidence is probably more relevant for assays performed on RNA from carriers than for minigene-based assays (e.g., in the latter approach there is no doubt about the variant under investigation). Yet, we argue that concordance with computational evidence (as observed in the present study) is also relevant when considering minigene outputs as strong (or very strong) evidence of pathogenicity.
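The quality-control logic described in the two paragraphs above amounts to a small decision table. The Python sketch below writes it as a hypothetical function. The function name, the assay labels, and the returned messages are illustrative choices, and “strong computational evidence” is reduced to a single boolean for simplicity.

```python
def qc_flag(predicted_effect: bool, observed_effect: bool, assay: str) -> str:
    """Hypothetical triage of agreement between in silico predictions and RNA results.

    assay: "carrier_rna" (RNA from variant carriers) or "minigene".
    """
    if assay not in ("carrier_rna", "minigene"):
        raise ValueError(f"unknown assay type: {assay}")
    if predicted_effect == observed_effect:
        return "concordant"
    if predicted_effect and not observed_effect:
        # e.g. primers that cannot detect multi-exon skipping
        return "review assay design"
    # Splicing effect observed without strong computational evidence.
    if assay == "carrier_rna":
        return "look for another variant in cis"
    # In a minigene the tested variant is the only difference from the wild type construct.
    return "unexpected effect of the tested variant"


print(qc_flag(True, False, "carrier_rna"))
print(qc_flag(False, True, "minigene"))
```

Separating the two assay types mirrors the argument above: an unexplained positive result points to a cis variant only when carrier RNA is analyzed.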
Ultimately, validation of pathogenicity will need to be based on the observed risk associated with the variants, either through case-control or family-based studies. It will be extremely challenging to evaluate risk for individual variants, since they are very rare, but it is possible in principle to evaluate the classification system as a whole. Furthermore, in BRIDGES, these spliceogenic variants account for 44.9% of all patients carrying a pathogenic/likely pathogenic variant (data not shown), indicating that a high proportion of RAD51C breast cancer risk-associated alleles display splicing defects, as previously described for BRCA1