Minigene Splicing Assays Identify 20 Spliceogenic Variants of the Breast/Ovarian Cancer Susceptibility Gene RAD51C

Simple Summary: Loss-of-function variants of the RAD51C gene are known to confer a risk of breast and ovarian cancers. In this study, we analyzed the impact of RAD51C variants on splicing, a highly regulated gene expression step by which introns are removed and exons are sequentially joined. Exon recognition is guided by specific sequences, the 3′ and 5′ splice sites, which define the exon boundaries. Variants affecting these sequences in susceptibility genes may lead to aberrant splicing and abnormal transcripts that may trigger a disease. Splicing can be tested using a biotechnological tool called minigenes, which mimic the human gene of interest. Thus, we checked 20 RAD51C splice-site variants using the minigene mgR51C_ex2-8. We found that they all disrupted the splicing mechanism, and 16 variants could be classified as likely pathogenic. Our findings are clinically actionable, and variant carriers may benefit from tailored prevention protocols and therapies.
Abstract: RAD51C loss-of-function variants are associated with an increased risk of breast and ovarian cancers. Likewise, splicing disruptions are a frequent mechanism of gene inactivation. Taking advantage of a previous splicing-reporter minigene with exons 2-8 (mgR51C_ex2-8), we proceeded to check the impact of candidate ClinVar variants on splicing. A total of 141 RAD51C variants at the intron/exon boundaries were analyzed with MaxEntScan. Twenty variants were selected and genetically engineered into the wild-type minigene. All the variants disrupted splicing, and 18 induced major splicing anomalies with no trace or only minimal amounts (<2.4%) of the minigene full-length (FL) transcript. Twenty-seven transcripts (including the wild-type and r.904A FL transcripts) were identified by fluorescent fragment electrophoresis; of these, 14 were predicted to truncate the RAD51C protein, 3 kept the reading frame, and 8 minor isoforms (1.1–4.7% of the overall expression) could not be characterized. Finally, we performed a tentative interpretation of the variants according to an ACMG/AMP (American College of Medical Genetics and Genomics/Association for Molecular Pathology)-based classification scheme, classifying 16 variants as likely pathogenic. Minigene assays have proven to be valuable tools for the initial characterization of potential spliceogenic variants. Hence, minigene mgR51C_ex2-8 has provided useful splicing data for 40 RAD51C variants.


Introduction
A core set of ten genes significantly increases the lifetime risk of developing breast and/or ovarian cancer (BC/OC), as well as other types of cancer [1]. RAD51C (MIM#602774) loss-of-function variants are significantly associated with BC risk (OR = 1.93), while this association is even greater with estrogen-receptor-negative BC, triple-negative BC, and ovarian cancer (OR = 3.99, 5.71, and 5.59, respectively) [2–4]. The main isoform of RAD51C comprises nine exons and encodes a protein essential for DNA repair by homologous recombination. Biallelic RAD51C deleterious variants are also implicated in Fanconi anemia (FANCO) [5,6].
Next-generation sequencing (NGS) technology has allowed great progress in breast/ovarian cancer research and diagnostics but has also increased the number of variants of uncertain clinical significance (VUS), whose role in the disease needs to be clarified. This sort of variant hampers the genetic counseling of patients and decision making in the clinical setting [7]. According to the ClinVar database, around 51% of reported RAD51C variants are VUS (https://www.ncbi.nlm.nih.gov/clinvar/?term=RAD51C%5Bgene%5D, accessed on 21 February 2022).
The reclassification of VUS is essential to ensure appropriate patient care, and functional assays provide critical information for their interpretation [8–11]. RNA splicing is one of the gene expression steps that may be impaired by genetic variants [12,13]. This process is controlled by a wide array of motifs, such as the consensus 3′ and 5′ splice sites (3′SS and 5′SS, respectively), the polypyrimidine tract, the branchpoint, and other splicing regulatory elements [14], which represent targets for potential spliceogenic variants. Alterations that result in RNA mis-splicing produce anomalous transcripts and proteins that can trigger a genetic disorder [15,16]. Indeed, a high proportion of VUS of the BRCA1, BRCA2, MLH1, and MSH2 genes induce splicing disruptions [12,17,18].
In a previous study, we analyzed 20 RAD51C variants from the large-scale sequencing project BRIDGES (http://bridges-research.eu/, accessed on 21 February 2022) in a splicing-reporter minigene that contains exons 2 to 8 [19]. In this study, we bioinformatically analyzed 141 variants reported in the ClinVar database and selected another 20 variants for testing by minigene assays. Finally, we suggested a tentative clinical classification of the BRIDGES variants as per ACMG/AMP (American College of Medical Genetics and Genomics and the Association for Molecular Pathology)-based guidelines.

Ethics Approval
Ethical approval for this study was obtained from the Ethics Committee of the Spanish National Research Council (CSIC, 28/05/2018).

Annotation of DNA and RNA Variants and Transcripts
DNA variants, selected from the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/?term=RAD51C%5Bgene%5D, accessed on 21 February 2022), and alterations at the RNA level were annotated according to the Human Genome Variation Society (HGVS) guidelines (http://varnomen.hgvs.org/, accessed on 21 February 2022) on the basis of the RAD51C GenBank sequence NM_058216.3. For clarity, transcripts were also described using a simplified annotation that combines the following symbols [20]: ∆ (skipping of exonic sequences); ▼ (inclusion of intronic sequences); E (exon); p (acceptor shift); and q (donor shift). The last two symbols (p and q) are always followed by the number of nucleotides inserted or deleted at the 3′SS or 5′SS, respectively. For example, ∆(E2p3) refers to the use of an alternative acceptor site 3 nt downstream of the canonical exon 2 acceptor, which removes 3 nt of exon 2.
Bioinformatics analysis was performed using the MaxEntScan (MES) algorithm of the R package SpliceSites version 1.0.0 (https://www.bioconductor.org/packages//2.13/bioc/) [21,22] (Supplementary Table S1). Potentially harmful variants were selected according to the following criteria: (i) reduction in MES score by at least 40% (67 variants met this condition); (ii) only one variant per splice-site position unless other events, such as the creation of de novo splice sites or the activation of cryptic ones, were predicted; (iii) prevalence in the ClinVar database, so only variants with at least two records were chosen; and (iv) variants without published reports of splicing assays. Applying these criteria, we finally selected 20 potential spliceogenic variants (totaling 65 ClinVar records) distributed throughout the seven exons of the minigene mgR51C_ex2-8 (Supplementary Table S1). SpliceAI (https://spliceailookup.broadinstitute.org/, accessed on 21 February 2022) was also used to predict possible splicing outcomes [22].
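The selection criteria above can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `Variant` record, its field names, and the example scores are hypothetical, MES scores are assumed to be precomputed elsewhere, and criterion (ii) (one variant per splice-site position) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    mes_wt: float            # MES score of the wild-type splice site
    mes_mut: float           # MES score of the variant splice site
    clinvar_records: int     # number of ClinVar records
    published_assay: bool    # True if a splicing assay is already published


def mes_reduction_pct(v: Variant) -> float:
    """Percentage drop of the MES score relative to the wild-type score."""
    return 100.0 * (v.mes_wt - v.mes_mut) / v.mes_wt


def is_candidate(v: Variant, min_reduction: float = 40.0, min_records: int = 2) -> bool:
    """Criteria (i), (iii) and (iv) from the text; (ii) is not modeled here."""
    return (
        v.mes_wt > 0
        and mes_reduction_pct(v) >= min_reduction
        and v.clinvar_records >= min_records
        and not v.published_assay
    )


# Made-up scores, only to show the filter in use.
examples = [
    Variant("variant_A", mes_wt=10.0, mes_mut=5.0, clinvar_records=3, published_assay=False),
    Variant("variant_B", mes_wt=10.0, mes_mut=7.0, clinvar_records=4, published_assay=False),
    Variant("variant_C", mes_wt=8.0, mes_mut=0.0, clinvar_records=1, published_assay=False),
]
selected = [v.name for v in examples if is_candidate(v)]
```

Only `variant_A` passes here: `variant_B` loses 30% of its score, and `variant_C` has a single ClinVar record.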
The 20 selected variants were incorporated into the wild-type (wt) minigene mgR51C_ex2-8 with the QuikChange Lightning Kit (Agilent, Santa Clara, CA, USA), following the manufacturer's instructions and using the primers indicated in Supplementary Table S2. All constructs were confirmed by sequencing (Macrogen, Madrid, Spain).
RNA was purified using the Genematrix Universal RNA Purification Kit (EURx, Gdansk, Poland), with on-column DNase I digestion. Reverse transcription of 400 ng of RNA was performed using the RevertAid First-Strand cDNA Synthesis Kit (Life Technologies, Carlsbad, CA, USA), following the manufacturer's instructions and employing the vector-specific primer RTPSPL3-RV (5′-TGAGGAGTGAATTGGTCGAA-3′). The resulting cDNA was amplified using Platinum-Taq DNA polymerase (Life Technologies, Carlsbad, CA, USA) and primers SD6-PSPL3_RT-FW (5′-TCACCTGGACAACCTCAAAG-3′) and RTpSAD-RV (CSIC Patent P201231427) (amplicon size: 1062 nt). Samples were subjected to an initial denaturation step at 94 °C for 2 min; followed by 35 cycles of 94 °C/30 s, 60 °C/30 s, and 72 °C/(1 min/kb); and a final extension step at 72 °C for 5 min. RT-PCR products were sequenced by Macrogen (Madrid, Spain), which allowed the characterization of the main variant-induced transcripts. Minor transcripts were annotated according to fluorescent fragment electrophoresis size data (see below).
To quantify the relative amounts of each PCR product, semi-quantitative fluorescent RT-PCRs were carried out in triplicate using a FAM-labeled primer (RTpSAD-RV for minigene cDNA and RTR51C_ex9-RV for cell cDNA) and 26 PCR cycles [26]. Fluorescent products were run with the LIZ-1200 size standard at the Macrogen facility (Seoul, Korea) and analyzed using Peak Scanner software V1.0 (Life Technologies). Only peak heights ≥200 RFU (relative fluorescence units) were considered, and mean peak areas of each transcript and standard deviations were calculated. For clarity, the full protocol is schematized in Supplementary Figure S1.
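The quantification step can be illustrated with a short sketch. It assumes that each replicate is a list of `(transcript, peak_height_rfu, peak_area)` tuples exported from the fragment analysis software, and that relative amounts are expressed as the percentage of the total peak area within each replicate; both the data layout and the function names are assumptions made for this example.

```python
from statistics import mean, stdev

MIN_PEAK_HEIGHT_RFU = 200  # peaks below this height are ignored, as in the text


def relative_amounts(peaks):
    """Percentage of total peak area per transcript in one replicate.

    peaks: list of (transcript, peak_height_rfu, peak_area) tuples.
    """
    kept = [(name, area) for name, height, area in peaks if height >= MIN_PEAK_HEIGHT_RFU]
    total = sum(area for _, area in kept)
    if total == 0:
        return {}
    return {name: 100.0 * area / total for name, area in kept}


def summarize(replicates):
    """Mean and standard deviation of each transcript's percentage across replicates."""
    per_replicate = [relative_amounts(r) for r in replicates]
    transcripts = sorted({name for rep in per_replicate for name in rep})
    summary = {}
    for name in transcripts:
        values = [rep.get(name, 0.0) for rep in per_replicate]
        summary[name] = (mean(values), stdev(values))
    return summary


# Made-up triplicate: the 150 RFU peak is discarded before normalization.
replicate = [("FL", 1000, 300.0), ("skipping", 800, 100.0), ("noise", 150, 50.0)]
result = summarize([replicate, replicate, replicate])
```

`stdev` needs at least two replicates, which the triplicate design satisfies.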

ACMG-AMP Clinical Classification of RAD51C Genetic Variants
We classified 20 RAD51C genetic variants according to ACMG/AMP-based guidelines [27]. We followed a recently proposed ACMG/AMP point system, a Bayesian framework that outperforms the original classification guidelines and allows for increased flexibility and accuracy in combining different ACMG/AMP criteria and strengths of evidence [28,29]. In this framework, point-based variant classification categories are defined as follows: pathogenic (P) ≥ +10; likely pathogenic (LP) +6 to +9; variant of uncertain significance (VUS) 0 to +5; likely benign (LB) −1 to −6; and benign (B) ≤ −7.
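The point thresholds listed above translate directly into a lookup function. The sketch below only encodes those category boundaries; the function name is an assumption, and how individual evidence codes are converted into points is outside its scope.

```python
def classify_points(points: int) -> str:
    """Map a total ACMG/AMP point score to the categories given in the text."""
    if points >= 10:
        return "P"    # pathogenic
    if points >= 6:
        return "LP"   # likely pathogenic (+6 to +9)
    if points >= 0:
        return "VUS"  # uncertain significance (0 to +5)
    if points >= -6:
        return "LB"   # likely benign (-1 to -6)
    return "B"        # benign (<= -7)


# A total of +8 points falls in the likely pathogenic range.
example = classify_points(8)
```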
To deal with complex readouts producing ≥2 transcripts (e.g., a RAD51C variant producing two aberrant transcripts, or a leaky variant producing aberrant and full-length transcripts), we developed several ad hoc rules that take into consideration the coding potential of each individual transcript and its relative contribution to the overall expression to reach the appropriate PVS1_O or BP7_O evidence strength. In brief, for each complex readout, we applied the following algorithm: (i) deconvolute mgR51C readouts into individual transcripts; (ii) apply ACMG/AMP evidence classifications to each individual transcript; and (iii) produce an overall PVS1_O (or BP7_O) code strength based on the relative contribution of individual transcripts/evidence to the overall expression. Thus, if transcripts supporting pathogenicity contribute ≥90% to the overall expression, the PVS1_O code is applied (if different transcripts support different pathogenic evidence strengths, the lowest strength contributing >10% to the overall expression is selected as the overall evidence strength). Similarly, the BP7_O code is applied if transcripts supporting a benign effect contribute ≥90% to the overall expression (if different transcripts support different benign evidence strengths, the lowest strength contributing >10% to the overall expression is selected as the overall evidence strength). If neither pathogenic nor benign supporting transcripts contribute ≥90% to the overall expression, the splicing assay is considered to provide no evidence in favor of, or against, pathogenicity. Recently, we used a similar approach to deal with complex PALB2/ATM minigene readouts [20,30].
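The three-step rule described above can be sketched in a few lines. This is one possible reading, not the authors' implementation: transcripts are given as `(percent_of_expression, direction, strength)` tuples, contributions are summed per strength level before applying the >10% rule (an assumption, since the text does not say whether the rule applies per transcript or per strength), and all names and example values are hypothetical.

```python
# Evidence strengths ordered from lowest to highest.
STRENGTH_ORDER = ["supporting", "moderate", "strong", "very_strong"]


def overall_code(transcripts, threshold=90.0, min_share=10.0):
    """Return (code, strength) for a minigene readout, or None if uninformative.

    transcripts: list of (percent, direction, strength), where direction is
    "pathogenic", "benign", or anything else for transcripts giving no evidence.
    """
    for direction, code in (("pathogenic", "PVS1_O"), ("benign", "BP7_O")):
        share_by_strength = {}
        for percent, d, strength in transcripts:
            if d == direction:
                share_by_strength[strength] = share_by_strength.get(strength, 0.0) + percent
        if sum(share_by_strength.values()) >= threshold:
            # Lowest strength that contributes more than min_share percent.
            for strength in STRENGTH_ORDER:
                if share_by_strength.get(strength, 0.0) > min_share:
                    return code, strength
    return None


# Made-up readouts for illustration.
mixed = [(79.0, "pathogenic", "supporting"), (21.0, "pathogenic", "very_strong")]
mostly_strong = [(95.0, "pathogenic", "very_strong"), (5.0, "pathogenic", "supporting")]
leaky = [(60.0, "pathogenic", "very_strong"), (40.0, "none", None)]

mixed_result = overall_code(mixed)
strong_result = overall_code(mostly_strong)
leaky_result = overall_code(leaky)
```

In the first readout the supporting-strength transcript exceeds 10% and caps the overall strength; in the second it does not; the third falls below the 90% threshold, so the assay provides no evidence either way.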
As already justified in previous studies by our group [19,20,25], once experimental splicing data were available, splicing predictive codes PVS1 and PP3 did not contribute to our final classification. Similarly, in HBOPC_ATMv1 specifications, functional splicing codes replace rather than combine with predictive splicing codes.

In Silico Analysis
The ClinVar database contains 1316 variants reported for the RAD51C gene, and 141 of them are located at exon/intron boundaries. These variants were bioinformatically analyzed with MES according to the standards indicated in the Materials and Methods section (Supplementary Table S1). Twenty variants from exons 2 to 8 were selected for functional assays (Table 1, Supplementary Table S1, Figure 1a). Ten of these selected variants were predicted to impair the acceptor site and another ten were expected to impact the donor site. Seven variants (c.146-4_146-2del, c.405-1G>C, c.571+1del, c.705+3A>G, c.706-1G>T, c.905-3_906del, and c.966-1G>C) were predicted to impair the SS and simultaneously create new SSs or strengthen nearby cryptic ones, according to MES. In addition, according to SpliceAI, four variants were predicted to promote the use of cryptic splice sites (c.404+2T>C, c.404+3A>G, c.904G>A, and c.904+1G>T).

Transcript Analysis and ACMG/AMP-Based Interpretation
Semi-quantitative fluorescent RT-PCR revealed 27 different transcripts, including 2 minigene FL transcripts (wt and c.904G>A) (Figure 3, Supplementary Table S3). Nineteen of them could be characterized, and the remaining eight uncharacterized transcripts appeared in low proportions (≤4.7%) and represented, at most, 6.2% of the overall minigene expression (Table 1). A high-resolution image of the fluorescent fragment electrophoresis is shown in Figure 1c, where transcripts with small size differences (e.g., 1 or 3 nt) can be distinguished. Alternative site usage was the most frequent splicing event; specifically, four aberrant transcripts used cryptic 3′SS (∆(E2p3), ∆(E3p7), ∆(E5p10), and ▼(E8p3)), and six used alternative 5′SS (▼(E2q27)-a, ▼(E2q27) [...]).
Of the 19 characterized transcripts, 14 introduced premature termination codons (PTC; PTC transcripts), and of these, 10 were predicted to be degraded by the nonsense-mediated decay pathway (NMD; PTC-NMD transcripts), which is considered convincing evidence of deleteriousness (Supplementary Table S3). Following the ACMG/AMP's proposed PVS1 decision-tree rationale [31], all PTC-NMD transcripts (Table 1) were classified as very strong evidence of pathogenicity (Table 2). The four PTC non-NMD transcripts, ▼(E6q4)-a (p.Gly302SerFs*47), ▼(E6q4)-b (p.Gly302ValFs*47), ∆(E7) (p.Glu303TrpFs*41), and ∆(E8) (p.Arg322SerFs*22), target RAD51C regions critical for protein function. According to the PVS1 decision-tree rationale, these four PTC transcripts should be considered as strong evidence of pathogenicity. However, these alterations remove β strands 6 to 9 (7 to 9 in the case of ∆(E8)) and the nuclear localization signal [32,33]. The integrity of the β sheet is important for maintaining the overall fold of the RAD51C protein and the interaction with RAD51B, so alterations to any β strand of RAD51C should be considered deleterious [33].
Further, structural features (the order of the β strands in space is not the same as their order in sequence) predict that proteins lacking any single β strand would fail to form the β sheet, resulting in the collapse of the protein core and the misfolding of the protein [33]. Moreover, the missense variant p.Arg312Trp (β strand 6) has been shown to impair RAD51C function [34]. Considering these data altogether, we decided to upgrade the pathogenic evidence strength from strong to very strong (Tables 1 and 2). In keeping with this, various PTC variants are classified as pathogenic/likely pathogenic by multiple submitters (no conflicts) in ClinVar.
Three characterized transcripts kept the reading frame: ∆(E2p3), ∆(E5), and ▼(E8p3) (Tables 1 and 2). ∆(E2p3) is a physiological alternative isoform [35] that deletes the conserved amino acid Glu49 (Supplementary Figure S2). Lacking any evidence other than a deleterious PROVEAN score (−10.29), we determined that this in-frame transcript provides pathogenic evidence with supporting strength (as per PP3). The predicted protein product of ∆(E5) (p.Arg237_Val280del) lacks the Walker-B domain (β strand 4) and β strand 5. In addition, 26 out of the 44 amino acids encoded by this exon are conserved in vertebrates (Supplementary Figure S2) [19]. Moreover, the exon 5 missense variant c.773G>A (p.Arg258His) is classified as likely pathogenic in ClinVar, because it was found as a biallelic mutation in multiple Fanconi anemia patients of a single family [6]. Altogether, these observations suggest that ∆(E5) is a loss-of-function transcript that should be catalogued as very strong evidence of pathogenicity (P_VS, +8 points, Table 2). Finally, ▼(E8p3) removes the conserved amino acid Arg322 (β strand 7) and inserts Ser and Thr (p.Arg322delinsSerThr) (Supplementary Figure S2). Based on a deleterious PROVEAN score (−11.94), we determined that this in-frame transcript provides pathogenic evidence with supporting strength (as per PP3).
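Whether an aberrant transcript keeps the reading frame follows from simple arithmetic on the number of nucleotides gained or lost. The sketch below illustrates this with size changes taken from the transcripts discussed above (exon 5 is taken as 132 nt, i.e., the 44 codons mentioned in the text); the function name is an assumption, and the check says nothing about stop codons that an in-frame change might still introduce.

```python
def keeps_reading_frame(delta_nt: int) -> bool:
    """True if a net gain or loss of delta_nt nucleotides preserves the frame."""
    return delta_nt % 3 == 0


# Net size changes relative to the full-length transcript.
# The sign (insertion or deletion) does not matter for the frame check.
size_changes = {
    "E2p3 (3-nt acceptor shift)": -3,
    "E5 skipping (44 codons)": -44 * 3,
    "E8p3 (3-nt acceptor shift)": 3,
    "E6q4 (4-nt donor shift)": 4,
}
in_frame = {name: keeps_reading_frame(delta) for name, delta in size_changes.items()}
```

The first three changes are multiples of three and keep the frame, whereas the 4-nt donor shift causes a frameshift, in line with the p.Gly302SerFs*47 and p.Gly302ValFs*47 annotations.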
Finally, one mgFL-transcript carried the missense variant c.904G>A/p.Gly302Arg. Gly302 is conserved in vertebrates, but the substitution does not affect a known protein functional domain. In addition, the metapredictor REVEL does not support the pathogenicity of this missense variant (0.5) [36]. Another nucleotide substitution (c.904G>C), resulting in the same missense variant (p.Gly302Arg), is considered a VUS in ClinVar (REVEL = 0.5).
We classified all 20 RAD51C variants according to ACMG/AMP-based classification guidelines, integrating mgR51C data as PVS1_O/BP7_O evidence codes (as indicated above) and the rarity code PM2 (as indicated in Materials and Methods, Table 2). The PM3 evidence (in trans with a pathogenic variant in a recessive disorder) did not contribute to the final classification. Unsurprisingly, given that FANCO is an extremely rare FA complementation group [37], none of the tested variants have been identified in Fanconi anemia patients (ClinVar and Global Variome shared LOVD databases, and literature searches). Similarly, the BS2 evidence (in trans with a pathogenic variant in a healthy individual) did not contribute to the final classification of our tested variants. Finally, we decided that some pathogenic (PS2, PM1, PM6, PP2, PP4) and benign (BP1, BP3, BP5) codes were not applicable to the classification of RAD51C variants.

Discussion
About 40% of all variants reported in the ClinVar database are variants of uncertain significance (https://www.ncbi.nlm.nih.gov/clinvar?term=%22clinvar_all%22[Filter], accessed on 21 February 2022). Variants of uncertain significance pose a challenge for genetic counseling, as they are treated as a negative test result, and so the risk assessment of VUS-carrier patients is exclusively based on family history [8,38]. A significant fraction of VUS impair pre-mRNA splicing, which makes transcript analysis a mandatory step for determining their pathogenicity [39].
Conversely, many splicing variants are classified as likely pathogenic or pathogenic because they target the canonical ±1, 2 splice-site positions. While it is true that most of these variants will impact splicing, the resulting alteration is not necessarily pathogenic. For example, variants affecting the 3′SS of RAD51C exon 8, such as c.966-2A>G (functionally analyzed in our previous RAD51C study [19]) and c.966-1G>C (studied here), are classified as likely pathogenic in ClinVar because they alter the -2 and -1 positions, respectively. Nevertheless, these variants cause a 3 nt insertion (one amino acid) with an unknown impact on protein function, and so we classified them as VUS (Table 2). This observation underlines the importance of the functional testing of suspicious splicing variants [14,40].
Here, we focused on potentially spliceogenic RAD51C variants reported in the ClinVar database. Functional studies were performed using a hybrid minigene (mgR51C_ex2-8) that has proven to be a powerful and reliable tool for testing variant-splicing outcomes in the absence of patient RNA [19]. The major advantages of minigene-based assays are: (a) no need for patient samples; (b) no interference from the wt allele, as occurs in patient RNA, so all the observed transcripts are generated by the variant; (c) assays can be performed on disease-relevant cell types; and (d) a single construct allows the study of multiple variants. Indeed, this RAD51C minigene has allowed the functional analysis of a total of 40 variants to date (Table 3), but the functional analysis of any candidate variant located at exons 2 to 8 would be possible. Finally, the high sensitivity and resolution of the fluorescent fragment electrophoresis, which facilitated the detection of rare transcripts and resolved small size differences between them, is also worth mentioning.
We also focused our attention on +2T>C substitutions, which convert a canonical GT 5′SS into an atypical GC 5′SS, a type that accounts for less than 1% of human donor sites [47]. It has been reported that about 15-18% of +2T>C changes retain the activity of the donor site [48], inducing between 1 and 84% of full-length transcripts. Remarkably, neither of the two genetic alterations that introduce a cytosine at position +2 (c.404+2T>C and c.837+2T>C) uses the de novo atypical GC dinucleotide (Table 2), as was also the case for the PALB2 variant c.48+2T>C [19,20]. Conversely, we previously showed that PALB2 c.108+2T>C generated an active GC-5′SS that produced 85% of full-length transcripts [20]. This feature may be related to the high sequence conservation of the other splice-site positions (CAG|GCAAGT). On the other hand, c.966-1G>C mainly induced the use of an alternative 3′SS (▼(E8p3), 79%), which we had also detected in variants c.966-3C>A, c.966-2A>G, and c.966-2A>T, though in lower amounts (6-11%) [19]. As indicated above, the 3 nt insertion ▼(E8p3) represents four different transcripts and four different protein products (Arg deletion/SerThr insertion, Arg duplication, Arg deletion/SerGly insertion, and Arg deletion/SerTrp insertion), which hinders transcript interpretation to an even greater extent.

Conclusions
We tested a total of 40 RAD51C variants in the minigene mgR51C_ex2-8, of which 39 impaired splicing and 36 were associated with severe splicing aberrations (Table 3).
Thirty-one variants were classified as likely pathogenic/pathogenic as per ACMG/AMP-based guidelines, while nine were catalogued as VUS. Moreover, according to ClinVar records of 34 reported variants (including those of our previous study) [19], the mgR51C readouts changed the clinical interpretation of 12 variants: 9 VUS were upgraded to likely pathogenic and 3 LP variants were downgraded to VUS. Both changes are critical for genetic counseling and decision making in the clinical setting, reaffirming the value of minigene assays. Finally, it is critical to define the minimal amount of RAD51C required to maintain gene function. Hence, it is conceivable that the variants with the vast majority of inactivating transcripts, such as c.966-3C>A, c.966-2A>G, and c.966-2A>T (>86% of PTC transcripts), might be reclassified as likely pathogenic or pathogenic.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14122960/s1, Figure S1: Workflow of the minigene protocol, Figure S2: Alignment and amino acid conservation of the RAD51C protein, Table S1: Bioinformatics analysis of RAD51C variants with Max Ent Score, Table S2: Mutagenesis primers for RAD51C variants, Table S3: Transcript annotation according to HGVS guidelines.