New Approach for Detection of Normal Alternative Splicing Events and Aberrant Spliceogenic Transcripts with Long-Range PCR and Deep RNA Sequencing

Simple Summary RNA splicing defects, caused by genetic variants, are a common molecular mechanism of disease. To detect variants that cause splicing impairment, mRNA-based studies must be performed. Classical mRNA assays are time-consuming, which is why we have validated a new reliable straightforward approach to detect normal alternative splicing events and also splicing aberrations. Using our approach, we were able to reclassify three variants of uncertain significance in NBN and STK11 genes, which is of great importance for a proper clinical management of the patients. Abstract RNA sequencing is a promising technique for detecting normal and aberrant RNA isoforms. Here, we present a new single-gene, straightforward 1-day hands-on protocol for detection of splicing alterations with deep RNA sequencing from blood. We have validated our method’s accuracy by detecting previously published normal splicing isoforms of STK11 gene. Additionally, the same technique was used to provide the first comprehensive catalogue of naturally occurring alternative splicing events of the NBN gene in blood. Furthermore, we demonstrate that our approach can be used for detection of splicing impairment caused by genetic variants. Therefore, we were able to reclassify three variants of uncertain significance: NBN:c.584G>A, STK11:c.863-5_863-3delCTC and STK11:c.615G>A. Due to the simplicity of our approach, it can be incorporated into any molecular diagnostics laboratory for determination of variant’s impact on splicing.


Introduction
Alternative splicing is a process in which a single gene's pre-mRNA undergoes processing into multiple mature mRNA isoforms. Nearly all human multi-exon genes are involved in alternative splicing. For instance, BRCA1 gene contains 23 exons, but 63 alternative splicing events are produced by wild-type allele [1]. Understanding the naturally occurring alternative splicing isoforms of clinically relevant genes is of great importance for correct interpretation of splicing assays. RNA splicing defects are a common molecular mechanism of disease, and there are studies demonstrating that RNA sequencing (RNAseq) considerably improves diagnostics yield [2][3][4]. In the study by Yamada et al., the authors showed that the detection rate of deleterious variants increased by 19% if combination of exome and transcriptome analysis was performed, compared with exome sequencing alone [2]. Similarly, Karam et al. reported that DNA sequencing (DNAseq) in combination with RNAseq improved clinical management of 1 in 43 patients in hereditary cancer syndromes [4]. Therefore, identification of splicing defects is of high importance to improve patient's management. Unfortunately, conventional RNA-based functional assays, such as minigene splicing assays, direct Sanger sequencing and capillary electrophoresis are labor intensive, difficult to interpret and often inconclusive [5]. An additional drawback of the above-mentioned conventional RNA assays is that the maximal fragment's length suitable for analysis is limited to approximately 1000bp, which makes it impossible to investigate variants detected in long exons such as exon 10 in BRCA1 and exon 11 in BRCA2 gene. Therefore, to capture a variant's complete impact on splicing, whole gene RNA sequencing should be performed. With the development of next generation sequencing, targeted RNAseq or even whole transcriptome sequencing is nowadays technically feasible. However, for small diagnostic laboratories, whole transcriptome sequencing can be financially demanding, and it requires high computational power and storage capacity for RNAseq data analysis [6,7].
Naturally occurring alternative splicing events must be determined by analyzing control samples alongside the patient sample to eliminate the possibility of misinterpreting the variant under investigation as spliceogenic. Previous studies have systematically determined alternative splicing events of genes associated with hereditary breast and/or ovarian cancer, such as BRCA1, BRCA2, PALB2 and STK11 [1,8,9]. However, genes associated with rare syndromes or genes with lower penetrance, e.g., NBN, remain less studied. Therefore, for small diagnostics laboratories dealing with large numbers of unclassified variants, it is crucial to develop a quick, easy, non-laborious and bioinformatically uncomplicated test for detecting splicing defects to minimize the number of variants of uncertain significance (VUS).
Consequently, the main purpose of our article is to describe a simple method for detecting variants that have an impact on splicing. Along with this, for the first time using our method, a list of alternative splicing events of the NBN gene is provided.

Patient Samples
A total of 5 patient blood samples and 6 unrelated control blood samples were collected into Tempus Blood RNA Tube (ThermoFisher, Waltham, MA, USA). Patient samples were carriers of the spliceogenic variants NF1:c.122A>T, NF1:c.7395-17T>G, NBN:c.584G>A, STK11:c.863-5_863-3delCTC and STK11:c.615G>A. Control samples were used for detection of alternative splicing events in STK11 and NBN genes. Total RNA was isolated from whole blood using Tempus™ Spin RNA Isolation Kit (ThermoFisher).
The present study was approved by the Institutional Review Board of the Institute of Oncology Ljubljana (permission no. OIRIEK00937) and by the National Medical Ethics Committee of Republic of Slovenia (permission no. 0120-339/2019/5). Individual patient consent was waived for this study, as it was a retrospective study, the research involved no risk to the subjects, and the institutional informed consent forms for treatment included consent for the use of patient's data, materials and/or test results for research purposes. All procedures followed in the present study were therefore in accordance with the ethical standards of the responsible committees on human experimentation (institutional and national) and the Helsinki Declaration of 1975, as revised in 2013.

DNA Sequencing-DNAseq
DNA sequencing was performed as previously described in Setrajcic Dragos et al., 2019 and Klancar et al., 2020 [10,11]. Control samples harboring only undoubtedly benign variants (described in Supplementary Table S2) in the coding region and ±25 nt of intronic sequence of STK11 and NBN gene were selected for alternative splicing isoform discovery.

RNA Sequencing-RNAseq
cDNA synthesis was performed with SuperScript™ IV VILO™ Master Mix (Ther-moFisher) using 100ng of total RNA. Primers for genes STK11, NF1 and NBN were designed to flank 5 and 3 UTR. cDNA was amplified with long-range PCR using LongAmp ® Taq 2X Master Mix (New England Biolabs) (primer sequences and PCR conditions are described in the Supplementary Table S2).
PCR products were quantified with Qubit (ThermoFisher). Long-range PCR amplicons were used for further library preparation with Nextera XT according to manufactures' instructions (Illumina). The library was quantified with LabChip ® GX Touch™ Nucleic Acid Analyzer (PerkinElmer). The library was paired-end sequenced (2 × 121 cycles) on NextSeq 550 (Illumina).
Raw data files (bcl) were converted to fastq files using bcl2fastq2 tool. FastQC tool was used to determine the quality of NGS data. STAR aligner 2.7.3a was used for alignment of NGS reads to hg19 genome assembly, with the following settings: -outFilterMultimapNmax 2 -outFilterMismatchNmax 20 -chimSegmentMin 0 [12] Samtools was used to create index bam (bam.bai) file [13]. All splicing events were obtained from OutSJ.tab file, produced by STAR. Sashimi plots were created using rmats2sashimi tool. Bioinformatic tools for splicing prediction NNSplice, MaxEntScan, Gene splicer, SpliceSiteFinder-like (included in Alamut visual software) and SpliceAI were used [14].

Alternative Splicing Events Threshold
Junctions covered with a minimum of 20 reads and present in at least two samples or previously published were considered as real splicing junctions. Junctions below the set threshold were regarded as sequencing artifacts and/or biological outliers.

Identification of Splicing Aberrations Caused by Genetic Variant
Genetic variants and alternative splicing events are described following HGVS nomenclature v19.01, where c.1 and r.1 are the A of the ATG translation initiation codon. Reference transcripts NM_000455.4, NM_002485.4 and NM_000267.3 for genes STK11, NBN and NF1 were used, respectively. Alternative splicing isoform was defined as any splice junction not defined in the above-mentioned reference transcripts. Splicing isoforms are described using symbols: ∆ ((partial) exon skipping), (intron insertion), p (acceptor shift) and q (donor shift). If multiple cryptic exon inclusion events occurred within the same intron, subsequent letters were added to the event. For example, if three cryptic exon inclusion evets occurred between exons 4 and 5, we described it as 4A, 4B, 4C.

Results
Here, we present a straightforward 1-day hands-on protocol for detection of splicing alterations in blood with deep RNAseq (cDNA seq), regardless of the gene in question. The test is based on long-range PCR with primers aligning to the 5 UTR and 3 UTR regions of the targeted cDNA. The long PCR amplicon is then fragmented with Nextera transposome and tagged with a universal overhang. Next generation sequencing (NGS) library is further prepared with Illumina's Nextera XT. Schematic representation of the novel RNAseq method is presented in Figure 1.

Method Confirmation-Alternative Splicing Events in STK11 Gene
The first step in our study was to confirm that our new NGS library preparation and bioinformatics pipeline can reliably detect all major splicing events including exonic and intronic splice-site shift, cryptic exon inclusion and (multiple) exon skipping. Therefore, we decided to determine all naturally occurring splicing junctions of STK11 gene expressed in blood and compare our results with those previously identified by Brandão et al., 2019. In order to detect even less expressed events, we were aiming for coverage of canonical splice junctions above 100,000×. We were able to detect 36/38 (95%) of previously reported STK11 splicing junctions, missing one exon skipping and one multi exon skipping event. However, not all previously reported junctions reached our threshold (covered with at least 20 reads and present in at least 2 samples): one junction ∆4-5 was expressed extremely weakly in all 6 samples with the average of 6 reads, whereas junctions 1H, 1I and 7q were expressed in one sample only, but with a considerable number of reads spanning the junction: 116, 175 and 29 reads, respectively. We were unable to detect two previously published junctions of STK11 gene, ∆2-5 and ∆7. Hence, we aligned our data again to the sequence of the two known events and visually inspected the alignment. No reads mapped to those two events, suggesting they were indeed not present in our data. The splicing event ∆7 was however detected in a patient sample, which harbored a leaky splice-site variant STK11:c.863-5_863-3delCTC in intron 6, causing the exon 7 skipping. Four splicing events were predominantly expressed (with a percentage of reads >1%): 1C, 7C, 7D and ∆9q, graphically presented in Figure 2. All four events are predicted to create frameshifts and a premature stop codon, resulting in nonsense-mediated decay or nonfunctional protein. The highest expressed splicing event that can possibly retain protein function was ∆2-3 (0.37%), causing an in-frame deletion of amino acids 98-155. In addition, we detected 18 splicing events that have not been published before ( Table 1), suggesting that our approach can be useful for detection of splicing events in STK11 gene.

Catalogue of Naturally Occurring Splicing Events in NBN Gene
Once the new method was established, we examined the alternative splicing events in NBN gene, with an identical approach ( Table 2). In the previous studies, 10 alternatively spliced isoforms have been identified with RT-PCR [15][16][17]. Using our approach, we were able to identify all 10 previously described alternative splicing events as well as 49 previously undescribed alternative splicing events. To our knowledge, this is the most extensive catalogue of naturally occurring alternative splicing events of the NBN gene. Altogether, we detected 59 alternative splicing events; 11 (∆2q, ∆3-4, ∆4-5, ∆6-7, ∆12, ∆13qA, ∆13, ∆12-13, ∆14, ∆13-14, ∆12-14) were predicted to be in-frame deletions that can possibly rescue the protein function. In-frame deletions of exon 13 (∆13) and exon 12 (∆12) were expressed the highest, with 1.2% and 1%, respectively. Only exons 8, 10 and 11 were not a subject of exon skipping. In NBN gene, we were able to detect two splicing events affecting 3 UTR region, which might actually be alternative 3 UTR isoforms [18].The highest expressed alternative splicing event was cryptic exon inclusion 2 (11.3%), which corresponds to exon 3 in NCBI reference sequence NM_001024688.2. Events present in more than 1% are depicted schematically in Figure 2. All splicing junctions produced by STAR aligner of both studied genes are listed in the Supplementary Table S1.

Detection of Known Spliceogenic Variants
To verify that our method is capable of detecting an abnormal splicing pattern caused by a spliceogenic variant, we used our method to test two variants previously characterized as spliceogenic. Impact on splicing for both variants has been previously established with direct Sanger sequencing and capillary electrophoresis (CE) [10]. Variant 1 is NF1:c.122A>T r.121_204del, which creates an exonic donor shift leading to deletion of 84 bp of exon 2 (∆2q). Assessed by CE, NF1:c.122A>T induced transcript represented 52% in comparison to 48% of full-length transcript. Variant 2 is NF1:c.7395-17T>G r.7394_7395ins7395-16_7395-1, which causes an acceptor shift and retention of last 16 bp of intron 50 ( 50p). Aberrant transcript was present in 17.3% determined by CE. Indeed, our new RNAseq protocol successfully identified the disruption of normal splicing and determined the exact splicing junction in both samples (Figure 3). Additionally, according to the new RNAseq method, the fraction of aberrant transcripts caused by NF1 variants was 45% and 19% for NF1:c.122A>T and NF1:c.7395-17T>G, respectively.

Determination of Spliceogenicity of VUS
Three variants in NBN or STK11 genes, which were bioinformatically predicted to cause splicing impairment, were selected for the analysis: NBN:c.584G>A, STK11:c.863-5_863-3delCTC and STK11:c.615G>A. The results are visualized in Figure 3.
NBN:c.584G>A is located in the ultimate position of exon 5, which is predicted to completely abolish natural donor splice site. We observed strengthening of out-of-frame exon 5 skipping (∆5) in the variant carrier. Analyzing the junction data, ∆5 was present in 30% of junctions in NBN:c.584G>A carrier in comparison to 0.8% in controls. After inspecting the alignment file at the position c.584, only wild-type nucleotide G was detected, implicating that the NBN:c.584G>A variant does not form any full-length transcript. STK11:c.863-5_863-3delCTC variant is located in intron 6 and is predicted to decrease the strength of native acceptor splice site. RNAseq analysis has revealed an abnormal transcript that lacks exon 7 (∆7). However, the out-of-frame ∆7 transcript was minorly expressed (only in 0.9%), implying that STK11:c.863-5_863-3delCTC variant causes low leaky splicing abnormality. The control samples did not harbor ∆7 transcript.
Similarly, STK11:c.615G>A creates minor splicing defect (1.1%) by introducing a de novo acceptor splice site. The variant causes minor frameshift deletion of first 19 nucleotides of exon 5 (∆5p). When inspecting mapped data, mutated A nucleotide at the position c.615 was present in 49% of the reads, which confirms that the splicing abnormality is minor. The ∆5p transcript was not present in the controls.

Discussion
RNA-based experiments are often performed in diagnostics laboratories in order to identify variants that cause RNA splicing impairment [19][20][21][22]. RT-PCR followed by capillary electrophoresis and Sanger sequencing are golden standards for determining variants spliceogenicity. However, such experiments are limited to the location of the variant, requiring multiple PCR reactions for different variants in the same gene. Any laboratory, no matter how big or small, requires a reliable straightforward method to determine variants' effect on splicing for precise variant classification. Here, we present a simple RNAseq approach that efficiently detects splicing junctions, which can be implemented by any laboratory with an access to an NGS instrument.
The first aim of our study was to evaluate if our pipeline is sensitive enough to detect previously determined naturally occurring alternative splicing events of STK11 gene. We were able to detect 95% of all splicing events and furthermore detect 18 previously undetected splicing events. One previously described event ∆7 was not detected by our analysis in the control samples included in the study [9]. Nevertheless, we were able to detect the identical event ∆7 in a patient with a rare leaky splice-site variant located in intron 6 of STK11 gene (STK11:c.863-5_863-3delCTC) demonstrating that the assay and the pipeline are able to detect such a splicing event, but the transcript was not present in our data. An additional previously described transcript that was not present in our dataset was ∆2-5. These two events may be population specific, as they were detected in all four samples in a study by Brandão et al., 2019, but were not seen in a single sample in our study (Slovenian population). Notably, we were able to detect all types of splicing events: exon skipping, multiple exon skipping, donor/acceptor shift, cryptic exon inclusion and mixed splicing events. Once the method was established, we characterized alternative splicing patterns of the NBN gene. This is the first time, to our knowledge, that the naturally occurring splicing events of NBN gene were characterized in depth. Ten previously identified alternative splicing events were also detected by our approach [15][16][17]. This catalogue of alternatively spliced transcripts (Table 2) is an important asset for further characterization of possible spliceogenic variants. Interestingly, exons 8 (amino acids 299-332), 10 and 11 (amino acids 375-615) were not subjected to alternative exon skipping. It might be that these exons are essential for protein function. Indeed, the nibrin protein was shown to interact with mTOR/Rictor/SIN1 complex at the amino acid residues 221-402 [23].
Crucial for any RNAseq experiment is its ability to accurately detect splicing impairment [24]. To test that our assay can detect abnormal splicing, two well-studied splice altering NF1 variants were examined. Our assay correctly identified abnormal splicing junctions that arose due to damaging NF1:c.122A>T and NF1:c.7395-17T>G variants. Moreover, the percentage of aberrant transcript determined by CE and RNAseq was similar for both variants, meaning the novel RNAseq method can reliably quantify the aberrant versus normal transcript. NF1:c.7395-17T>G variant produced 19% of aberrant transcript, which might be due to partial degradation of truncated mRNA by nonsense-mediated decay pathway.
Additionally, our method was used to determine splicoegenicity of three variants of uncertain significance. One variant NBN:c.584G>A was shown to completely disturb mRNA splicing by inducing out-of-frame exon 5 skipping and was therefore reclassified as likely pathogenic (ACMG/AMP criteria applied: PS3, PM2, PP3). Two variants STK11:c.863-5_863-3delCTC and STK11:c.615G>A were determined to cause minor leaky splicing, as they were expressed in extremely low fractions. Moreover, the carriers of both variants did not have any clinical characteristics of Peutz-Jeghers syndrome, which is caused by STK11 pathogenic variants. Both variants were therefore reclassified as likely benign (ACMG/AMP criteria applied: BS3, BP5, PM2).
Here, we show that our assay can indeed detect altered transcripts, both complete splicing aberrations and leaky splicing, and can be therefore used as a complementary test in molecular diagnostics laboratories to characterize variants' effect on splicing. Our approach might be used as an alternative method to targeted RNA sequencing or whole transcriptome sequencing when examining one gene of interest.
The main advantage of our approach compared with the targeted RNAseq is that there is no need to design enrichment probes. Designing enrichment probes can be challenging especially for poorly researched genes [25,26]. Furthermore, targeted RNAseq assays are frequently designed and validated for a specific gene panel, which makes it difficult to add or remove genes of interest [26]. An additional advantage of our assay in comparison to targeted or transcriptome sequencing is the possibility to study a single gene of interest, achieving a higher coverage crucial for detection of events expressed in lower fractions. Our assay can be customized for nearly any gene of interest, which makes it highly suitable for laboratories dealing with rare genetic syndromes. The method is limited to fresh tissue samples or cell cultures, since the RNA has to be of high quality so that it is amplifiable with long-range PCR. An additional limitation is the size of the cDNA, which needs to be amplifiable with PCR. In this study, the maximal length of cDNA, which was successfully amplified and analyzed, was 12kb (NF1 gene).
The limitation of our approach in detecting natural events of NBN gene was that DNA sequencing of control samples only included exon regions and 25 bp of intronic sequence. Although deep intronic variants were not ruled out, we avoided rare genetic variants by only counting junctions detected in two or more control samples. Additionally, the newly detected splicing events in our study were not confirmed with an alternative method. To further improve our protocol, molecular barcodes could be used in order to partially avoid biased amplification of certain splicing events. An additional drawback of this study is the low number of samples used for the catalogue of natural splicing events. Moreover, the catalogue of naturally occurring splicing isofroms was conducted with shortread sequencing, which may cause mapping errors. The catalogue could be improved by confirming the isoforms with long-read sequencing, such as single-molecule real-time or nanopore sequencing. Importantly, with our approach, we are only able to determine splicing events not the whole full-length splicing isoforms, which can be achieved with long-read sequencing.

Conclusions
In conclusion, our novel single-gene assay is a fast, straightforward and cost-efficient technique for discovering splicing impairment. All that a laboratory requires is a set of primers that align to the 5 and 3 UTR region of the gene of interest, long-range PCR amplification, Nextera XT library preparation kit and access to an NGS instrument. RNAseq data can be analyzed on any desktop computer, without the need for high computational power. Low sequencing cost and low computational power for data analysis make the method accessible even to laboratories with limited budgets. The turnaround time starting from RNA isolation to loading a sequencing library onto an NGS instrument is around 10 h (the duration of the experiment depends on the length of cDNA that needs to be amplified with PCR). Our method can be easily adopted in diagnostics laboratories, as the assay can be performed in a time frame necessary for clinical testing. Importantly, when the method is applied in the clinical laboratory, it should always include control samples for excluding normally present splicing events in the diagnostics sample.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/biology10080706/s1, Table S1: naturally occurring splicing junctions of NBN and STK11 genes in the blood of six control samples, obtained with RNAseq and STAR aligner; Table S2: primer sequences and PCR protocol used for long-range amplification of NBN, STK11 and NF1 genes. Benign variants detected in NBN and STK11 genes in six controls with DNAseq. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent was obtained from the patient(s) to publish this paper.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to patients' privacy.