Increasing the Genetic Diagnosis Yield in Inherited Retinal Dystrophies: Assigning Pathogenicity to Novel Non-canonical Splice Site Variants.

AIMS
We aimed to validate the pathogenicity of genetic variants identified in inherited retinal dystrophy (IRD) patients, which were located in non-canonical splice sites (NCSS).


METHODS
After next generation sequencing (NGS) analysis (target gene panels or whole exome sequencing (WES)), NCSS variants were prioritized according to in silico predictions. In vivo and in vitro functional tests were used to validate their pathogenicity.


RESULTS
Four novel NCSS variants were identified. They are located in introns 33 and 34 of ABCA4 (c.4774-9G>A and c.4849-8C>G, respectively), intron 2 of POC1B (c.101-3T>G) and intron 3 of RP2 (c.884-14G>A). Functional analysis detected different aberrant splicing events, including intron retention, exon skipping and intronic nucleotide addition, whose molecular effect was either the disruption or the elongation of the open reading frame of the corresponding gene.


CONCLUSIONS
Our data increase the genetic diagnostic yield of IRD patients and expand the landscape of pathogenic variants, which will have an impact on the genotype-phenotype correlations and allow patients to opt for the emerging gene and cell therapies.


Introduction
Inherited retinal dystrophies (IRDs) constitute a group of clinically and genetically heterogeneous Mendelian disorders that lead to irreversible and progressive visual impairment due to dysfunction or loss of photoreceptors. During the past two decades, over 250 IRD causative genes have been described (https://sph.uth.edu/retnet/), most of them using conventional capillary-based Sanger sequencing. More recently, rapid, efficient and cost-effective next-generation sequencing (NGS) has revolutionized the field of genetic diagnosis, raising the diagnostic yield to over 70% when coding and flanking non-coding regions are included [1,2]. However, NGS approaches highlight a large number of genetic variants that, in the absence of a clear-cut causative mutation, have to be prioritized, which is a non-trivial task. This is particularly evident when the identified genetic variants are located outside canonical splice sites or embedded in deep-intronic regions. In these cases, in silico predictions provide relevant clues, but in vitro or in vivo functional assays are required to validate pathogenicity.
To break the current ceiling in the genetic diagnosis of clinically heterogeneous diseases (such as IRDs) and to determine genotype-phenotype correlations, novel protocols for the analysis of copy number variants, and for the identification of mutations that either cause aberrant splicing products or affect the regulation of gene expression, have to be developed and implemented in routine diagnostic assays [3-5].
In this context, current estimates indicate that at least 33% of disease-causing mutations alter pre-mRNA splicing [6], which is most probably an underestimate because: (i) NGS analyses (WES and gene panels) are mostly restricted to genetic variants located at the boundaries of canonical splice-site sequences; (ii) some synonymous substitutions in fact disrupt splicing regulatory elements such as ESEs/ISEs (exonic/intronic splicing enhancers) and ESSs/ISSs (exonic/intronic splicing silencers) [7,8]; and (iii) genetic variants mapping either at non-canonical splice sites (NCSS) or within large introns go mostly undetected.
Our work and that of several other groups show that functional validation of variants causing aberrant splicing is attainable in conventional diagnostic centers and can be incorporated into routine diagnostic protocols [3,9-11]. Here we present the functional characterization of new NCSS mutations: two located in introns 33 and 34 of the ABCA4 gene, one located in intron 2 of the POC1B gene, and one located in intron 3 of the RP2 gene. Our findings expand the universe of pathogenic mutations located in non-coding sequences and highlight the relevance of paying specific attention to NCSS genetic variants. The validation of their phenotypic impact secures genetic diagnosis, with clear benefit for the patient and the clinician, and opens the way to devising personalized gene-therapy strategies based on antisense oligonucleotides.

Clinical Diagnosis
Patients enrolled in this study were clinically diagnosed with either Retinitis Pigmentosa, Stargardt disease or cone-rod dystrophy on the basis of ophthalmic studies that included visual acuity and visual field tests, fundus ophthalmoscopy, optical coherence tomography (OCT) and electroretinographic (ERG) studies (Table 1).

Samples
After approval from the Bioethics Committee of the Universitat de Barcelona (Institutional Review Board IRB_00003099, 2016), written informed consent for genetic testing was obtained from patients and families prior to donation of blood samples, following the recommendations of the American College of Medical Genetics (ACMG) and abiding by the tenets of the Declaration of Helsinki. Peripheral blood DNA samples from patients and available relatives were obtained using the QIAamp DNA Blood Maxi Kit (Qiagen, Hilden, Germany). Genomic DNA from probands was analyzed by targeted gene panel sequencing. The targeted gene panel comprised the coding regions of 346 genes and 65 intronic sequences, which include all IRD genes plus genes causing other visual disorders (the complete list of genes is available at www.dbgen.com). Variants identified in genes associated with retinal disorders were selected according to their predicted molecular effect, their consistency with the clinical disorder, and their allele frequencies in gnomAD and in our control cohort. Candidate variants were validated by Sanger sequencing and confirmed by cosegregation analysis (Table 2).

In Silico Analysis of the Variant Effects on Splicing
The potential effect on splicing of the identified non-canonical splice variants (c.884-14G>A in RP2, c.4774-9G>A and c.4849-8C>G in ABCA4, and c.101-3T>G in POC1B) was assessed by comparing wild-type and mutant sequences using four different algorithms (i.e., Human Splicing Finder, MaxEntScan, NetGene and NNSPLICE) (Table 3).
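The HGVS names of these variants already encode their distance from the nearest exon boundary, which is what separates a non-canonical from a canonical splice-site change. The following is a minimal Python sketch of that reading, assuming only simple intronic substitutions of the form c.<position>±<offset>REF>ALT; the function names are hypothetical and the code is not part of any of the prediction tools listed above.

```python
import re

# Matches simple intronic substitutions such as "c.4774-9G>A".
# Assumption: only single-nucleotide substitutions with a +/- intronic
# offset relative to a coding position are handled.
_HGVS_INTRONIC = re.compile(r"^c\.(\d+)([+-])(\d+)([ACGT])>([ACGT])$")


def parse_intronic_variant(hgvs):
    """Split an intronic HGVS substitution into its components."""
    match = _HGVS_INTRONIC.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple intronic substitution: {hgvs!r}")
    anchor, sign, distance, ref, alt = match.groups()
    offset = int(distance) if sign == "+" else -int(distance)
    return {"anchor": int(anchor), "offset": offset, "ref": ref, "alt": alt}


def splice_site_side(offset):
    """Negative offsets lie upstream of an exon (acceptor side of the intron);
    positive offsets lie downstream of an exon (donor side)."""
    return "acceptor" if offset < 0 else "donor"


def is_canonical(offset):
    """Canonical splice sites are the intronic dinucleotides at +1/+2 and -1/-2."""
    return abs(offset) in (1, 2)


if __name__ == "__main__":
    for name in ("c.884-14G>A", "c.4774-9G>A", "c.4849-8C>G", "c.101-3T>G"):
        variant = parse_intronic_variant(name)
        print(name, splice_site_side(variant["offset"]),
              "canonical" if is_canonical(variant["offset"]) else "non-canonical")
```

All four variants in this study carry a negative offset outside the -1/-2 dinucleotide, so the sketch places them on the acceptor side and labels them non-canonical.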

In Vivo Splicing Analysis of RP2
After confirming that RP2 is expressed in blood, samples from the patient and a control were used to analyze the splicing pattern of this gene. Cycloheximide and RNAlater solution (ThermoFisher Scientific, Waltham, MA, USA) were added as described elsewhere [9]. Total RNA was extracted using the RiboPure-Blood Kit (Life Technologies, now ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. cDNA was synthesized using the qScript cDNA Synthesis Kit (Qiagen, Hilden, Germany). To analyze the effect of the identified variant on splicing, first-strand cDNA templates were used to amplify the region of interest, from exon 2 to exon 5 of RP2, with specific primers (RP2 Exon2F: 5′-GACAGAAGAGCAGCGATGAAT-3′; RP2 Exon5R: 5′-CATATTCCCATCTGTATATCAGC-3′). PCR reactions were performed as previously described [12], and the products were analyzed by Sanger sequencing.

Results
Several patients with retinitis pigmentosa (RP), RP with macular involvement, or cone-rod dystrophy were referred by their clinicians for genetic diagnosis. Clinical tests including visual acuity and visual field tests, fundus ophthalmoscopy, optical coherence tomography (OCT) and electroretinogram (ERG) studies had been previously performed (Table 1).
Patient DBG1 showed typical RP clinical traits: retinal vascular attenuation, retinal pigment epithelium (RPE) with retinal atrophy, and peripheral retinal bony spicule hyperpigmentation (Figure 1A, Table 1). Patient DBG2, who was initially referred as RP with macular involvement, showed early-onset photophobia, decreased central vision, and peripheral retinal lesions with diffuse chorioretinal atrophy (Figure 1B, Table 1). Patient DBG3 had suffered from night blindness and high myopia since childhood; his fundus examination showed peripapillary atrophy, atrophic RPE at the macula, and vascular attenuation, and he was initially diagnosed with RP (Figure 1C, Table 1). Finally, patient DBG4 had complained of visual acuity impairment and photophobia since childhood. Fundoscopy showed pigmentary changes, and the electroretinogram demonstrated a reduced photopic response. He was clinically diagnosed with cone-rod dystrophy (Figure 1D, Table 1). The proband's 10-year-old brother had similar phenotypic characteristics, with a tapetal-like sheen.
We addressed the genetic diagnosis of these patients using targeted gene panel NGS. In patients DBG1 and DBG2 (initially diagnosed with RP), we detected only one previously reported ABCA4 pathogenic allele in each, namely, c.735T>G p.Tyr245* and c.2894A>G p.Asn965Ser. ABCA4 was a good candidate gene to explain the phenotype of these patients, but the second allele was missing. In patients DBG3 and DBG4, on the other hand, no clear pathogenic allele was identified in the coding region of any IRD gene. A careful examination of the NGS data showed that all these patients carried intronic variants mapping close to the intron-exon boundaries (NCSS) in candidate genes (ABCA4, RP2 and POC1B) (Table 2). Cosegregation analysis confirmed biallelic inheritance of ABCA4 variants in DBG2, biallelic inheritance of POC1B variants in patient DBG4, and hemizygosity of the RP2 variant in patient DBG3 (Table 2). Concerning DBG1, an affected sister shared the same ABCA4 genotype as the proband, supporting that this new NCSS variant is the second pathogenic allele. Nonetheless, as the parents were not available, we could not confirm by cosegregation analysis that the two alleles are in trans. The very low frequency of these alleles and the absence of homozygotes in normal population databases (gnomAD) supported that these NCSS variants might cause aberrant splicing events. In silico prediction analyses using HSF, MaxEntScan, NetGene, and NNSplice also supported their putative pathogenic molecular effect (Table 3). In all cases, the score of the reported acceptor or donor splice site was lower in the NCSS variant allele. Furthermore, for variants ABCA4 c.4849-8C>G, ABCA4 c.4774-9G>A, RP2 c.884-14G>A and POC1B c.101-3T>G, a new acceptor splice site was generated that displayed a higher score, highly indicative of the generation of new splice sites that could interfere with normal splicing.
These results prompted us to validate the effect of each of these variants on the splicing pattern of the corresponding gene transcripts, either in vivo (RP2, which is expressed in blood) or in vitro (ABCA4 and POC1B minigenes transfected in cultured cell lines), following established methodologies [3,9,11,12].
Since ABCA4 is not expressed in tissues other than the retina, we generated constructs that carried the identified variants within their genomic context from patient and control DNA. The corresponding amplified genomic regions were cloned into the pSPL3 backbone, a vector designed for in vitro splicing assays. Cells were transfected with the wild-type and mutant sequences, and cycloheximide was added in order to prevent nonsense-mediated decay of any aberrant transcripts, which could otherwise go undetected [9]. Transcripts produced from each construct in treated and untreated cells were purified and sequenced to compare the effect of the variants. Variant c.4774-9G>A, which potentially introduced a new acceptor splice site (AS) in intron 33 (Figure 2A), clearly produced two aberrant transcripts, one with 7 additional nucleotides from intron 33 (+7 nt) due to shifting of the AS, and another with skipping of exon 34 (Figure 2B, mb1 and mb2, respectively). Both transcripts produce a frameshift and introduce premature STOP codons (Figure 2C). On the other hand, variant c.4849-8C>G, which lowers the score of the polypyrimidine tract of the AS in intron 34, produces transcripts with intron 34 retention, which would also result in premature protein truncation (Figure 3).
RP2 is an X-linked gene that encodes a ciliary protein that is widely expressed in many cells and tissues, among them blood. In vivo analysis of the RP2 transcripts in a patient's fresh blood sample was performed and compared to a control (Figure 4A). Variant c.884-14G>A lowers the score of the intron 3 AS of RP2 and generates a new in-frame AS. Our results support the pathogenicity of this variant since two aberrant transcripts are produced in vivo; one shows the addition of 12 nt between exons 3 and 4, and the other shows exon 4 skipping (Figure 4B,C). The latter transcript is out of frame and introduces a premature STOP codon.
The effect of the POC1B variant was assessed in vitro. Variant c.101-3T>G, which alters the AS of intron 2, produced only two aberrant transcripts: one that included two extra nucleotides at the 5′ end of exon 3, and the other showing skipping of exon 3 (Figure 5). These two aberrant transcripts are out of frame and would cause premature truncation of the protein.
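Whether an acceptor-site shift disrupts or merely elongates the open reading frame follows from the number of intronic nucleotides added, modulo 3. A minimal Python sketch of that arithmetic, using the insertion lengths reported above; the function name is hypothetical, and exon-skipping events are left out because exon lengths are not given in the text.

```python
def frame_effect(net_length_change):
    """Classify a net change in transcript length (in nucleotides).

    A change that is a multiple of 3 preserves the reading frame;
    any other change shifts it.
    """
    return "in-frame" if net_length_change % 3 == 0 else "frameshift"


# Intronic nucleotides added to the aberrant transcripts reported in the text.
added_nucleotides = {
    "ABCA4 c.4774-9G>A": 7,
    "RP2 c.884-14G>A": 12,
    "POC1B c.101-3T>G": 2,
}

if __name__ == "__main__":
    for variant, added in added_nucleotides.items():
        print(f"{variant}: +{added} nt, {frame_effect(added)}")
```

The 7 nt and 2 nt additions shift the frame, whereas the 12 nt addition in RP2 keeps it, consistent with the description of the new RP2 acceptor site as in-frame. Frame preservation alone does not exclude a STOP codon within the inserted sequence, which has to be checked on the sequence itself.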

Discussion
One of the current challenges in genetic diagnosis is the identification of mutations located in non-coding sequences, e.g. mutations in regulatory and intronic regions [13]. WES and target gene panels, which are the tools of choice for routine genetic diagnoses, mainly capture exonic and exon-intron boundary sequences, and also the filtering bioinformatics algorithms used for analysis mostly focus on variants that alter the coding-sequence or the consensus DS and AS splicing sites [14,15]. Without the addition of deep-intronic and NCSS variants, only around 50% of the cases in IRDs are diagnosed conclusively. The rest of the cases remain unsolved, with either the identification of a single pathogenic allele or even none. A step further in genetic diagnosis has been attained when deep-intronic variants that alter the splicing pattern have been identified, as is the case in Stargardt disease, a macular disorder with a very well defined clinical phenotype and a major causative gene, ABCA4. In fact, the introduction of the whole ABCA4 locus in the target NGS panels clearly helps to increase the genetic yield in Stargardt disease patients [16,17].
Potentially pathogenic nucleotide variants that are overlooked by the variant prioritization algorithms used in WES and targeted gene sequencing are located in non-canonical splice sites (NCSS). These variants do not directly affect the primary sequence of the consensus donor and acceptor sites, but they can instead disrupt or alter recognition of the splicing motif by the spliceosome and auxiliary factors, for instance, by shifting the percentage of pyrimidines in the polypyrimidine tract flanking the AS or by disrupting conserved exonic splicing enhancers (ESEs). Not all variants mapping close to consensus splice sites will have an effect on splicing. In silico predictions assign score values to splice-site motifs in the sequence carrying the identified NCSS variant and allow comparison to the wild-type sequence. Alterations in these score values are a first clue, but functional in vitro and in vivo assays should be performed to validate the impact of these novel variants on transcript splicing before they can be classified as pathogenic.
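The effect of a single substitution on the pyrimidine content of the tract upstream of an acceptor site can be illustrated with a short calculation. The Python sketch below uses a hypothetical 20-nt intron tail, not the real ABCA4 sequence, and hypothetical function names; it only shows how a C>G change at position -8 lowers the pyrimidine fraction and is not a substitute for the scoring algorithms cited above.

```python
def pyrimidine_fraction(seq):
    """Fraction of C and T bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "CT" for base in seq) / len(seq)


def substitute(intron_tail, offset, ref, alt):
    """Apply a substitution at a negative intronic offset.

    `intron_tail` is the 3' end of the intron, so its last base is
    position -1 relative to the downstream exon.
    """
    index = len(intron_tail) + offset
    if offset >= 0 or index < 0:
        raise ValueError("offset must be negative and inside the sequence")
    if intron_tail[index] != ref:
        raise ValueError(f"expected {ref} at {offset}, found {intron_tail[index]}")
    return intron_tail[:index] + alt + intron_tail[index + 1:]


# Hypothetical intron tail ending in the canonical AG acceptor dinucleotide.
# This is an illustrative toy sequence, NOT taken from ABCA4.
wild_type = "TTCTCTTTCCTCCTTTTCAG"
mutant = substitute(wild_type, -8, "C", "G")

if __name__ == "__main__":
    # Score the tract without the terminal AG dinucleotide (positions -20 to -3).
    print("wild-type:", round(pyrimidine_fraction(wild_type[:-2]), 3))
    print("mutant:   ", round(pyrimidine_fraction(mutant[:-2]), 3))
```

In this toy example a single purine replaces one of 18 pyrimidines; real predictors weigh position-specific information as well, so the size of the drop here carries no quantitative meaning.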
Confirmation of the splicing-altering effect of the identified variants can be performed directly in patients by transcript analysis in blood, saliva, hair or biopsies when the gene is expressed in these tissues [12,18], as happens with genes that encode ciliary proteins, such as RP2 (Figure 4). However, many IRD genes are expressed only in the retina (e.g., ABCA4) or show a tissue-specific splicing pattern (e.g., RPGR) [19]. The construction of midigenes or minigenes spanning the genomic context for in vitro expression in cell cultures makes it possible to assay the impact of the identified variant on transcript splicing and thus validate pathogenicity [3]. Tissue-specific spliceosome factors might be required to observe the pathogenic effect of a particular variant. In these cases, the differentiation of patient iPSCs into retinal organoids mimics a physiological retinal-like setting that has been used to test mutations altering retina-restricted splicing events [20]. Indeed, retinal organoids are very informative tools for variant validation, but this approach is costly and available to very few laboratories.
All the variants identified in this work alter the NCSS of the AS. According to our results, in one of the ABCA4 alleles, the variant perturbs the polypyrimidine tract and, as a consequence, the intron is retained in most splicing events. The other three identified alleles (in RP2, POC1B and one in ABCA4) generate a new consensus AS, sometimes with a higher score than the wild-type AS. In these cases, two different splicing effects are observed: exon skipping and the production of an aberrant transcript that adds several nucleotides to the exon, usually out of frame. These two alterations, elongation of the upstream sequence of an exon and exon skipping, have been reported as expected outcomes for NCSS variants [3,9,12,21]. It is difficult to assess whether any wild-type spliced transcripts are still produced in the patient's retina, although NCSS mutations should not be expected to be as severe as mutations disrupting the consensus splice motifs. In this context, most NCSS variants might be considered hypomorphic alleles.
Several authors have proposed that some "missing heritability" in IRDs is due to deep-intronic and NCSS mutations [2,22,23]. At least for IRDs, ABCA4 is the paradigm gene for intronic mutations that alter splicing. Most of the variants are novel, indicating that many mutations in this gene are private. Our results further support that some of these "hidden" IRD alleles are located in NCSS, expand the list of NCSS mutations in ABCA4 and also bring two new genes, RP2 and POC1B, to the fore. We surmise that as more data are gathered and analyzed, more deep-intronic and non-canonical splice site mutations will come to light [9], not only in IRD genes but also in other Mendelian diseases. Among the most recently reported successful gene therapies for visual disorders, antisense oligonucleotides (AONs) are particularly promising for modulating the effect of splicing mutations [20-25]. Characterizing these mutations therefore becomes crucial for patients who carry them to access these emerging therapies.

Conclusions
The identification of four novel NCSS variants in ABCA4, RP2, and POC1B highlights the relevance of hidden pathogenic variants that alter splicing for increasing the genetic diagnostic yield of IRDs. These findings contribute to defining genotype-phenotype correlations and help patients and clinicians make the best decisions regarding the emerging gene and cell therapies.

Funding: This activity was sponsored by DBGen Ocular Genomics and by grants SAF2016-80937-R (Ministerio de Economía y Competitividad/FEDER) and 2017SGR-0738 (Generalitat de Catalunya) to G.M. and R.G.D.