Inherited retinal degenerations (IRDs) represent the most frequent cause of vision loss in people of working age. As a result, these conditions have a substantial impact on quality of life, health-related costs and loss of income. IRDs are an extremely heterogeneous set of conditions associated with the loss of retinal function and, as a group, represent one of the most genetically diverse groups of hereditary conditions. Over 260 genes to date have been implicated in the syndromic and non-syndromic IRDs [1], with a wide range of clinical presentations and rates of progression. As this is a diverse set of conditions with frequently overlapping presentations, it is typically divided into large sub-categories, primarily by the specific regions or cell types affected, such as rod photoreceptors, cone photoreceptors or, for example, peripheral versus macular regions of the retina. Retinitis Pigmentosa (RP) is the most common form of IRD, is extremely genetically heterogeneous and affects as many as 1 in 3000 individuals [2]. The disease is typically characterized by progressive loss of rod photoreceptor cells, followed by the gradual death of cone photoreceptors, and generally involves characteristic features such as pigmentary deposits in the peripheral retina and attenuation of retinal vessels. In contrast, some other forms of IRD can be extremely rare and have a single gene aetiology; gyrate atrophy, for example, is estimated to affect roughly one in a million people [4].
IRDs are currently thought to affect approximately 2.5 million people globally. The vast majority of these individuals have received a diagnosis based on clinical phenotype alone, rather than a genetic diagnosis, if they have been formally diagnosed at all. Clinical trials are in progress for a number of IRDs; however, most such trials require patients to have a known causative mutation to participate. Here we present data from Target 5000, an ongoing next-generation sequencing (NGS)-based study, which aims to genetically characterise a large national cohort of IRD patients.
The most common method chosen for IRD genetic screening is targeted NGS. Although whole-exome sequencing offers the potential to locate disease-causing mutations in novel genes, in practice diagnosis rates in whole-exome and targeted-sequencing studies are similar [5], suggesting that the coding regions responsible for the majority of IRDs have been located. Although whole genome analysis has the potential to discover non-coding disease-causing mutations, the difficulty of data interpretation and the cost of the study both increase dramatically.
During the course of this study, over 750 individuals from over 520 pedigrees have been sequenced with a targeted NGS panel, focused on exons of 254 IRD-associated genes, in addition to a small number of introns previously reported to harbour splice-altering mutations. Here we present novel mutations primarily from over 200 patients involved in recent recruitment but also resulting from retrospective analysis based on previously recruited patient cohorts [6]. Candidate mutations were detected in over 68% of our analysed pedigrees. This figure includes previously reported pathogenic mutations and numerous novel likely pathogenic variants. Novel variants include large structural variants, point mutations with high predicted pathogenicity, frameshift mutations and splice site mutations. A single pathogenic or likely pathogenic variant was observed in an additional 8% of pedigrees in which the gene in question is known to cause a recessive retinopathy.
To date, as part of Target 5000, over 15% of the Irish IRD patient population has been sequenced, providing the first national-scale overview of the IRD landscape. The study not only offers a chance to discover new pathogenic variants in known IRD genes, but also represents a vital initial step in the genetic characterisation of patients, providing them with information regarding the underlying genetic pathogenesis of their disease. Previously, we have reported the identification of over 40 novel variants in a smaller cohort of IRD patients [14] and here we describe an additional 23 novel mutations and 3 novel structural variants, totalling nearly 70 novel IRD mutations discovered as part of this study. Several mutations that have been previously reported, such as RHO, p.Met207Arg [28], have presented in multiple pedigrees in this study. It is likely that some or all of these pedigrees are distantly related, and analysis is ongoing to verify this.
Significantly, the genetic pathogenesis of some previously ambiguous disease phenotypes has also been resolved, most notably the milder, late-onset phenotype of Stargardt disease that is associated with the p.Asn1868Ile ABCA4 mutation. In addition, NGS-based genetic diagnoses prompted a clinical re-evaluation of many patients in this cohort, predominantly from simplex RP to BBS caused by the p.Met390Arg mutation in the BBS1 gene. These patients often presented with only subtle additional phenotypes owing to, for example, early intervention for polydactyly.
Many challenges still remain for the application of NGS technologies in diagnostic medicine. Ambiguous disease phenotypes and the presence of disease genes that may be associated with multiple IRDs and different modes of inheritance can make achieving a robust diagnosis particularly difficult. The presence of stretches of repetitive sequence in some IRD genes can also make it difficult to confidently call variants in relevant portions of the genome, which we anticipate may mask some disease-causing variants from analysis. For example, an approximately 800 bp region in the centre of RPGR ORF15 shows a sharp drop in mapped reads due to the repetitive nature of the sequence [34]. Given that ORF15 has been implicated in cases of X-linked RP in the past [41], we anticipate that some undiagnosed patients with X-linked retinitis pigmentosa are likely to harbour mutations in this region of the gene. Efforts are underway to augment the protocol we employ so that future sequencing panels achieve better coverage of this region. Previous studies have shown that processes can be implemented prior to sample preparation for sequencing [42] or as a parallel investigation [43] to aid sequencing of this region.
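As an illustration of how a coverage drop such as the one described for RPGR ORF15 might be flagged automatically, the following is a minimal Python sketch. It is not the pipeline used in this study: the function name `low_coverage_runs`, the thresholds `min_fraction` and `min_length`, and the toy depth profile are all hypothetical, and the per-base depths are assumed to have been extracted beforehand from the aligned reads.

```python
from statistics import median


def low_coverage_runs(depths, min_fraction=0.2, min_length=50):
    """Return half-open (start, end) index ranges in which per-base depth
    stays below min_fraction * median depth for at least min_length bases.

    depths       : list of per-base read depths across one target region
    min_fraction : hypothetical cut-off, relative to the region's median depth
    min_length   : hypothetical minimum run length, in bases
    """
    threshold = min_fraction * median(depths)
    runs = []
    start = None  # start index of the low-coverage run currently open, if any
    for i, depth in enumerate(depths):
        if depth < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(depths) - start >= min_length:
        runs.append((start, len(depths)))
    return runs


# Toy profile: well-covered flanks around an 800-base poorly covered stretch.
toy_depths = [100] * 500 + [5] * 800 + [100] * 500
print(low_coverage_runs(toy_depths))  # [(500, 1300)]
```

Regions reported in this way could be listed alongside a negative result, so that a pedigree with suspected X-linked RP is not treated as fully screened when the central portion of ORF15 was effectively unsequenced.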
The RP-associated RP1 gene also presents some challenges in relation to genetic diagnostics. The pathogenicity and mode of inheritance of a novel mutation in RP1 are difficult to determine, as mutations in this gene have previously been associated with both dominantly and recessively inherited disease. This is an issue that has been discussed in a number of other RP1 studies. In a recent study, a meta-analysis of previously reported RP1 pathogenic mutations was undertaken to link the impact of each variant to the functional region of the protein. Difficulties remain in identifying new RP1 mutations as dominant- or recessive-acting, however, as regions of the gene predominantly associated with dominant RP were found to also harbour mutations associated with recessive rod-cone dystrophies [47].
Methods for NGS data analysis are evolving rapidly. Research into splice site mutations and their functional impacts is already providing significant insights into the effects of variants in IRD datasets that may previously have been overlooked [48]. Additionally, it is becoming increasingly commonplace for NGS studies of IRD populations to incorporate some form of detection or analysis of copy number variants (CNVs) [51]. It has been shown in these studies that close to 20% of previously unsolved IRD pedigrees can be resolved with the detection of pathogenic CNVs.
In the current study we describe the bioinformatics methodologies employed to retrospectively analyse datasets to detect CNVs and sequence breakpoints that are present within the captured exonic regions assessed by target capture NGS. Adopting these methodologies, we have identified three IRD pedigrees carrying three separate large structural variants: a heterozygous large deletion in the USH2A gene, a homozygous large deletion in the USH1C gene and a homozygous large inversion in the OAT gene. The structural variants observed using this approach were identified in genes associated with conditions as diverse as gyrate atrophy (OAT), retinitis pigmentosa (USH2A) and Usher syndrome (USH1C). These findings serve to emphasise the importance of implementing analysis systems that enable detection of large-scale deletions and inversions in all IRD patients, as, to date, we have observed that 100% of these rare SV events correlate with an IRD gene-associated pathogenic phenotype. Although split-read and read-depth analysis of short-read capture data, as performed in our study, is less sensitive to structural variants than similar methods applied to whole genome sequencing (WGS) data, it has the great advantage that the data it requires is already generated as part of standard sequencing pipelines.
It is highly likely that other structural variants are present in our cohort but remain undetected because their breakpoints lie outside the exonic regions targeted by the capture panel. This limitation was partly addressed by read-depth analysis, which was successfully applied in the case of the USH1C deletion, which had no exonic breakpoints; however, this method struggles to detect structural variants that do not span several exons. Despite these limitations, we have demonstrated the utility of this approach for IRD diagnostics by generating clinically actionable results even from past datasets, and we recommend that it be used as a ‘stopgap’ measure to improve diagnosis rates in similar projects before more comprehensive studies can be performed.
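The read-depth principle described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the bioinformatics pipeline used in this study: the function names `exon_copy_ratios` and `call_deletions`, the thresholds `het_max`, `hom_max` and `min_exons`, and all depth values are hypothetical. Each sample's mean exon depths are scaled by that sample's median to correct for library size, then compared with the cohort median for the same exon; a ratio near 0.5 is consistent with a heterozygous deletion and a ratio near 0 with a homozygous deletion.

```python
from statistics import median


def exon_copy_ratios(depths_by_sample):
    """Map each sample to per-exon depth ratios relative to the cohort.

    depths_by_sample: {sample_id: [mean depth per exon]}, with every sample
    listing the same exons in the same order. Sample medians and cohort
    reference values are assumed to be non-zero.
    """
    # Scale each sample by its own median depth to correct for library size.
    normalised = {}
    for sample, depths in depths_by_sample.items():
        sample_median = median(depths)
        normalised[sample] = [d / sample_median for d in depths]
    n_exons = len(next(iter(normalised.values())))
    # Cohort reference: median normalised depth of each exon across samples.
    reference = [median(v[i] for v in normalised.values()) for i in range(n_exons)]
    return {
        sample: [v[i] / reference[i] for i in range(n_exons)]
        for sample, v in normalised.items()
    }


def call_deletions(ratios, het_max=0.7, hom_max=0.1, min_exons=2):
    """Return (first_exon, last_exon, state) for each run of at least
    min_exons consecutive exons whose ratio suggests a deletion."""
    calls = []
    run_start, run_state = None, None
    # A trailing ratio of 1.0 acts as a sentinel that closes any open run.
    for i, ratio in enumerate(list(ratios) + [1.0]):
        if ratio <= hom_max:
            state = "hom_del"
        elif ratio <= het_max:
            state = "het_del"
        else:
            state = None
        if state != run_state:
            if run_state is not None and i - run_start >= min_exons:
                calls.append((run_start, i - 1, run_state))
            run_start, run_state = i, state
    return calls


# Hypothetical depths for six exons of one gene in five samples.
toy = {
    "A": [100, 120, 80, 100, 110, 90],
    "B": [200, 240, 160, 200, 220, 180],
    "C": [50, 60, 40, 50, 55, 45],
    "D": [100, 60, 40, 100, 110, 90],  # exons 1-2 at about half depth
    "E": [100, 120, 80, 0, 0, 90],     # exons 3-4 with no coverage
}
ratios = exon_copy_ratios(toy)
print(call_deletions(ratios["D"]))  # [(1, 2, 'het_del')]
print(call_deletions(ratios["E"]))  # [(3, 4, 'hom_del')]
```

The `min_exons` parameter reflects the limitation noted above: a depth drop confined to a single exon is hard to distinguish from capture noise, so only runs spanning several exons are reported. Balanced events such as inversions leave read depth unchanged and would require split-read evidence instead.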
Thus far, during the course of the study, genetic analysis of IRD patients has identified candidate mutations in approximately 68% of cases. The diagnostic rate obtained is in line with other NGS studies [4]. The growing body of data from NGS studies of IRDs similar to this one should facilitate the formation of better correlations between genotype and phenotype. As research in parallel studies, such as natural histories of IRDs [56] and functional analysis of modifier loci [49], continues, this information in conjunction with NGS data will undoubtedly contribute to improvements in detecting pathogenic genetic variants responsible for IRDs, provide insights regarding prognoses for some IRDs and, importantly, may also facilitate the future delivery of gene-specific treatments to the applicable patient populations.
Non-coding variants such as splice-affecting variants, either proximal or distal to canonical splice sites, are also likely to represent a significant fraction of the unobserved disease-causing variants [22]. Previous studies have identified deep intronic variants that lead to intronic sequence being incorrectly retained in the mature mRNA as relevant to IRDs [58]. These variants are highly likely to be missed by current studies, as very few capture panels target introns, and the interpretation of deep intronic variants is complicated as these regions are less constrained by purifying selection, leading to large numbers of observed variants. Despite the few direct observations, strong indirect evidence of unobserved disease-causing variants in known IRD genes exists. Whole-exome studies have very similar detection rates to studies focused merely on IRD-associated genes [25], implying that coding mutations in unsequenced genes do not represent a large fraction of unobserved disease-causing mutations. Furthermore, recessive pedigrees that could not be solved in this study with a panel of 254 genes were significantly enriched for single mutations in disease-relevant genes, strongly suggesting the presence of second, as yet unobserved intronic mutations.
Despite these sources of as yet ‘missed’ variation causative of IRDs, the results of this study so far highlight the vast levels of genetic heterogeneity inherent in IRDs in the Irish population and the significant value of a target capture NGS-based genetic evaluation for diagnostic purposes. This has been clearly exemplified by the clinical re-categorisation of the disease pathology for several patients (for example, RP as BBS), the value of detecting pathogenic large structural variants and the continued reanalysis of patient datasets for emerging, previously undetected common pathogenic variants (ABCA4, p.Asn1868Ile), all of which were driven by NGS-based genetic data analysis. Future and ongoing studies, with a particular focus on structural variants and non-coding disease-causing variants, are likely to increase mutation detection rates further and yield an even more complete picture of the genetic architecture of IRDs in Ireland.