Next-Generation Sequencing Applications for Inherited Retinal Diseases

Inherited retinal diseases (IRDs) represent a collection of phenotypically and genetically diverse conditions. IRD phenotypes can be isolated to the eye or can involve multiple tissues. These conditions are associated with diverse forms of inheritance, and variants within the same gene can often be associated with multiple distinct phenotypes. Such aspects of the IRDs highlight the difficulty met when establishing a genetic diagnosis in patients. Here we provide an overview of cutting-edge next-generation sequencing techniques and strategies currently in use to maximise the effectiveness of IRD gene screening. These techniques have helped researchers globally to find elusive causes of IRDs, including copy number variants, structural variants, new IRD genes and deep intronic variants, among others. Resolving a genetic diagnosis with thorough testing enables a more accurate diagnosis and more informed prognosis and should also provide information on inheritance patterns, which may be of particular interest to patients of child-bearing age. Given that IRDs are heritable conditions, genetic counselling may be offered to help inform family planning, carrier testing and prenatal screening. Additionally, a verified genetic diagnosis may enable access to appropriate clinical trials or approved medications that may be available for the condition.


Introduction
A primary focus in ocular genetics globally is accurate genotyping of patients with rare inherited retinal diseases (IRDs). Next-generation sequencing (NGS) has been a common strategy employed in many countries to achieve this goal for several years [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16]. This review focuses on the various methods and strategies that are being implemented to elucidate the genetic pathogenesis of IRDs and provides an overview of how these approaches have evolved. IRDs have an estimated prevalence of 1 in 4000 [17]. With a current global population of approximately 7.8 billion [18], it is estimated that approximately 2 million people currently have some form of IRD.
As a global community involved in ocular genetics, the common goal is to achieve a diagnostic success rate of 100% for all IRD patients enrolled in clinical studies. This objective, however, presents several challenges. Firstly, over 270 genes have been associated with the aetiologies of IRDs (RetNet, Retinal Information Network, https://sph.uth.edu/retnet/ accessed on 20 April 2021) [19]. Furthermore, extensive diversity in clinical presentation due to mutations even within a single IRD gene, as well as intersecting clinical phenotypes and phenocopies, is encountered. Mutations in disease genes may affect the retina in isolation, or may have more systemic effects. For example, there are 80 systemic conditions with a retinal phenotype and 200 genes that affect not only retinal health but also the central nervous system, kidneys or heart [20]. Such complexity makes it near-impossible for a diagnosis to be achieved in most instances solely on the basis of disease phenotype [2,21,22,23]. Furthermore, even a single pathogenic variant can manifest with phenotypic variability [24]. For some IRDs, modifier loci have been identified, somewhat blurring the borders between Mendelian and polygenic forms of IRD and mirroring similar observations with other disease aetiologies [25,26].
In this review, we aim to provide an overview of the NGS strategies employed globally to maximise the detection of IRD-causing mutations. This includes the use of targeted gene panels for all IRD phenotypes or phenotypic subsets; whole-exome sequencing (WES); whole-gene sequencing, whereby an IRD gene's 5′ and 3′ sequences, exons and introns are interrogated; whole-genome sequencing (WGS); and bespoke methods to complement other strategies, such as structural variant (SV) detection and copy number variant (CNV) detection, among many others. In parallel with the use of the above technologies for the identification of candidate IRD-causing sequence variants, a wide array of methods to explore the functional effects of sequence variants have also been developed, and these are discussed.
While NGS technologies have enabled rapid characterisation of the genetic architecture of IRDs in many disease cohorts with diagnostic rates often approximating 70% [27], much still remains to be optimised. Strategies currently under development to further improve diagnostic rates are reviewed herein, as are the approaches being employed to enable interpretation of novel coding and non-coding candidate variants. Additionally, given the availability of increased numbers of WGS sequences from IRD and control populations, a greater focus is placed on the elucidation of the genetic modifier loci that may influence the effect(s) of the primary disease-causing mutations. An overview of the findings to date is provided.

IRDs-Target Panels and Whole Exome Studies
To elucidate the genetic contributions to IRDs, DNA, typically isolated from a saliva or peripheral blood sample, is analysed. Optimal processing of the sample will depend on which form of NGS is to be employed. In order of increasing cost and data generated, the available NGS methods are targeted sequencing (TS), whole-exome sequencing (WES) and whole-genome sequencing (WGS). WES exclusively captures the protein-coding exons, which account for only approximately 1% of the genome. It is important to note that exon-based sequencing is likely to also reliably detect intronic variants located close to the targeted exons, such as non-canonical splice site variants, which are known causes of IRDs [28][29][30]. WGS is significantly more comprehensive, covering introns, promoters and intergenic regions and, in principle, sequencing every nucleotide possible in a sample. TS typically captures the smallest amount of genetic information but does so in a completely customisable manner. For example, some IRD phenotypes are associated with pathogenic variants in a very small number of genes, but some of those genes may also be known to harbour pathogenic deep-intronic variants. In this case, adopting a TS approach would be more fruitful than WES. Arguably, WGS could also be used for this purpose but would generate more off-target data, requiring significantly greater levels of analysis and storage, and would likely yield less on-target data than a TS approach.
The benefits of TS are that it is an economical method of focusing sequencing capacity on smaller genomic regions, including non-coding regions, thereby maximising the coverage of clinically relevant genes. Enhanced coverage translates to greater sequencing read depth, which is valuable, for example, to increase the resolution of detecting genetic variants and to detect lower levels of heteroplasmy in the mitochondrial genome, or mosaicism in the nuclear genome [31][32][33][34]. By reducing the size of the region of genome sequenced per sample, a greater number of samples can be multiplexed together and processed in the same sequencing run. There are other cost-saving elements to TS: smaller file sizes allow for cheaper storage and faster processing. Moreover, targeting specific regions of the genome previously implicated in IRDs can massively reduce the risk of detecting secondary or incidental findings.
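The trade-off between target size, read depth and multiplexing described above can be illustrated with simple arithmetic. The following is a minimal sketch only: the run yield, target sizes and on-target fraction are hypothetical values chosen for illustration and are not taken from any of the studies cited.

```python
def mean_depth(run_yield_bases, target_size_bases, n_samples, on_target_fraction=1.0):
    """Expected mean read depth per sample across the target region."""
    if n_samples <= 0 or target_size_bases <= 0:
        raise ValueError("n_samples and target_size_bases must be positive")
    usable_bases = run_yield_bases * on_target_fraction
    return usable_bases / (target_size_bases * n_samples)


def max_samples(run_yield_bases, target_size_bases, required_depth, on_target_fraction=1.0):
    """Largest number of samples that can share one run at the required mean depth."""
    if target_size_bases <= 0 or required_depth <= 0:
        raise ValueError("target_size_bases and required_depth must be positive")
    usable_bases = run_yield_bases * on_target_fraction
    return int(usable_bases // (target_size_bases * required_depth))


# Hypothetical values, for illustration only.
RUN_YIELD = 120_000_000_000   # bases produced by one sequencing run (assumed)
PANEL_SIZE = 1_000_000        # a 1 Mb targeted panel (assumed)
EXOME_SIZE = 30_000_000       # roughly 1% of a ~3 Gb genome (assumed)

for name, size in [("targeted panel", PANEL_SIZE), ("exome", EXOME_SIZE)]:
    n = max_samples(RUN_YIELD, size, required_depth=100, on_target_fraction=0.6)
    print(f"{name}: up to {n} samples per run at 100x mean depth")
```

Under these assumptions, shrinking the target by a given factor increases the number of samples that can share a run by approximately the same factor, which is the basis of the multiplexing advantage of TS.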
For these reasons, TS strategies have frequently been employed for IRD screening for many years. Shortcomings of a TS strategy are that it often involves multiple gene panels for different conditions and if new IRD genes are identified or new variant associations are made for genes outside of the panel, a panel redesign will be required to include them. It is possible to use TS to detect indicators of large structural variants; however, such genomic breakpoints would likely have to occur within the captured loci reducing the likelihood of identification [1].
The customisable element of TS has become increasingly valuable with the recent detection of several population-enriched rare pathogenic variants likely due to founder effects. For example, a novel PDE6B variant was observed in the Māori IRD participant group and is likely to account for 16% of all recessive IRDs in that population [35]. Similarly, EYS gene variants were found to be causative in 51% of an RP cohort from Japan [36]. This discovery is not unique, as several other parallel studies have revealed similar founder mutations in their target populations, for example, Belgium, RAX2 [37]; Costa Rica, RPE65 [38]; Finland, CERKL [39]; Japan, EYS [36]; Spain, RP1 [40] and ABCA4 [41]; Jewish community in Caucasia, PDE6B [42]; Pakistan, ABCA4 and NMNAT1 [43]; Guyana, BBS9 [44]; and Faroe Islands, MERTK [45]. The enrichment of these variants, several of which are large structural variants, emphasises the value of population-specific TS panels to target and detect mutations and mutational breakpoints that may be missed by commercial generic gene panel sequencing or even WES.
The use of WES has increased in popularity in recent years (Table 1) compared to previous metadata reported [27] and has many advantages over the TS approach. An effective TS panel design can be optimised with prior knowledge of the spectrum of mutations capable of causing the patient's condition. This includes, but is not limited to, knowledge of possible founder mutations in the population, all possible genotype-phenotype associations and breakpoint locations of any large structural variants that may exist. WES is agnostic to these issues. Although WES is not capable of detecting deep intronic mutations without modifying the method, it enables exonic variants to be detected even if their relevance is not entirely elucidated at the time of capture. This provides the potential for future interrogation of WES data as new IRD genes are discovered. Importantly, WES allows for the potential future resolution of a previously unsolved diagnosis.
Table 1. Screening studies of inherited retinal disease (IRD) populations. CRD = cone-rod dystrophy; LCA = Leber congenital amaurosis; IRD = inherited retinal dystrophy; MD = macular dystrophy; RP = retinitis pigmentosa; TC = target capture; WES = whole-exome sequencing; WGS = whole-genome sequencing.
Further to this point, many disease phenotype-based gene panels are very specific and therefore typically target only a small number of genes, not allowing for the possibility of new genotype-phenotype correlations or ambiguous phenotypes. Indeed, it was recently established that 23% of cases analysed would not have been resolved if they were sequenced by a commercial panel designed specifically for a patient's phenotype [81]. In the same study, it was also found that for 26% of participants, the cheapest applicable commercial gene panel would have been costlier than performing WES for those patients.
Several large IRD screening studies in recent years have sought to identify the genes responsible for the largest proportions of their cohorts' IRDs. In the UK, over 3000 pedigrees were reviewed, and it was determined that 135 IRD genes contributed to the genetic pathogenesis of the cohort. Interestingly, 70% of resolved cases were deemed to have causative mutations in 20 genes only [82]. Similarly, in over 5000 pedigrees with genetic eye conditions from Canada and the US, 68% of pathogenic or likely pathogenic mutations were identified in just 10 genes [78]. Both of these large studies identified ABCA4, USH2A and RPGR as the top three genes contributing to IRDs.
Although clearly not 100% effective, smaller whole-gene panels may be very effective as a first-tier screening approach. Several gene associations have been recently flagged as unlikely to be as pathogenic as initially reported. Nineteen percent of queried autosomal dominant retinitis pigmentosa (adRP) genes were deemed to harbour variants unlikely to be disease-causing for reasons relating to their respective allele frequencies or variant interpretation at that time [83]. Such variant "false positives" are shortcomings of the diagnostic odyssey, and this implies that, for an initial screening procedure, there may not be the need to screen as many of the genes and variants that are typically included in large gene panel screening studies.
It is important to also note that there have been many reports of the occurrence of multiple IRDs within the same family, or even within a single individual. Although individually IRDs are rare globally, the concurrence of multiple IRDs in a patient or pedigree unfortunately represents another diagnostic challenge. Our team has previously reported a pedigree in which five affected members of the family were broadly categorised as RP phenotypes. After genetic investigation it was revealed that four of these individuals were homozygous for a FLVCR1 variant, while the remaining affected patient was compound heterozygous for pathogenic variants in NR2E3 [84]. Similarly, in a US study involving three IRD pedigrees, each given an initial diagnosis of RP, one with a dominant RP and the other two with a dominant, incompletely penetrant RP, it was found that multiple IRD genes were responsible for various affected individuals in each of the three families: USH2A and RP1 were both segregating in one family; PRPH2 and CRX in a second family; and PRPH2, PRPF8 and USH2A in the third family [85]. These studies, however, are surpassed in complexity by Birtel et al.'s analysis of a single family with four different IRDs, each caused by distinct pathogenic variants and inheritance patterns: father, RHO, dominant RP; mother, ABCA4 and CACNA1F, recessive Stargardt and CSNB; first son, CACNA1F, CSNB; second son, MITF, dominant Waardenburg syndrome [86]. These are some examples of the many that exist, illustrating the complexity of IRD screening and reinforcing the necessity of thorough clinical and genetic investigation prior to genetic counselling [87].

Expanding IRD Diagnosis via Whole-Gene or WGS
For any laboratory electing to use NGS, it is essential that the limitations of the NGS approach to be employed are known. Failure to appropriately sequence the target genomic sites clearly will limit the success rate from the outset. Three consistent biases that affect WES but not WGS concern strand bias, evenness of coverage and the proportion of transcripts covered in their entirety. Interestingly, it has also been found that WGS provides coverage of exons that is 3 percentage points better than that of WES, 98% versus 95% [88]. In another study it was found that the WGS approach offered superior detection of structural variants, variants in regulatory regions and variants in GC-rich regions compared to WES [74]. However, this superior detection comes at a significant financial cost. A review of studies that used WES and WGS for clinical practice revealed that the per-sample price range was approximately five times higher for WGS than for WES [89].
Costs incurred by WGS include not only the upfront cost of sequencing, but also additional downstream expenses. WGS produces vastly more data, thus immediately requiring additional computational power, person-hours and storage to process. Although storage issues may be a limiting factor in the budgets of most research groups, policies regarding data storage can be readily adjusted to meet the needs of the research group, as the needs of two research groups will rarely be the same. This bespoke approach is advisable to avoid issues such as inadequate infrastructure and overspending. Raw sequencing files (such as .fastq files) and output files (variant call files, .vcf) are relatively small in comparison to the alignment files (such as .bam files) that need to be produced as part of the analyses [90][91][92]. Given this information, it is possible to reduce the capacity required for long-term storage of sequencing data by electing to discard the alignment files but keeping the input and output files so that the analysis can be repeated at a later date and outputs can be compared for discrepancies and newer discoveries. However, increased storage will be required again upon reanalysis, as new alignment files will be created as part of the process.
Another viable alternative for reducing the disk footprint of alignment files is the use of additionally compressed formats, such as .cram files. This format uses reference-based compression, storing only base calls that differ from the reference genome used. This compression can be either lossless or can incur a reduction in base quality scores corresponding to the level of compression. Even the lossless format offers a 40-50% reduction in space required by comparison with BAM [93]. In addition, the use of WGS as a second-tier approach, for cases that remain genetically unresolved following first-tier sequencing, will decrease data-storage demands. More research groups are now making the move to cloud-based storage for their NGS data, and minimising the amount of data stored has a direct impact on cost [94]. It is important to note that sample processing and analyses are also available via cloud-based solutions, and may be an attractive option for research groups lacking the necessary in-house infrastructure to process NGS data [95].
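The principle of reference-based compression mentioned above can be sketched in a few lines. This is a toy illustration of the idea only and is not the CRAM format itself: it handles substitutions alone, whereas real formats also encode indels, clipping, quality scores and read metadata. The function names and sequences are hypothetical.

```python
def encode_against_reference(reference, read, start):
    """Store a read as (start, length, mismatches) relative to the reference.

    Only substitutions are handled in this toy example.
    """
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(offset, base)
             for offset, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return start, len(read), diffs


def decode_against_reference(reference, record):
    """Rebuild the original read from the reference and the stored differences."""
    start, length, diffs = record
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGTACGTACGT"   # made-up reference sequence
read = "ACGTACCTACGT"                # differs from the reference at offset 6
record = encode_against_reference(reference, read, start=0)

# The round trip is lossless for the bases themselves.
assert decode_against_reference(reference, record) == read
print(record)
```

A read that matches the reference exactly is reduced to a position and a length, which is why the saving grows with how closely a sample resembles the reference.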
The additional cost of larger-scale analysis is not the only hazard associated with this data management. Both WES and WGS have an increased likelihood of carrying intrinsic responsibilities regarding the management of incidental or secondary findings (SFs) unrelated to the initial indication for sequencing. For example, Hart et al. (2018) found that an SF is detected in 1.7% of patients who undergo WES [96]. Some IRD studies have employed a nested targeted approach, wherein the entire genome of an individual is sequenced but only variants in genes relevant to the IRD phenotype are interrogated by use of variant filtering with a virtual gene panel. This still provides benefits over traditional targeted panels, as it also includes sequencing of non-coding regions, as well as the potential for analysis of an expanded panel in the future. For example, Carss et al. (2017) performed WES on 117 individuals, identifying pathogenic variants in 59 cases [74]. Forty-five of the unresolved cases then underwent WGS and positive candidate variants were identified in an additional 14 cases. This approach is likely chosen due to the immense volume of data produced by WGS and WES, and the need to more rapidly analyse the most relevant data available.
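The virtual gene panel filtering described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Variant` class, the `apply_virtual_panel` function and the example records are hypothetical, and a real pipeline would filter annotated variant call files rather than in-memory objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str    # gene symbol from the annotation step (assumed to be present)
    label: str   # placeholder identifier, not a real variant description


def apply_virtual_panel(variants, panel_genes):
    """Keep only variants in panel genes; report how many were left uninspected."""
    panel = {gene.upper() for gene in panel_genes}
    in_panel = [v for v in variants if v.gene.upper() in panel]
    masked_count = len(variants) - len(in_panel)
    return in_panel, masked_count


# Hypothetical panel and calls, for illustration only.
ird_panel = ["ABCA4", "USH2A", "RPGR"]
calls = [
    Variant("ABCA4", "variant_1"),
    Variant("BRCA1", "variant_2"),   # outside the IRD panel, so never reviewed
    Variant("USH2A", "variant_3"),
]

reviewed, masked = apply_virtual_panel(calls, ird_panel)
print([v.label for v in reviewed], masked)
```

Because the underlying sequence data are retained, the same calls can be re-filtered against an expanded panel at a later date without resequencing.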
This approach also limits the possibility of detecting SFs. In SF v1.0, the American College of Medical Genetics and Genomics (ACMG) recommended analysis of 56 medically actionable gene-phenotype pairs, which was then updated to a panel of 59 genes in v2.0 [97,98]. ACMG SF v3.0, recommending the analysis of SFs in 73 gene-phenotype pairs, was very recently released [99,100]. Of particular interest to the ocular genetics community, ACMG SF v3.0 now includes the RPE65 gene. The RPE65 gene was included on the basis that an FDA-approved gene therapy now exists for biallelic RPE65-retinopathies and that patients may derive additional benefit from earlier detection and therapeutic intervention. The ACMG recommends application of these SF guidelines in a clinical setting as opposed to a research setting. Nonetheless, as with all genomic testing, it is imperative that the patient's interests are at the forefront. The ACMG currently recommends that patients/guardians have the choice to opt out of SF testing. This highlights the necessity of an appropriate and comprehensive pre-testing consent procedure. This includes, but is not limited to, information on what will not be disclosed should the patient/guardian choose to abstain from SF analysis, together with thorough pre-test and post-test counselling.
Despite additional costs, there may be diagnostic benefits to employing WGS to resolve genotypes. Lionel et al. investigated 103 cases of diverse genetic disorders, comparing WGS to targeted panel sequencing. Not only was the solve rate superior when WGS was used, 41% versus 24%, but 18 diagnoses were made based on structural variants or intronic variants that were not captured by the TS method [101]. Regardless of the substantial number of genes identified and targeted by TS, the genetic cause remains unknown in an estimated 43% of all IRD patients across studies to date, which suggests the need for more studies to employ WGS (Table 1). These missing genetic aberrations may reside in introns or intergenic regions, both of which are captured by WGS. There is also the possibility of novel IRD gene discovery that is facilitated by WGS. The superior uniformity of genome coverage enabled by WGS also allows for greater sensitivity when detecting copy number variants (CNVs), which are notoriously difficult to detect by TS and WES.
A cost-effective alternative that retains many of the same benefits as WGS is whole-gene sequencing (GS). GS enables the capture of exonic, intronic and 5′ and 3′ regulatory regions for a target gene of interest but has many of the same limitations as the TS approach, including strand bias and GC-rich impedance to capture. GS has been utilised very successfully for cohorts with phenotypes associated with monogenic or near-monogenic causes. For example, individuals affected with incomplete congenital stationary night blindness (icCSNB) present with a recognisable phenotype. This form of icCSNB is primarily associated with mutations in the CACNA1F gene. In a recent large genotyping study of icCSNB-CACNA1F patients (n = 189), 4% of CACNA1F causative variants were attributed to intronic and synonymous mutations [102]. It is also probable that there are additional intronic variants yet to be designated as pathogenic in the unresolved portion of this cohort.
Similarly, Khan et al. investigated 1054 unresolved Stargardt cases with a GS approach. Stargardt disease is predominantly caused by biallelic variants in the ABCA4 gene. The authors of the study used a single-molecule molecular inversion probes (smMIPs) approach, which proved reliable and cost-effective. Their study revealed the presence of pathogenic SVs and deep-intronic variants in 25% of biallelic cases [103]. The smMIPs method is gaining in popularity given its superior target capture and low cost compared to other TS capture methods. In a recent comparative study, 176 IRD patients were analysed with both smMIPs and TS. The smMIPs approach demonstrated enhanced target coverage (97.3% versus 93.9%) and was five times more cost-effective when more than 500 samples were analysed [104].
The GS approach has also been combined with probes for other IRD genes to investigate if this combinatorial strategy could significantly improve diagnostic rates for a range of IRDs when compared to traditional exon-based TS IRD panels. The study design encompassed a second-tier approach for patients who had one previous variant found in USH2A, ABCA4 and CEP290. These whole genes, plus exons of 76 additional IRD genes and pathogenic intronic regions of two IRD genes were sequenced in an effort to resolve the "one-hit" patients. An overall diagnostic rate of 58.6% was achieved; two copy number variants were detected in USH2A [105]. Although this diagnostic rate was no higher than the average study (Table 1), it does represent significant improvements that can be made to address the large proportion of unresolved patients identified by standard screening studies. The structural variants established in this study would likely not have been detected by use of a more traditional, purely exon-targeting design.
An improved GS study design as outlined above may have additional advantages. The RPGR gene is one example of an IRD gene that includes regions that are challenging to sequence comprehensively with traditional TS or WES; sequencing through ORF15 is impeded due to a low-complexity sequence composition [106]. However, it is vital to capture this gene as, for example, it accounts for nearly 40% of X-linked retinopathies in the UK. This makes pathogenic variants in this gene the third most-prevalent cause of IRDs in this population [82]. In an Italian cohort of 48 RPGR-related RP cases, approximately half had mutations in ORF15 and presented with a more severe phenotype than those with causative variants in exons 1-14 of RPGR [107]. It has been suggested that the sequence coverage of ORF15 could be optimised by modifying NGS library preparation, reducing false-negatives, miscalled variants and false-positives when compared to traditional methods [108]. For this reason, many recent NGS screening studies have adopted bespoke approaches to sequencing RPGR, including entirely separate analyses or spiking the NGS libraries with separately generated amplicons for RPGR [50,53,56]. Another study reported improved alignment of sequencing reads mapping to the ORF15 region by using a de novo assembly approach. The accuracy of sequencing can be quickly determined for males when analysing the RPGR gene, as variants called in error will likely not be represented in every sequencing read mapping to this region. Therefore, heterozygous variant calls can be readily identified as errors, since males have only one copy of RPGR. This de novo assembly approach reduced the number of false-heterozygous calls in males and improved the accuracy of indel calls [109].
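The reasoning that apparent heterozygous calls in males indicate sequencing or alignment errors at a hemizygous X-linked locus can be expressed as a simple check on allele fractions. The sketch below is illustrative only: the function names, the input layout and the 0.9 threshold are assumptions and are not drawn from the cited studies.

```python
def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


def suspicious_calls_in_male(calls, min_alt_fraction=0.9):
    """Flag variant calls at a hemizygous locus that look heterozygous.

    A male carries one copy of the locus, so a true variant should be
    supported by nearly every read. The threshold is an assumed value.
    """
    flagged = []
    for call in calls:
        fraction = alt_allele_fraction(call["ref_reads"], call["alt_reads"])
        if fraction < min_alt_fraction:
            flagged.append((call["pos"], round(fraction, 2)))
    return flagged


# Hypothetical read counts for two called sites in a male sample.
male_calls = [
    {"pos": "site_A", "ref_reads": 0, "alt_reads": 40},    # consistent with hemizygosity
    {"pos": "site_B", "ref_reads": 22, "alt_reads": 18},   # looks heterozygous: likely artefact
]
print(suspicious_calls_in_male(male_calls))
```

The proportion of calls flagged in this way gives a quick, sample-internal estimate of how reliably a difficult region such as ORF15 has been sequenced and aligned.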
Another example of genes that benefit from tailored GS design are those encoding the opsins, OPN1LW (red cone cells) and OPN1MW (green cone cells). These genes encode photopigments in the retina and pose an interesting challenge to sequencing efforts. These two genes are 96% homologous which introduces unique challenges for the IRD gene panel, as short-read sequencing may be unable to determine the best alignment option when mapping back to the genome [110]. A new two-step method from Atilano et al. has demonstrated that long-range PCR can generate specific long amplicons that can be more readily mapped back to the genome [111]. This approach offers a solution that can be analysed separately, by direct sequencing of the amplicons, or alternatively, as part of a larger panel if long-read sequencing (LRseq) is used.
Sequencing the entirety of a gene also facilitates the detection of variants in the upstream regulatory regions, which have been implicated in retinal disease previously, for example in Blue-Cone Monochromacy (BCM) [112]. In this condition, a c.−71A>C promoter mutation was initially thought to decrease expression and cause a deutan colour vision deficiency. However, after functional analyses, the mutation was revealed to result in more than double the wild-type expression level of the gene [113]. Deletions in this area have also been shown to result in BCM phenotypes, suggesting that this gene is sensitive to both under- and overexpression [114,115]. Similarly, Radziwon et al. used luciferase reporter assays to assess upstream variants detected in the CHM gene in patients with choroideremia. Both probands had variants at position c.−98 (C>A and C>T). Both mutations led to a reduction in luciferase activity and, furthermore, the promoter region for CHM was defined as the region encompassing nucleotides c.−119 to c.−76 [116].
Regulatory mutations are often difficult to interpret, particularly for genes associated with recessive forms of inheritance. Previously, consanguineous pedigrees have been useful for identifying homozygous variants in these cases, such as NMNAT1-related Leber congenital amaurosis (LCA) [117]. Variant interpretation can be further complicated as such variants may not have strong effects on gene expression. In a recent study of promoter variants in ELOVL4, two variants were found, c.−236C>T (rs240307) and c.−90G>C (rs62407622), which resulted in 18% and 14% reductions in expression, respectively. However, as the patient in question had the variants in trans, a severe phenotype was observed, much more severe than would have been expected from the modest effect of the two variants analysed separately [118]. This detrimental synergistic effect may emphasise the threshold sensitivity of retinal tissues and cell components to the dosage levels of this protein and its downstream effects.

Copy Number and Structural Variants
As discussed above, TS and WES methods are the most universally utilised, yet they are largely incapable of detecting large copy number variants (CNVs), structural variants (SVs) and chromosomal rearrangements. In 2018, an extensive literature-mining endeavour revealed that 1345 CNVs, comprising 317 unique variants, had been reported in 81 distinct IRD genes. On further analysis, the size of a gene correlated with the number of CNVs reported for that gene. Additionally, many of these large variants affected non-coding and potential cis-regulatory elements [119]. The relevance of such variant types is now recognised, and guidelines have been published to assist in their interpretation and classification, similar to those published in recent years for single-nucleotide variants [120,121].
CNVs and SVs can also vary significantly in the complexity of their rearrangements. Gross deletions have previously been detected in many genes, including BEST1, EYS, MERTK, USH2A and many more from the aforementioned study alone [119]. Large deletions have also been reported in RPGR [122], CHM [58], OPN1LW/OPN1MW [123] and USH1C [1], to name but a few. Deletions are likely to be the most detectable CNV type given that most studies employ WES or TS to detect mutations. Homozygous deletions are the most readily detectable using these methods, as the read coverage over the deleted region would be zero, given that no template exists for capture or amplification. Heterozygous deletions may be under-reported when using WES or TS if significant amplification has occurred, which may unintentionally normalise the ultimate read depth aligned to the deleted region. For similar reasons, duplications can be very difficult to detect with these methods. However, such mutations can be more readily detected by WGS due to the superior and more even coverage, or by more specific approaches, such as targeted locus amplification [119]. Some regions of the genome, such as the RP17 locus, have been shown to harbour many complex CNVs and SVs associated with IRDs. These convoluted rearrangements interfered with the surrounding genome architecture, disrupting enhancer-promoter interactions and resulting in aberrant gene expression [124].
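The read-depth logic described above, in which a homozygous deletion gives zero coverage and a heterozygous deletion roughly half the expected coverage, can be sketched as a comparison against control samples. This is a minimal illustration only: it assumes depths have already been normalised for library size, and the thresholds and example values are hypothetical rather than taken from any published CNV caller.

```python
from statistics import median


def copy_ratio(sample_depth, control_depths):
    """Depth of the test sample relative to the median of control samples."""
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control samples have no coverage over this region")
    return sample_depth / baseline


def classify_ratio(ratio, hom_del=0.1, het_del=0.7, dup=1.3):
    """Assign a provisional copy-number state; all thresholds are assumed values."""
    if ratio <= hom_del:
        return "homozygous deletion"
    if ratio <= het_del:
        return "heterozygous deletion"
    if ratio >= dup:
        return "duplication"
    return "normal"


# Hypothetical normalised depths per exon for one patient and four controls.
controls = [100, 96, 104, 98]
patient = {"exon_1": 101, "exon_2": 49, "exon_3": 0, "exon_4": 152}

for exon, depth in patient.items():
    ratio = copy_ratio(depth, controls)
    print(exon, round(ratio, 2), classify_ratio(ratio))
```

The sketch also shows why uneven capture or amplification is problematic for TS and WES: any technical shift in depth of the same magnitude as the expected 0.5 or 1.5 ratio will either mask or mimic a heterozygous event.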
Genomic rearrangements are more likely to be detected by the presence of broken sequencing reads when aligned back to the reference genome. This occurs when a read, or pair of reads, partially aligns to one part of the genome and partially to somewhere quite different. This is applicable to translocations and inversions as, unlike CNVs, the read depth is not expected to be altered in these scenarios. Given the significant presence of retrotransposon sequence in the human genome, it is not surprising that several retrotransposon insertions have been reported to disrupt the functionality of IRD genes [125][126][127]. The BBS1 gene, in particular, has recently been reported to harbour retrotransposons causative of disease [128,129]. Retrotransposons, much like other large genomic insertions, can be difficult to detect as they are unlikely to disrupt read depth in the genomic region to which they have relocated, since alignment tools will align these reads to their original positions in the genome. Broken reads may indicate that a rearrangement has occurred. If breakpoints are detected, the region can be directly sequenced to shed light on the nature of the SV. Alternatively, a de novo assembly approach may be used to reconstruct the queried genome [130]. This approach will likely only be beneficial in the case of WGS, since TS or WES will likely not have sequenced the insert because the original genomic region was not an intended target.
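Broken reads of the kind described above typically appear in alignment files as reads whose ends are clipped, that is, not aligned at the reported position. The following sketch parses the CIGAR string of an alignment record, as defined in the SAM format, and reports candidate breakpoint positions. It is a simplified illustration: the function names and the 20-base threshold are assumptions, and dedicated SV callers combine this signal with discordant read pairs and supplementary alignments.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")   # operations that advance along the reference
CLIPS = set("SH")                    # soft and hard clipping


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    if not ops or "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR string: {cigar!r}")
    return ops


def clipped_ends(ops):
    """Number of clipped bases at the left and right ends of the read."""
    def leading_clip(sequence):
        total = 0
        for length, op in sequence:
            if op not in CLIPS:
                break
            total += length
        return total
    return leading_clip(ops), leading_clip(reversed(ops))


def candidate_breakpoints(pos, cigar, min_clip=20):
    """Return (side, reference position) pairs where a read stops aligning.

    pos is the 1-based leftmost aligned position; min_clip is an assumed cut-off.
    """
    ops = parse_cigar(cigar)
    left, right = clipped_ends(ops)
    ref_span = sum(n for n, op in ops if op in REFERENCE_CONSUMING)
    candidates = []
    if left >= min_clip:
        candidates.append(("left", pos))
    if right >= min_clip and ref_span > 0:
        candidates.append(("right", pos + ref_span - 1))
    return candidates


# Hypothetical alignment records: (position, CIGAR).
for pos, cigar in [(1000, "60M40S"), (2000, "35S65M"), (3000, "100M")]:
    print(pos, cigar, candidate_breakpoints(pos, cigar))
```

Several independent reads clipped at the same reference position provide stronger evidence of a breakpoint than any single read, and such positions are natural targets for the direct sequencing mentioned above.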
In one study involving an investigation of PRPF31-related disease, 45% of probands (10 of 22) tested positive for a CNV. The PRPF31 gene has no obvious sequence elements that may make it particularly susceptible to genomic rearrangement, such as long interspersed nuclear elements (LINEs) and long terminal repeat (LTR) elements [131]. The study emphasises the importance of integrating CNV and SV detection into screening protocols, even for genes that may not appear to be conventionally susceptible to genomic rearrangements. The estimated prevalence of causative SVs in IRD cases is roughly 10% [51,[131][132][133]. This is similar to findings from other rare disease cohorts: 12% of developmental disorders are estimated to be caused by pathogenic CNVs, and CNV and SV detection is therefore recommended as part of first-tier testing for that set of conditions [134]. In a large hearing loss screening study of over 1000 patients, 18% of resolved patients were found to have causative CNVs [135]. CNV detection has also proven very useful in diagnosing atypical syndromic IRD cases, resulting in novel genotype-phenotype associations and the refinement of complex phenotypes in multiple cases [136].
Many of the NGS methods discussed so far have revolved around short-read sequencing; however, long-read sequencing is arguably the superior approach for detection of SVs and CNVs. Short-read sequencing is generally preferred to ensure that high-quality data are produced [137]. However, this technology is greatly hindered by features of the genome, such as repetitive elements, which are not only abundant in our genomes, but also known to increase the likelihood of an SV event occurring in IRD genes [123,138]. Long-read sequencing offers superior sequencing of such regions and offers a chance to more accurately recapitulate patients' true genomic sequences through the use of de novo assembly [139][140][141]. Results from several studies to date have revealed IRD-causing SVs by the use of long-read sequencing, and in some cases, concluded that the complexity of the SV was such that it was likely not possible to fully resolve it by short-read sequencing [44,142].
Another useful application of long-read sequencing is determining the phase of potentially causative recessive variants. Determining the phase of variants is of critical importance, as it determines whether the causative variants have been established (if in trans) or not (if in cis). This task is challenging for IRD cases primarily for two reasons. Firstly, variants causative of Mendelian IRDs are extremely rare in most cases. This prevents the establishment of known haplotypes or complex alleles that may otherwise indicate that the two detected variants are likely in cis. Secondly, many IRD screening endeavours are still in their infancy. This results in the widest possible age range of patients, since even patients with paediatric onset of their condition may be elderly when screened. This can often make segregation analysis difficult, as many of their close family members may be immobile or deceased. Long-read sequencing offers the interpreter a greater chance of capturing both variants of interest within the same sequencing molecule and therefore of determining the phase of the variants without the need for additional family members [140].
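The logic of read-based phasing can be sketched in a few lines of Python. The example assumes that both variants are heterozygous and that each long read has already been reduced to the allele it carries at each of the two sites ("ref", "alt", or None when the read does not cover the site). The function name, the input layout and the minimum support of three reads are illustrative assumptions rather than an established protocol.

```python
def phase_two_variants(reads, min_support=3):
    """Infer whether two heterozygous variants are in cis or in trans.

    `reads` is an iterable of (allele_at_site_1, allele_at_site_2) pairs,
    each "ref", "alt", or None when the read does not cover that site.
    `min_support` is an illustrative assumption.
    """
    cis = trans = 0
    for a1, a2 in reads:
        if a1 is None or a2 is None:
            continue  # read does not span both sites, so it is uninformative
        if a1 == a2:
            cis += 1    # alt+alt or ref+ref on one molecule supports cis
        else:
            trans += 1  # alt at one site and ref at the other supports trans
    if max(cis, trans) < min_support or cis == trans:
        return "undetermined"
    return "cis" if cis > trans else "trans"

# Hypothetical reads: seven support trans, one supports cis, one is partial.
reads = [("alt", "ref")] * 4 + [("ref", "alt")] * 3 + [("alt", None), ("alt", "alt")]
print(phase_two_variants(reads))
```

Only reads that span both sites contribute, which is why read length relative to the distance between the two variants determines whether phase can be resolved without family members.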

Modifiers of IRD Phenotypes
There are multiple examples of phenotypes manifesting significantly differently in IRD patients, even when harbouring the same mutation [24]. Such variability may be in overall severity, age of onset or pattern of degeneration. While some instances may be partially explained by variant haplotypes or known interactions with other genes, much of the source of variation remains unknown [143]. It has also yet to be established how much of this unknown contribution is genetic.
Pathogenic variants in PRPF31 are the second most common cause of autosomal dominant retinitis pigmentosa [144]. PRPF31 is a ubiquitously expressed splicing factor and has a vital role in the processing of pre-mRNA. However, reduced expression of this ubiquitous splicing factor results in disease restricted to the retina, namely retinitis pigmentosa. It has been shown in multiple studies that reduced levels of PRPF31 in the retina result in the mis-splicing and reduced expression of several key IRD genes associated with phototransduction and RNA processing [145,146]. This indirect mechanism of action may partially explain the incomplete penetrance and variable severity frequently reported with this IRD gene [144]. It has also been shown that minisatellite repeat elements (MSR1) proximal to the promoter of PRPF31 can modify the expression of the gene. Alleles with four copies of the MSR1 were shown to correlate with asymptomatic individuals, while alleles with three copies of MSR1 greatly reduced the expression of PRPF31 [147].
Another key factor in the manifestation of an IRD relates to the naturally occurring expression levels of the implicated genes. In a study by Green et al., IRD genes typically associated with incomplete penetrance significantly correlated with greater variability in levels of expression in a healthy population in several tissue types, including the eye. This implies that cis and/or trans elements may be influencing variability in expression and, in turn, contributing to disease states in patients [148]. This may also result in higher-than-expected allele frequencies for pathogenic variants located in these genes in "healthy" control databases, such as gnomAD [149].
Several IRD genes have been found to harbour pathogenic splice mutations. Splice mutations are associated with variable severity, as the percentage of wild-type transcript that can still be produced is dependent on the specific mutation in question [150]. It is therefore plausible that some variants may only have marginal disruptive effects on normal splicing and result in a sub-clinical phenotype. It is important that each of these variants is evaluated to determine aberrant effect(s). They can be assessed by a variety of means. The most readily available method to assess variants involves the use of plasmid transfection to recapitulate the patient's mutation in mammalian cells. Minigene and midigene constructs incorporating exons and introns of the IRD gene and including the candidate splice mutation have been hugely successful in providing empirical evidence of the functional effect of candidate splice variants [150][151][152][153]. Patient-derived cell and/or 3D retinal organoid models, while more laborious to create, remove the need for transient expression of mutations, as the mutation is already present in the patient's cells. Organoids provide a more retina-like simulated environment in which to evaluate splice processing more accurately and thus have been utilised for the interpretation of multiple IRD variants [154][155][156].
The RPGR gene is another example of an IRD gene associated with variable disease severity and age of onset. In the case of RPGR, this is particularly true of female carriers of pathogenic variants. Random X-inactivation plays a decisive role in the severity of the IRD manifestation. In a recent analysis, blood and saliva samples were taken from 77 female carriers from 41 RPGR-IRD pedigrees. These samples were analysed for their methylation patterns. It was found that X-inactivation ratios correlated with clinical severity and could be useful indicators for prognostic purposes [157]. Another common feature of RPGR pathogenesis is that missense variants often lack the ability to interact with other key IRD gene products, which results in the failure of RPGR to correctly localise to the cilia as required [158]. Interestingly, this means that polymorphic variation in the other interacting IRD genes, such as IQCB1 and RPGRIP1L, can further modify the dynamics of this disrupted interaction [159].
Given that many IRDs involve progression over time, age is another modifier of disease. In several of the studies mentioned previously, it was noted that diagnostic rates are higher in younger patients. In Germany, the mean age of participants with a positive molecular diagnosis was 39.8 years, whereas it was 46.3 years for unresolved patients [55]. In the UK, patients enrolled with late-onset macular disease (≥50 years of age) were less likely to receive a genetic diagnosis (18.0%) compared with patients with an onset of less than 50 years of age (24.2%) [73]. In the US, Stone et al. also noted that younger entrants to their screening study were more likely to have their disease-causing mutation identified [79]. These findings suggest that the genetic causes of early-onset IRDs are better established and possibly that ageing increases susceptibility to polygenic, or perhaps non-genetic, factors, making the older patient cohort more difficult to solve by current genetic screening approaches.

Impacts on IRD Diagnosis Outside of NGS Testing
The surge of NGS studies addressing diagnostic challenges in IRDs in the last decade has also encouraged further developments in other areas related to IRD diagnosis. Massive improvements have been made in ophthalmological imaging, not just in resolution, but also in the applications and options available to imaging technicians [160]. These techniques help refine phenotypes and can inform whether or not more advanced forms of imaging may be useful for particular patients [161]. Machine learning (ML) is a branch of artificial intelligence that focuses on the concept that systems can learn from training data, such as expert-reviewed fundus images of IRD patients, and thereby learn to identify these patterns independently. This involves expert curation to obtain the training dataset but has the advantage of removing inter-assessor variability, since a single trained model performs every assessment once training is complete. For example, many studies employ ML on fundus and optical coherence tomography images to predict outcomes for potential age-related macular degeneration (AMD) patients, including risk of progression to a more severe disease state and response to treatments [162][163][164][165][166][167][168][169]. This approach has enormous potential benefits for the improvement of diagnostic and prognostic accuracy. For example, ML has been utilised for the detection of fleck lesions in fundus autofluorescence imaging, a characteristic feature of Stargardt disease. The ML approach could accurately identify and quantify fleck lesions, a potential outcome measure for future clinical trials [170]. Similarly, optic disc photography has been used to systematically detect abnormalities of the optic disc [171]. ML strategies may revolutionise the speed and accuracy at which new patients can be phenotyped in the clinic. More accurate phenotyping may also have beneficial implications for NGS screening studies, particularly those utilising phenotype-based gene panels.
Improving the effectiveness of TS is likely to have the most widespread benefits given its relatively low cost (less than $20 per sample), which makes TS the most readily accessible approach for diagnostic labs, especially those with constrained budgets [79]. While significant advances in the clinical and genetic diagnosis of IRDs have been made, implementation of IRD screening is under-resourced in many countries. A recent study into the cost-of-illness of IRDs in Ireland and the UK demonstrated that lack of access to genetic testing, the absence of an international patient registry and issues regarding the reimbursement of therapies are common complexities faced by IRD patients in these countries [172]. In the UK, surveyed IRD patients noted that they appreciated pre-test counselling, including a discussion of the possibility of an unexpected result [173]. Some countries are actively tackling the issue of integrating genetic services into future planning for their healthcare systems. One such example is the French Plan for Genomic Medicine 2025 [176]. This plan aims to improve healthcare by organising and structuring new pathways to care and counselling while also reviewing the current challenges impeding the implementation of genomic testing in France. Portugal and Iran are two of the latest countries to announce the launch of their nationwide registries for IRDs [174,175]. These registries aim to increase accessibility for individuals, while also providing a comprehensive dataset for investigators and clinicians to boost and develop their research. Despite the many great advances detailed in this review, the successful integration of IRD genomics services into national healthcare programs is rare [177].
Given that the technology and expertise are available and accessible to many research teams around the world (Table 1), combining the findings from these studies with effective national healthcare screening systems may enable better treatment and care through clinical genetic interventions for the patient [177].

Conclusions
We are now in the age of genomic medicine, where genetic evidence can often provide precise and robust diagnoses. This evidence may inform prognoses and may aid in directing care plans and therapeutic intervention. This growth has been supported by rapid enhancements in DNA sequencing methodologies, analysis tools and skillsets. The enormous impact of such advances is clearly exemplified in the field of IRDs. The true genetic complexity of IRDs is now more fully appreciated, with mutations in more than 300 genes implicated in IRDs, among many other non-coding and modifying elements. It is clear that many challenges exist for genetic screening of IRDs, as is evident from the approximately 40% of screened patients whose causative genetic elements are yet to be established. Many of these cases may be resolved in the future as more comprehensive techniques, such as WGS, are more routinely utilised. As such data become available, they will undoubtedly also increase our knowledge of genetic elements that have a modifying effect on IRD manifestations and severities. Genetic-screening diagnostic rates are also likely to benefit from advances in technologies related to clinical phenotyping that, where appropriate, may be augmented by machine-learning-based algorithms. Another advance that should clearly enhance IRD screening endeavours is the establishment of national databases and genomics strategies to develop services and strengthen clinical genetics collaborations nationally and internationally.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.