The Impact of Modern Technologies on Molecular Diagnostic Success Rates, with a Focus on Inherited Retinal Dystrophy and Hearing Loss

The identification of pathogenic variants in monogenic diseases has been of interest to researchers and clinicians for several decades. However, for inherited diseases with extremely high genetic heterogeneity, such as hearing loss and retinal dystrophies, establishing a molecular diagnosis requires an enormous effort. In this review, we use these two genetic conditions as examples to describe the initial molecular genetic identification approaches, as performed since the early 90s, and subsequent improvements and refinements introduced over the years. Next, the history of DNA sequencing from conventional Sanger sequencing to high-throughput massive parallel sequencing, a.k.a. next-generation sequencing, is outlined, including their advantages and limitations and their impact on identifying the remaining genetic defects. Moreover, the development of recent technologies, also coined “third-generation” sequencing, is reviewed, which holds the promise to overcome these limitations. Furthermore, we outline the importance and complexity of variant interpretation in clinical diagnostic settings concerning the massive number of different variants identified by these methods. Finally, we briefly mention the development of novel approaches such as optical mapping and multiomics, which can help to further identify genetic defects in the near future.


Introduction
In previous decades, different methods for disease gene identification have been established and successfully employed. All these technologies have significantly contributed to identifying the large number of genes that are associated with inherited forms of hearing loss (HL) (>150 genes) [1] and retinal dystrophy (RD) (>270 genes) [2]. HL is the most common sensory disorder, and it affects 466 million people worldwide [3]. The impact of HL is generally severe as it has profound consequences, especially on mental health, including anxiety, depression, and social isolation [4,5]. HL can be explained by either congenital or acquired causes and displays high clinical heterogeneity in the age of onset, progression, and severity, amongst others [6]. RD represents a group of clinically heterogeneous disorders that involves the death or dysfunction of photoreceptor cells in the retina. Collectively, RDs affect 2 million people worldwide [7]. Generally, three different types of RDs can be distinguished based on the primarily affected cell type: (1) the rod photoreceptors (e.g., retinitis pigmentosa or choroideremia), (2) the cone photoreceptors (e.g., macular and/or cone dystrophies), and (3) more generalized types of RDs that involve both photoreceptor types (Leber congenital amaurosis, cone-rod dystrophies).
Especially for heterogeneous conditions such as RD and HL, the introduction of next-generation sequencing (NGS) techniques has led to the assumption that, soon, all HL- and RD-associated genes will be identified. Nevertheless, the diagnostic yield suggests a significant portion of missing heritability, which can potentially be explained by unrecognized disease genes or missed variants [8,9]. To provide a genetic diagnosis for all inherited cases, it has become evident that there is no single technique that can serve as the gold standard. To be able to detect and interpret all genetic variations of the human genome, classical methods such as linkage analysis or homozygosity mapping should be combined with novel state-of-the-art techniques [10,11].
The observed high genetic heterogeneity is not unique to these inherited sensory disorders; it has also been described for other inherited disorders, including intellectual disability, ciliopathies, and inherited susceptibility for cancer [12][13][14]. Although, in general, disease gene identification strategies applied in these fields rely on the same principles and have undergone a similar development, an optimal diagnostic strategy depends heavily on key factors such as evolutionary pressure and the involvement of multifactorial versus monogenic causes. For example, for intellectual disability, de novo causes are more frequent due to a strong reduction of reproductive fitness; this impacts the optimal diagnostic strategy. This review focuses on the identification of monogenic causes of inherited HL and RD.
In this review, we aim to provide an overview of the development of techniques that have enabled disease gene discovery throughout the years (Section 2). Additionally, we evaluate and highlight the complexity and different aspects of candidate variants and candidate gene interpretation (Section 3). Finally, we describe recent and upcoming improvements and innovations of existing technologies and the development of novel technologies in the field (Section 4).

Linkage Analysis
The first HL- and RD-associated genes were identified using linkage analysis and candidate gene strategies in the early 90s [15][16][17]. Examples of candidate gene approaches include analysis of candidate-disease-associated genes based on their function, gene expression, or animal model studies (discussed in [18]). Linkage analysis was used to pinpoint a genomic region of interest likely to encompass the disease gene. The strategy is based on the key principle that a disease haplotype is shared between affected individuals within a family but is not present in unaffected individuals. The shared haplotype cosegregates with the disease, according to the observed mode of inheritance. Initially, linkage regions were mapped using laborious genotyping of polymorphic microsatellite markers, but the process was optimized when microarray technologies became available. Microarrays, such as SNP arrays, allow rapid genotyping of thousands of single nucleotide polymorphisms (SNPs) that are present across the genome and have a variant allele frequency higher than 1% in the healthy population. The higher the density of the SNPs on the array and the more SNPs that reside within the region showing linkage disequilibrium, the more precise the determination of a possible disease haplotype is. The distance between two SNPs can be expressed in centimorgans (cM). One cM is defined as the distance between chromosomal positions that have a 1% chance of being separated by chromosomal recombination during meiosis. A logarithm of the odds (LOD) score can be calculated to estimate the odds that two loci, or a locus and a disease-associated gene, are located at an assumed distance from each other (expressed as the recombination fraction theta). A LOD score of 3.3 or higher is considered evidence for linkage in a genome-wide manner, with a probability of 95% [19].
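The LOD score described above can be illustrated with a minimal sketch for the simplest textbook case: fully informative, phase-known meioses, where the likelihood at a recombination fraction theta is theta^R (1 - theta)^(N - R) for R recombinants among N meioses, compared against the unlinked case theta = 0.5. This is an illustration under those stated assumptions, not the algorithm used by GENEHUNTER or any other linkage package; the function names `lod_score` and `max_lod` are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses (simplifying assumption)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Complete linkage is impossible once a recombinant has been observed.
        if recombinants > 0:
            return -math.inf
        log_l_linked = 0.0
    else:
        log_l_linked = (recombinants * math.log10(theta)
                        + non_recombinants * math.log10(1.0 - theta))
    # Likelihood under no linkage: every meiosis has probability 0.5.
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, meioses: int, steps: int = 50):
    """Scan theta on a grid from 0 to 0.5 and return (best_theta, best_lod)."""
    thetas = [i * 0.5 / steps for i in range(steps + 1)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


if __name__ == "__main__":
    for n in (10, 11):
        print(n, "meioses, no recombinants:", round(lod_score(0, n, 0.0), 2))
    print("1 recombinant in 10 meioses:", max_lod(1, 10))
```

Under these assumptions, ten informative meioses without recombinants give a LOD score of about 3.0, and an eleventh is needed to pass the 3.3 threshold, which illustrates why large pedigrees are required.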
Nowadays, several tools (e.g., GENEHUNTER [20] and PLINK [21]) are available to calculate the LOD score and identify a linkage region. However, large family pedigrees and sufficient participating family members are required to reach a statistically significant linkage. When a disease-associated locus is defined, Sanger sequencing can be performed to evaluate the genomic region for causative variants. In this way, the linkage analysis strategy has been applied very effectively for disease gene identification for many years (reviewed in [18,22]). Despite the introduction of higher-throughput sequencing techniques, SNP genotyping can still be very useful to determine regions of genotype-sharing even in small families, especially to reduce the number of candidate variants.

Homozygosity Mapping
Genome-wide homozygosity mapping has proven to be a powerful tool to identify disease-associated genes for autosomal recessive disorders. For both inherited HL and RD, a significant number of disease-associated genes were identified using this strategy [18,22]. In consanguineous families, a pathogenic variant is often present in a homozygous state as it is inherited from a recent common ancestor (grandparent or great-grandparent). Homozygosity mapping, which is often performed using SNP arrays, can be used to determine regions that contain consecutive homozygous variants [21,23]. Although the average size of homozygous stretches is larger in consanguineous families (typically between 30 and 100 megabase (Mb)-sized regions [24,25]), several studies have indicated that this method is also an effective tool for nonconsanguineous families (1-30 Mb-sized homozygous regions [24][25][26]). EYS is one of the most frequently mutated genes in RD, and it was identified using homozygosity mapping in a nonconsanguineous family [27,28]. Other examples of disease gene identification using homozygosity mapping in nonconsanguineous families include PDE6C [29], which is associated with RD, and OTOG [30] and MYO15A [31], which are implicated in HL. The size of a homozygous disease-associated haplotype decreases over subsequent generations due to meiotic recombination. Well-characterized families and detailed phenotypic information are prerequisites for the successful application of this technique.
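The core idea of homozygosity mapping, finding stretches of consecutive homozygous SNP calls, can be sketched as follows. This is a deliberately simplified illustration: it assumes position-sorted calls for a single chromosome, genotypes encoded as the number of alternate alleles (0, 1, or 2), and no tolerance for genotyping errors or missing calls, all of which dedicated tools handle more carefully. The function name `homozygous_runs` and the thresholds are hypothetical.

```python
from typing import List, Tuple


def homozygous_runs(
    calls: List[Tuple[int, int]],
    min_length_bp: int = 1_000_000,
    min_snps: int = 3,
) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, snp_count) for runs of consecutive homozygous calls.

    `calls` holds (position_bp, alt_allele_count) pairs sorted by position;
    an alt_allele_count of 1 marks a heterozygous call and ends the current run.
    """
    runs = []
    start = end = None
    count = 0

    def flush():
        # Keep the current run only if it is long and dense enough.
        if start is not None and end - start >= min_length_bp and count >= min_snps:
            runs.append((start, end, count))

    for position, alt_alleles in calls:
        if alt_alleles in (0, 2):  # homozygous reference or homozygous alternate
            if start is None:
                start, count = position, 0
            end = position
            count += 1
        else:  # heterozygous call breaks the run
            flush()
            start = end = None
            count = 0
    flush()  # close a run that extends to the last SNP
    return runs


example_calls = [
    (100_000, 0), (900_000, 2), (2_500_000, 0),   # 2.4 Mb homozygous run
    (3_000_000, 1),                               # heterozygous
    (3_200_000, 2), (3_400_000, 2),               # too short, discarded
    (3_600_000, 1),                               # heterozygous
    (4_000_000, 0), (6_000_000, 0), (9_000_000, 2),  # 5 Mb homozygous run
]

if __name__ == "__main__":
    print(homozygous_runs(example_calls))
```

In a real analysis, candidate regions shared by all affected individuals of a family, and absent in unaffected relatives, would then be prioritized.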

Next-Generation Sequencing
DNA studies have been revolutionized by the conventional Sanger sequencing technique, which was introduced in 1977 [32]. It is known as an enzymatic sequencing or chain-termination method, which utilizes labeled di-deoxynucleotides, acting as chain terminators [32]. The first human genome was sequenced based on Sanger sequencing technology in 2001, which took almost 13 years to complete and cost USD2.7 billion, and was part of a large collaborative and international publicly funded project [33]. In parallel, efforts to sequence the first human genome were also performed in a commercial setting by the company Celera Corporation, whose results were revealed in joint publications with the public Human Genome Project [34,35]. The Celera project employed a whole-genome shotgun sequencing approach and proceeded at a much faster pace and lower cost, although it benefited significantly from the data that was already generated by the public Human Genome Project [34,36]. As a result of both efforts to sequence the human genome, it became clear that the scale, efficiency, and cost needed to be vastly optimized for routine use in clinical diagnostics. Therefore, shortly after the release of the human genome sequence, the aim was re-established to achieve a USD1,000 human genome within 10 years [37].
Sanger sequencing is still routinely used for variant validation and has an extremely high accuracy of up to 99.999% [38]. However, it is considered a low-throughput technique, as up to one kilobase (kb) of DNA can be sequenced in 96 or 384 parallel reactions [39]. The technique has been optimized by the application of nucleotide-specific fluorescent labels and automated detection [40,41], the invention of polymerase chain reaction (PCR) [42], and the usage of polyacrylamide gels in capillary electrophoresis [41]. Therefore, DNA sequencing can be achieved within a shorter time frame and on a larger scale, in which the sequencing of millions of reads can be carried out in parallel, called "massive parallel sequencing" or "NGS".
The NGS technique has rapidly overcome the limitations of traditional sequencing. Since 2005, various sequencing platforms, such as Illumina, Ion Torrent, Roche 454, and SOLiD sequencing, have been developed, which has resulted in a rapidly changing landscape during this new era of sequencing. The read length of these different platforms is shorter than that of Sanger sequencing (approximately 50-500 bp), with a higher error rate (0.1% in NGS compared to 0.001% in Sanger sequencing) [43]. However, the fast development of NGS techniques and the generation of public reference datasets containing population allele frequency data have allowed widespread integration of NGS technology in the research community and, later, in the clinical diagnostics of genetic diseases. Nevertheless, as whole-genome sequencing (WGS) is still relatively expensive and data interpretation is complex, a targeted sequencing approach (e.g., whole-exome sequencing (WES)), is often preferred.
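To put the quoted error rates on a common scale, the sketch below converts a per-base error rate into a Phred-style quality score (Q = -10 log10 p) and into an expected number of miscalled bases per read. The Phred convention is a widely used standard rather than something taken from this review, and the function names and the example read lengths of 150 bp and 800 bp are illustrative assumptions.

```python
import math

# Per-base error rates as quoted in the text above.
SANGER_ERROR = 0.00001  # 0.001%
NGS_ERROR = 0.001       # 0.1%


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality score."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def expected_errors(read_length_bp: int, error_rate: float) -> float:
    """Expected number of miscalled bases in a read, assuming independent errors."""
    return read_length_bp * error_rate


if __name__ == "__main__":
    print("Sanger quality:", phred_quality(SANGER_ERROR))
    print("NGS quality:", phred_quality(NGS_ERROR))
    # Hypothetical read lengths chosen only for illustration.
    print("Errors per 800-bp Sanger read:", expected_errors(800, SANGER_ERROR))
    print("Errors per 150-bp NGS read:", expected_errors(150, NGS_ERROR))
```

The higher per-base error rate of short reads is offset in practice by sequencing each position many times, which is why coverage depth matters for variant calling.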

Targeted Capture Sequencing
Genomic regions of interest, such as the genes implicated in HL or RD, can be selectively enriched before sequencing is performed. There are various methods available to enrich target regions, such as hybridization-based, highly multiplexed PCR-based, and targeted circularization-based approaches. Extensive studies have been performed, which have applied these techniques to unravel genetic defects involved in inherited HL and RD. In 2013, Chio et al. investigated 32 cases with familial nonsyndromic HL, in which they reached a molecular diagnostic rate of 37% using a candidate gene sequencing approach of GJB2, SLC26A4, POU3F4, or mitochondrial genes based on observed clinical features and inheritance patterns. Later, by application of hybridization-based target capture sequencing for 80 HL-associated genes, they were able to increase the total diagnostic detection rate to 78% in this cohort [44]. In 2017, Dockery et al. utilized the hybridization-based enrichment method to sequence 254 IRD-associated genes in over 750 affected individuals in Ireland, in which they could identify pathogenic variants in 68% of the cases [45]. A recent study by Khan et al. applied a highly multiplexed PCR-based approach, with single-molecule molecular inversion probes (smMIPs), to sequence the complete ABCA4 gene (coding and noncoding regions) in 1054 individuals with Stargardt disease (OMIM: 248200), who were previously screened for variants in the coding regions and remained genetically unexplained. This study proved that a smMIP-based approach is a cost-effective approach in the case of a strong genotype-phenotype correlation. The method allowed deep-sequencing of the region of interest, and causal structural and deep-intronic variants were identified in 25% of the investigated cases who were genetically undiagnosed after prescreening methods [46].
Targeted NGS techniques have several advantages, such as less data storage, high sequencing accuracy due to high coverage, and cost- and time-effectiveness [47]. However, this approach is unable to detect variants in novel (candidate) disease-associated genes. Furthermore, pathogenic variants residing in noncoding regions and structural variants (SVs) can be missed if only exons are analyzed. Due to decreasing prices of both WES and WGS, these approaches have rapidly become preferred to overcome the disadvantages of targeted NGS.

Whole-Exome Sequencing Versus Whole-Genome Sequencing
Protein coding regions comprise 1-2% of the human genome. However, it is estimated that they harbor approximately 85% of disease-causing variants [48][49][50]. Therefore, the enrichment of coding regions utilized in WES quickly became an accurate and efficient method to investigate the coding regions of the genome for potential pathogenic variants, and this is now widely applied in genetic diagnostics [51]. One of the striking features of WES is in the success rates of genetic diagnostics of diseases with extensive locus heterogeneity, such as inherited HL and RD [9,52]. Currently, the diagnostic yield for RD using WES is estimated to be between 50% and 80%, dependent on the phenotype studied [9,[53][54][55]. According to a study performed by Haer-Wigman et al., the highest yields were obtained for retinitis pigmentosa (63%), and the lowest yields were obtained for macular dystrophy (28%) and rare unspecified types of RD (25%) [9]. For HL, the genetic diagnostic rate is also highly dependent on phenotype (e.g., syndromic or nonsyndromic phenotype, mode of inheritance). The highest rates are observed in patients with a positive family history or with a congenital or symmetric type of HL [52]. The overall estimated detection rate for HL, when employing WES, varies between 30-40% based on different large-cohort studies and largely depends on phenotypic diversity [8,56]. The diagnostic yield for HL is importantly influenced by the involvement of environmental factors (e.g., noise, ototoxic drugs, and trauma), which likely explains the difference in yield compared to RD. Genetic causes have been estimated to underlie approximately two-thirds of the cases of congenital and early childhood HL; the remaining cases can be explained by acquired causes [57]. This genetic contribution decreases with the patient's age due to increased exposure to damaging environmental factors during life. 
In line with this observation, there are several reports of a negative correlation between diagnostic yield and age of onset of HL [8,52]. Despite the successes of WES in clinical settings, this technology is inaccurate in detecting SVs, such as a deletion of a single exon, and does not allow variant detection in deep-intronic regions or regulatory elements. Therefore, WGS may be preferred as it provides more evenly distributed and uniform read coverage and it is capable of detecting different types of variants across the entire genome [58][59][60][61]. In 2017, Carss et al. investigated a large cohort of RD patients, in which WGS was performed for 605 cases, WES for 72 cases, and both technologies for 45 cases [61]. They identified disease-causing variants in 56% of all individuals (404/722), while, by using WES alone, the diagnostic yield was calculated to be 50%. Subsequently, 45/58 cases that remained unexplained by WES underwent WGS, and pathogenic variants were identified in 14 cases. The authors concluded that WGS has a great power to detect pathogenic SVs, variants in noncoding and regulatory regions, and variants in GC-rich regions. The application of WGS revealed the pathogenic variants in 31% of the cases that remained unsolved after WES. These variants were missed mainly due to the poor quality of reads or the incapability of WES to identify SVs [61]. The prices for WGS keep decreasing [62], and the importance of the noncoding regions of the genome has become more evident. Therefore, a shift from exome to genome sequencing will be observed in clinical diagnostics in the near future to overcome the diagnostic gap observed in the application of WES. In 2020, Méjécase et al. provided a practical and cost-effective guideline for current and future genetic testing of RDs, in which they proposed to utilize WES or targeted NGS for the initial screening of exons and flanking intronic regions of (candidate or known RD) genes, reserving WGS solely for cases that remained unresolved [63].
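The percentages reported for the Carss et al. cohort follow directly from the case counts given above, as the short check below shows. Only the counts stated in the text are used; the helper name `diagnostic_yield` is a hypothetical label for illustration.

```python
def diagnostic_yield(solved_cases: int, total_cases: int) -> float:
    """Diagnostic yield as a percentage of solved cases."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * solved_cases / total_cases


# Counts quoted in the text for the Carss et al. cohort [61].
overall_yield = diagnostic_yield(404, 722)       # all individuals
rescued_by_wgs = diagnostic_yield(14, 45)        # WES-negative cases re-analyzed by WGS

if __name__ == "__main__":
    print(f"Overall yield: {overall_yield:.1f}%")
    print(f"WES-negative cases solved by WGS: {rescued_by_wgs:.1f}%")
```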
Although NGS techniques have revolutionized the field of medical genetics, these short-read sequencing (SRS) approaches pose several limitations, such as (1) difficulties in the identification of complex and large SVs, (2) inability to sequence repetitive regions, (3) the lack of phasing of alleles, and (4) difficulties in distinguishing highly homologous regions such as pseudogenes [64]. These limitations may play a significant role in the diagnostic gap in medical genetics.

Third-Generation Sequencing
Due to the limitations of the aforementioned NGS techniques, there has been a need to develop new sequencing approaches to overcome these issues. The era of third-generation sequencing arrived in 2011 when Pacific Biosciences (PacBio) released a novel sequencing technique called "single-molecule real-time" (SMRT) sequencing [65]; only three years later, Oxford Nanopore Technologies introduced nanopore sequencing [66]. Although these two techniques utilize different approaches to sequence genomic DNA, they share two major advantages compared to NGS. First, they are established on PCR-free and real-time sequencing processes, and second, they generate ultra-long sequencing reads, >10 kb [64,67]. These long-read sequencing (LRS) technologies are revolutionizing the genetics field as they provide a further understanding of the normal and morbid anatomy of the human genome and can, thereby, fill the gaps in the molecular diagnostics of genetic diseases.

Figure 1.
In the next step, adapters, DNA polymerase, and primers bind to the double-stranded DNA, creating the SMRT-bell, which will be loaded later to the SMRT-cell. (B) The library is randomly distributed in the SMRT-cell in the sequencer instrument; under ideal conditions, one-third of the ZMWs will be loaded with an SMRT-bell. In each ZMW, the DNA polymerase together with an SMRT-bell is bound to the bottom of the ZMW. SMRT sequencing uses the circular DNA template to generate a continuous long read in each ZMW chamber. Afterwards, the adapters are trimmed from this long read, and overlapping reads can be combined into one high-quality consensus sequence called a HiFi read.


Box 1. Single-molecule real-time (SMRT) sequencing technique.
To enable the sequencing of single DNA molecules in real-time, two obstacles had to be overcome. The first was to concentrate the DNA polymerase and its template, the SMRT-bell (Figure 1A), in very small observation chambers in order to achieve a higher signal-to-noise ratio. This problem has been solved by zero-mode waveguide (ZMW) technology, in which each ZMW is a small hole of approximately 45 nanometers (nm) in diameter [74]. The DNA polymerase, with its template, is anchored by a strong biotin/streptavidin interaction to the bottom of the ZMW. Therefore, the laser illumination of incorporating nucleotides is limited to the bottom, which increases the signal-to-noise ratio [68], as the ZMW can efficiently distinguish signals of nucleotide incorporation against the background of unincorporated nucleotides (Figure 1B).
The second obstacle in real-time sequencing of single DNA molecules was the large size of the fluorescent dye, which interfered with the normal activity of DNA polymerase and caused the halting of the enzyme shortly after the initiation of DNA synthesis. In SMRT technology, the dye is attached to the phosphate chain of the nucleotide rather than to the base; because the phosphate chain is naturally cleaved during DNA synthesis after nucleotide incorporation, this results in a single long, natural DNA strand [68].
The real-time sequencing of the circular SMRT-bell is performed in each ZMW, which generates continuous long reads (Figure 1B). During data processing, the adapters are removed, and subreads are generated. Subsequently, the combined subreads enable the generation of one highly accurate consensus sequence called the circular consensus sequence (CCS).
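The principle of combining subreads into a consensus can be illustrated with a toy majority vote. This sketch assumes that the subreads have already been aligned to equal length and contain substitution errors only; actual CCS generation relies on alignment and statistical modeling and is not reproduced here. The function name `majority_consensus` and the example sequences are hypothetical.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_subreads: List[str]) -> str:
    """Call the most frequent base at each position of pre-aligned subreads.

    Ties are resolved in favor of the base seen first in that column.
    """
    if not aligned_subreads:
        raise ValueError("at least one subread is required")
    length = len(aligned_subreads[0])
    if any(len(subread) != length for subread in aligned_subreads):
        raise ValueError("subreads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_subreads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five passes over the same hypothetical template; four carry one error each.
subreads = [
    "ACCTTAGC",  # error at position 2
    "ACGTTAGG",  # error at position 7
    "TCGTTAGC",  # error at position 0
    "ACGTAAGC",  # error at position 4
    "ACGTTAGC",  # error-free
]

if __name__ == "__main__":
    print(majority_consensus(subreads))
```

Because the errors in this example fall at different positions in each pass, every column still has a clear majority, which is the intuition behind the gain in accuracy from repeated passes over the circular template.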

Nanopore Sequencing
Nanopore sequencing is an advanced third-generation sequencing technique that offers straightforward sample preparation, requiring minimal reagents or amplification processes [75]. This technology relies on transferring a DNA molecule through a pore and directly detecting each nucleotide by its effect on an electric current (Box 2, Figure 2) [76,77].

Box 2. Nanopore sequencing technique.
Nanopore sequencing occurs in a flow cell, in which two ionic solution compartments are separated by a membrane containing thousands of nanopores. The flow of electric current between these two compartments depends on the molecule transferring through one of the pores. Since each nucleotide differs in shape, its effect on the electric current is specific for each nucleotide (Figure 2) [67,77,78]. Library preparation in nanopore sequencing includes the end-repair of ultralong dsDNA, the addition of dA-tails, followed by the size selection and ligation of adapters, which are protein-DNA molecules. The first adapter is the leader-adapter, which contains a motor enzyme. It binds to the nanopore and ensures the gradual movement of DNA through the pore. The dsDNA is then unwound at the pore, and only one strand will pass through the nanopore. The second adapter is a hairpin-adapter containing a hairpin protein. It generates one long single strand of DNA, which ensures the sequencing of the second strand of DNA in order to increase the accuracy of sequencing [66,67,79].
There is no limit in the length of DNA that can be sequenced with this technique since it does not require DNA amplification or synthesis. However, the challenge lies in library preparation, which needs to result in ultra-long dsDNA molecules [80]. The average size of reads is usually above 10 kb, and for some ultra-long dsDNA molecules, it can reach one Mb [64]. The main drawback of nanopore sequencing technology is its relatively high error rate of ~20%. Compared to SMRT technology, in which the error rate can be reduced by high coverage due to CCS, in nanopore sequencing, it is a systematic error, and correction can only be achieved by comparison to short-read sequence data [80]. Nevertheless, this technology is rapidly improving to overcome the current issues (such as error correction), and library preparation is being optimized to achieve high-quality ultra-long dsDNA [77].

Figure 2.
Overview of the nanopore sequencing technology. (A) The library preparation includes end-repairing and adding dA-tails, followed by ligation of two types of adapters to both ends of the ultra-long double-stranded DNA. The adapters carry the motor enzyme (in orange) and hairpin-protein (in green) to facilitate the movement of DNA through the nanopore and ensure the sequencing of the second strand of DNA, respectively. (B) The library is loaded into the flow cell in the sequencer instrument. The flow cell contains thousands of nanopores that allow the flow of Cl− and K+ ions between two compartments. The motor enzyme anchors to the nanopore and unwinds the DNA to pass it through the pore. Thereby, the electric current is influenced based on the unique shape of each nucleotide in single-strand DNA. These changes in the electric current are later translated to sequences.

Application of Third-Generation Sequencing in Inherited HL and RD
Third-generation sequencing has revolutionized the field of medical genetics by its superior performance in the analysis of repeated and highly homologous regions, SVs, haplotype phasing, and transcriptome analysis [81]. These technologies are currently mainly used in research applications and show great promise to overcome the disadvantages of SRS methods. In a systematic analysis, Ebbert et al. compared the performance of whole-genome SRS and LRS technologies at repetitive regions in the human genome. Amongst others, they showed that 8.6% of the protein-coding regions of RPGR (associated with X-linked RD) and 12.7% of the protein-coding regions of OTOA (associated with HL) are within the unmapped reads of SRS data, which were resolved by performing LRS.
One important application has been to identify complex SVs associated with genetic diseases, including HL and RD. Reiner et al. utilized SMRT LRS to detect a 72.8-kb deletion region in the BBS9 gene and map the breakpoints at the nucleotide level in a patient diagnosed with Bardet-Biedl syndrome (OMIM: 615986). This deletion was determined to be the causal variant and a founder mutation in the Guyanese population [83].
In another recent study, researchers utilized transcriptome sequencing, followed by short- and long-read WGS, to identify a 7.4-kb duplication in NMNAT1, which spans two out of five exons of this gene. The duplication caused a previously unrecognized autosomal recessive syndrome, symptoms of which are Leber congenital amaurosis and sensorineural HL, which occur together with other features such as spondyloepiphyseal dysplasia, intellectual disability, and brain anomalies. The authors were able to determine the exact breakpoints of the duplicated region, missed by previous approaches, as well as two Alu elements flanking this segment, which are potentially involved in the origin of the SV [84].
Recently, nanopore sequencing enabled researchers to unravel the genetic defect in two unrelated patients diagnosed with mild-to-moderate HL. Nanopore sequencing revealed a gene conversion event between the OTOA gene and its pseudogene, in which exons 21 to 23 of OTOA were replaced by exons 1 to 3 of OTOAP1 [85]. As pathogenic variants within the OTOA gene have been described to cause autosomal recessive HL (DFNB22; OMIM: 607039), this gene conversion event was considered causative [85].
Despite the advantages of LRS techniques, they possess multiple important drawbacks that prevent a wide range of uses outside research applications. One of these is the relatively high costs compared to SRS NGS technologies (USD800-2000 per run, depending on the different platforms and instruments), based on the lowest possible flow cell price and highest output [67]. The other major disadvantage is the requirement for high-quality ultralong dsDNA, which can be challenging to obtain. In particular, for nanopore sequencing, the required fresh blood samples for DNA extraction can also be a hurdle. However, as LRS technologies are rapidly decreasing in price and are continuously improving in different aspects, such as optimized library preparation and error correction, it is expected that these technologies will eventually enter routine genetic diagnostics in Western countries. In addition, like SRS, targeted LRS can also be performed by targeted amplicon sequencing, CRISPR/Cas-based targeted enrichment, or using a "Read Until" approach in order to enrich for genetic loci associated with a specific phenotype. Targeted LRS is a cost-effective and efficient strategy to investigate high-priority loci in unsolved cases [86,87]. For both HL and RD, several associated genetic loci (44 and 36 loci, respectively) have been described, for which the implicated genetic defect is still elusive [1,2], and, therefore, a targeted LRS approach could be of interest.
Finally, as sequencing technologies develop and improve rapidly (Figure 3), the next challenge will lie in bioinformatics, data storage, data analysis, and variant interpretation of NGS or LRS data. A high number of different variants are revealed by these methods. However, not all these variants are disease-causing. Therefore, special attention is being paid to prioritization processes that can aid in decreasing the number of putative candidate variants. In addition, developments in bioinformatic tools are needed to better interpret the effect of candidate variants. In the next section, we will discuss the importance and challenges of variant interpretation and the importance of this matter in clinical application.
Figure 3. Comparison of conventional Sanger, next-generation, and third-generation sequencing. Schematic representation of (A) first-generation sequencing (Sanger sequencing), (B) next-generation sequencing (e.g., Illumina whole-genome sequencing (WGS) and whole-exome sequencing (WES)), and (C) third-generation sequencing (e.g., SMRT sequencing as performed by Pacific Biosciences (PacBio) and nanopore sequencing by Oxford Nanopore Technologies (ONT)). For each technique, advantages (green) and disadvantages (red) are provided.

Variant Interpretation
The total length of human DNA is over 3 billion base pairs, and it holds, on average, 4-5 million variants compared to the healthy human reference genome, which highlights the obvious challenge of distinguishing potential disease-causing variants from benign variants or polymorphisms [88]. For protein-truncating variants, a potential pathogenic consequence is often evident, while missense, synonymous, and noncoding variants are more challenging to interpret. Moreover, with increased knowledge regarding the involvement of noncoding DNA in human disease development, the complexity of the data to be analyzed has increased substantially.
In 2015, the American College of Medical Genetics and Genomics (ACMG) provided a framework to utilize and standardize sequence variant interpretation for Mendelian disorders [89]. Each variant is assigned to one of five classes using a uniform scoring system: benign, likely benign, uncertain significance, likely pathogenic, or pathogenic. The classification system employs several hierarchical steps, which include the use of literature and databases, computational and predictive data, functional data, and segregation analysis. Variant classification is the cornerstone of clinical molecular genetic testing. Therefore, ACMG guidelines provide a consistent and well-applicable system to guide this process. On the other hand, for research focused on the identification of novel gene-disease associations, the ACMG guidelines are more difficult to apply and less suitable.
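To make the idea of a uniform scoring system concrete, the following Python sketch combines evidence codes into one of the five classes. It is a simplified reading of the combining rules published in [89] and is offered as an illustration only; the function name `classify_acmg` and the prefix-based counting are assumptions of this sketch, and the authoritative rule table in the guideline should be consulted for diagnostic use.

```python
from collections import Counter

# Evidence strength prefixes; "PVS" must be tested before "PS".
PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def classify_acmg(criteria):
    """Combine evidence codes such as 'PVS1' or 'PM2' into a five-tier class.

    Simplified sketch: codes are only counted per strength level, and
    conflicting pathogenic and benign evidence yields 'uncertain significance'.
    """
    n = Counter()
    for code in criteria:
        prefix = next((p for p in PREFIXES if code.startswith(p)), None)
        if prefix is None:
            raise ValueError(f"unknown evidence code: {code}")
        n[prefix] += 1

    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp == 1) or bp >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "uncertain significance"  # contradictory evidence
    if pathogenic:
        return "pathogenic"
    if likely_pathogenic:
        return "likely pathogenic"
    if benign:
        return "benign"
    if likely_benign:
        return "likely benign"
    return "uncertain significance"  # too little evidence either way
```

For example, a null variant with one moderate criterion (`["PVS1", "PM2"]`) is classified as likely pathogenic in this sketch, whereas a single supporting criterion on its own leaves the variant at uncertain significance.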

Literature and Database Use
Variant frequency databases are a useful resource for allele frequencies of variants in large populations. As a rule of thumb, the frequency of a disease-causing variant should not be higher than expected, based on the incidence or prevalence of the genetic disorder [90]. The most comprehensive allele frequency database today is gnomAD (successor of ExAC), which contains frequency data for both SNVs and SVs based on 91,864 genomes and 125,748 exomes [91]. Additionally, this database provides variant frequencies for many subpopulations, which allows the usage of population-matched control data. Nevertheless, some populations (e.g., African/African-American) remain underrepresented, which limits efforts in precision and personalized medicine for these populations. Several efforts are ongoing to obtain more (high-quality) genomes from these populations [92,93]. Databases such as gnomAD [91], GoNL [94], UK10K [95], and Wellderly [96] contain sequencing data of (presumably) healthy cohorts. However, important caveats related to age of onset and reduced penetrance should not be ignored.
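The rule of thumb above can be illustrated with a short Python sketch. It assumes Hardy-Weinberg proportions and, as the most permissive case, that a single allele explains every case of the disorder; the function names and the example prevalence are hypothetical and are not taken from the cited studies.

```python
import math


def max_allele_frequency(prevalence, inheritance, penetrance=1.0):
    """Upper bound on the population frequency of one disease allele.

    Simplified sketch under Hardy-Weinberg assumptions; it ignores allelic
    and locus heterogeneity, so real thresholds are usually lower.
    """
    if not 0 < prevalence <= 1 or not 0 < penetrance <= 1:
        raise ValueError("prevalence and penetrance must be in (0, 1]")
    # Frequency of the at-risk genotype, including non-penetrant carriers.
    genotype_freq = min(prevalence / penetrance, 1.0)
    if inheritance == "recessive":
        return math.sqrt(genotype_freq)  # homozygotes: q^2 <= genotype_freq
    if inheritance == "dominant":
        return genotype_freq / 2  # heterozygotes: about 2q for rare alleles
    raise ValueError("inheritance must be 'recessive' or 'dominant'")


def is_too_common(allele_frequency, prevalence, inheritance, penetrance=1.0):
    """True if the observed frequency exceeds the upper bound."""
    return allele_frequency > max_allele_frequency(prevalence, inheritance, penetrance)
```

With a hypothetical prevalence of 1 in 10,000, the bound is 0.01 for a recessive allele and 0.00005 for a fully penetrant dominant allele, which shows why the same population frequency can be compatible with one inheritance model and not with the other.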
Unlike population databases, disease databases such as ClinGen [97], ClinVar [98], Leiden Open (source) Variation Databases (LOVDs) [99], and the Human Gene Mutation Database (HGMD) [100] also provide genotype-phenotype information. All the variants collected in the HGMD database have been reported in patients and are likely disease-causing. They have been published in the literature and manually curated. The Deafness Variation Database (DVD) provides a comprehensive catalog of genetic variation in genes associated with HL [101]. Efforts are ongoing to collect and annotate all published variants associated with inherited nonsyndromic RDs, Bardet-Biedl syndrome, and Usher syndrome into 195 gene-specific LOVDs [28,102-106].
Several studies have proven the value of incorporating population frequency data as a variant prioritization strategy and have successfully clarified variants of unknown significance [61,107]. However, an important caveat is that a reliable database should be frequently updated, and uploaded sequencing data should adhere to quality control criteria. An example of non-pathogenic variants mistakenly reported as pathogenic has been highlighted in a study performed by Hanany et al. [108]. The authors extracted up-to-date allele frequencies from gnomAD of variants in genes associated with dominantly inherited RD and concluded that the pathogenicity of variants in 19% of these genes should be debated. Inherited HL, on the other hand, is a more common condition than RD, and therefore the expected maximum allele frequency for a pathogenic variant should be adjusted accordingly [109].
Once a potentially disease-causing variant is identified, a rich source of available scientific and medical literature can be assessed. A first important step entails thorough comparisons between the observed phenotype in the investigated proband and phenotypic observations described in the literature. Most well-described phenotype-genotype correlations can be found in data repositories: Online Mendelian Inheritance in Man (OMIM) [110], ClinGen [97], and ClinVar [98].
Strong phenotype-genotype correlations are complicated by a phenomenon called allelism, in which different phenotypes can result from different alleles of the same gene [111]. For example, autosomal recessive Stargardt disease (STGD1), which is due to two variants or alleles in ABCA4, shows a wide clinical spectrum of maculopathies [112]. The most severe form is early-onset (onset <10 years) STGD1 or panretinal cone-rod dystrophy, which is due to two deleterious ABCA4 alleles. Classical or intermediate STGD1 (onset between 10 and 40 years) is due to a combination of a deleterious variant and a mild variant. Finally, late-onset STGD1 (onset >40 years) is caused by a deleterious variant and a mild variant (p.Asn1868Ile), showing reduced penetrance [112-114]. Truncating variants in the CDH23 gene are assumed to cause Usher syndrome type 1D (OMIM: 601067), which consists of HL and retinitis pigmentosa, whereas missense variants cause nonsyndromic HL (OMIM: 605516) [115]. However, several exceptions to this rule have been reported [116,117]. For pathogenic variants in the USH2A gene that can cause both nonsyndromic retinitis pigmentosa and Usher syndrome type IIa (OMIM: 276901), the correlation of missense and truncating variants with the associated phenotypic expression is not always clear, although truncating USH2A variants are more frequently reported in patients diagnosed with a syndromic phenotype [118,119]. Additionally, variants affecting genes that are implicated in ciliopathies (e.g., BBS1, CEP290, IQCB1) can cause a wide range of variable symptoms that are part of a (syndromic) phenotype. Symptoms described for ciliopathies often include retinal degeneration and, less frequently, HL (reviewed in [14]). To date, >80 forms of syndromic RD have been described, which are linked to approximately 200 IRD-associated genes [120]; for syndromic HL, these numbers are suggested to be even higher [121].
Besides a phenotypic resemblance, the expected mode of inheritance and the involved pathogenic mechanism of the variant (e.g., haploinsufficiency, loss- or gain-of-function) should also be compared with literature reports. For genes that have not been previously associated with the disease of interest, OMIM [110] and GeneCards [122] provide a summary of known clinical and functional information for the gene. For candidate disease genes, it may be valuable to investigate gene expression in the tissue of interest (e.g., SHIELD [123], gEAR [124], EyeGEx [125]) and explore associated protein interaction networks (e.g., STRING [126]). Additionally, the initiative GeneMatcher [127] and the European Retinal Disease Consortium (ERDC) [128] offer the opportunity for different research groups that share an interest in the same candidate disease gene to collaborate. It is hypothesized that the most prominent genetic causes of diseases have been identified, and novel findings appear in few cases or families, which underlines the urgency for collaborations among research groups worldwide. It is of utmost importance to share candidate disease gene data to increase the likelihood of identifying multiple unrelated individuals affected by pathogenic variants in the same gene [129,130].

Computational and Predictive Data
The spectrum of human genetic variation is diverse, and a rich source of bioinformatics tools has been developed to evaluate the different potential consequences of a variant. Although the pathogenicity of SNVs has been most extensively studied, recent efforts into the characterization of SVs have revealed that pathogenic SVs are more abundant than initially thought [90,131]. This has led to a gradual shift of attention from coding variations to structural variations and the noncoding regions of the genome.

Null Variants
Null variants are considered very strong evidence of pathogenicity and often lead to open reading frame disruption and, consequently, the complete loss of protein function. Null variants include nonsense, frameshift, canonical splice site, and initiation codon variants, as well as out-of-frame single- and multiexon deletions. Available in silico prediction tools are often not designed for the interpretation of null variants, as pathogenicity already seems evident in most cases. However, some caveats should be considered, including the presence of alternative transcripts, the position of the variant with respect to the 3'UTR, and the induction of alternative splicing, such as in-frame exon skipping, as a putative correction mechanism [132-134]. For each gene, a probability of loss-of-function intolerance (pLI) score, which is based on observed (homozygous) loss-of-function variants in healthy cohorts compared to the expected number based on the gene size, is provided in gnomAD [91].

Missense, Synonymous, Indel, and Intronic Variants
Substitution variants located in the coding (exonic) or noncoding (intronic) regions of a gene are more difficult to interpret. Missense variants and small in-frame insertions or deletions (indels) lead to changes in amino acid composition. Several computational tools have been developed to aid in the assessment of deleterious consequences of the identified variants. Output scores provided by these tools are usually based on evolutionary conservation of the altered nucleotide or amino acid residues, biochemical consequences of the amino acid change, or the location and context of the residue within the protein sequence, e.g., in a domain with a specific function. The most widely applied tools are combined annotation-dependent depletion (CADD) [135], Grantham [136], MutationTaster [137], PhyloP [138], PolyPhen-2 [139], and sorting intolerant from tolerant (SIFT) [140].
Alternatively, synonymous, missense, and (deep-)intronic variants can disrupt the normal splicing machinery and alter pre-mRNA processing. Variants can introduce or strengthen cryptic splice sites, disrupt canonical donor or acceptor splice sites, or disrupt the (binding) motifs that are essential for correct splicing processes, such as exonic splicing enhancers or silencers [107]. This can lead to alternative splicing events, such as pseudo-exon inclusion, exon elongation, or (partial) exon skipping. Potential splice-altering variants can be evaluated based on nucleotide conservation scores or by performing splicing assessments using predictive splicing algorithms, such as Human Splicing Finder [141], SpliceSiteFinder-like [142], MaxEntScan [143], GeneSplicer [144], NNSPLICE [145], and SpliceAI, a deep learning algorithm [146]. In vitro midi- or minigene splice assays can be performed to confirm the predicted alternative splicing events in HEK293T cells or, if transcript levels allow, aberrant splicing can be detected in RNA derived from (EBV-transformed) blood cells [147,148].
One pitfall of splice site prediction tools is tissue-specific splicing of exons, which prevents most current prediction tools from detecting cochlear- or retina-specific splicing effects. Recently, Riepe et al. benchmarked different established and deep-learning tools on sets of variants in the tissue-specific genes ABCA4 and MYBPC3 and observed that SpliceAI is the best-performing splice prediction tool for both noncanonical splice sites and deep-intronic variants in ABCA4 [149]. Moreover, Rowlands et al. compared seven machine and deep learning-based splice prediction tools and demonstrated that SpliceAI is superior in both sensitivity and specificity [150].

Regulatory Variants
Variants located in intergenic and intronic regions of the genome can exert their pathogenic effect through a variety of mechanisms. Variation can occur within characterized cis-regulatory elements (CREs), such as promoters, enhancers, or insulators [151,152]. Regulatory elements are short DNA sequences (100-500 bp) that allow precise spatiotemporal control of gene expression levels [151]. Promoter and distant enhancer regions interact with each other via chromosomal looping, allowing the recruitment of transcriptional machinery. Alternatively, insulators can block the interactions between promoters and enhancers. An enhancer element can be located up to one Mb away from its target gene and can serve as the transcriptional regulator of one or more genes [151,153-155]. Usually, an enhancer displays a spatiotemporal pattern of activity. Transcription factors that bind enhancer or promoter elements are the key regulators of these processes, and they modulate gene expression. Pathogenic variants in cis-regulatory elements can alter transcription factor binding sites or the chromatin landscape and, therefore, the activity of the enhancer or promoter [151,152]. Databases such as JASPAR [156], which contain consensus sequences of transcription factor binding sites, can be used to predict the effect of a potential regulatory variant on transcription factor binding.
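How a binding-site matrix can be used to compare a reference and a variant allele is sketched below in Python. The count matrix `TOY_COUNTS` is entirely hypothetical and merely stands in for a matrix obtained from a database such as JASPAR; the function names are also assumptions of this sketch, which scans the forward strand only.

```python
import math

# Hypothetical nucleotide counts per motif position (consensus: CAGTAC).
TOY_COUNTS = {
    "A": [2, 18, 1, 1, 17, 3],
    "C": [15, 0, 1, 1, 1, 12],
    "G": [1, 1, 17, 1, 1, 2],
    "T": [2, 1, 1, 17, 1, 3],
}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert counts into log2 odds scores against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        matrix.append(
            {
                base: math.log2(((counts[base][i] + pseudocount) / total) / background)
                for base in "ACGT"
            }
        )
    return matrix


def best_score(sequence, matrix):
    """Highest motif score over all forward-strand windows of the sequence."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(matrix[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    )


def binding_change(ref_sequence, alt_sequence, matrix):
    """Negative values suggest that the variant weakens the predicted site."""
    return best_score(alt_sequence, matrix) - best_score(ref_sequence, matrix)
```

A negative result from `binding_change` only indicates a weaker match to the motif; whether binding is actually lost in the relevant tissue still has to be shown experimentally.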
Regulatory variants that impact transcription initiation usually lead to subtle changes in gene expression and are difficult to assess [152]. Therefore, context-specific profiling of the tissue- and cell-type-specific cis-regulatory architecture is essential [157]. Enhancer databases such as the ENCODE portal [158], GeneHancer [159], and EnhancerAtlas [160] contain an overview of reported cis-regulatory elements that are widespread throughout the genome. Potential interactions between promoter and enhancer elements can be assessed by evaluating available chromosome conformation capture data like Hi-C. Additionally, the presence of context-specific active enhancer hallmarks should be assessed. These include (1) the confirmed binding of transcription factors, (2) the production of noncoding enhancer RNA, (3) an open chromatin conformation, and (4) the presence of histone-modification marks that are associated with enhancer activity, such as histone 3 lysine 27 acetylation [151,157]. Figure 4 provides an overview of these hallmarks, the suitable techniques to assess these, and a selection of relevant publicly available (epigenetic) datasets used to interrogate the recently resolved autosomal dominant retinitis pigmentosa RP17 locus [161]. Once a candidate regulatory variant has been identified, experiments such as an in vitro luciferase reporter assay could be applied to confirm its effect on enhancer or promoter activity [151].

Figure 4. To evaluate a potential regulatory variant, several publicly available datasets in the UCSC browser can be assessed to determine whether a variant is located within an active cis-regulatory element based on the presence of active enhancer or promoter hallmarks [162]. (A) A known active retinal enhancer element, described by de Bruijn et al. and included in the GeneHancer database, is visualized in the UCSC browser [159,161]. The enhancer element overlaps with a long noncoding gene, LINC01476, and is predicted to bind to the promoter region of the YPEL2 gene (GeneHancer) [159]. The enhancer element is enriched for several active enhancer hallmarks. (B) Firstly, the element is bound by retina-specific transcription factors (NRL and CRX), as confirmed by ChIP-sequencing performed on a human retina sample (GEO database: GSE137311) [151]. (C) Secondly, cap analysis gene expression (CAGE) allows 5′ end sequencing of cDNAs, confirming the expression of the enhancer element, as shown by the FANTOM5 CAGE human dataset (data available from https://fantom.gsc.riken.jp/data/, accessed in March 2021) [162]. (D) Thirdly, an open chromatin conformation of the enhancer element is confirmed by ATAC-sequencing of a human retina sample (GSE137311) [151]. (E) Lastly, the enhancer element is enriched for histone-modification marks that are associated with enhancer activity, such as H3K27Ac, as determined using ChIP-sequencing performed on human retina (GSE137311) [151].

Structural Variants
SVs are defined as genomic rearrangements that are larger than 50 bp [131]. SVs include deletions and duplications, also referred to as copy number variations, as well as inversions, translocations, and insertions [153]. In 2020, an amendment of the ACMG guidelines was published to aid in the classification of SVs [163]. SVs can have direct consequences on gene dosage levels when (partially) overlapping with coding regions of a gene or can cause changes in gene expression levels or patterns when overlapping with regulatory elements such as enhancers. Additionally, SVs that are limited to the noncoding regions of the DNA can interfere with the 3D genome structure and disrupt cis-regulatory architecture [131]. Each chromosome is compartmentalized in regulatory units, so-called topologically associating domains (TADs). Within each TAD, enhancers and gene promoters can interact. Neighboring TADs are shielded from each other by boundaries, which are typically occupied by the transcription factor CTCF [164]. Disruption of TAD architecture by SVs can have severe pathogenic consequences. Deletions can lead to the fusion of neighboring TADs, inversions can result in the exchange of regulatory sequences, and duplications can generate novel TAD compartments, leading to ectopic enhancer-promoter contacts (neo-TADs) [153,165-167]. These genomic rearrangements can result in pathogenic alterations of gene expression levels. Recently, it was shown that TAD rearrangements caused by SVs are an important cause of autosomal dominant retinitis pigmentosa (RP17) [161]. Additional studies, focused on the identification of copy number variants involved in HL or RD, have also suggested a prominent role for pathogenic SVs [168,169]. To predict the potential consequences of structural rearrangements, the epigenetic landscape of the region, including the presence of CTCF sites, interactions, and directionality, should be evaluated.
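A first-pass check of whether an SV touches a TAD boundary or an annotated enhancer amounts to a set of interval overlaps, as sketched in Python below. The `Interval` type, the function names, and all coordinates in the usage note are hypothetical; the returned notes simply restate the mechanisms described above and are not a pathogenicity prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


# Mechanism suggested in the text when an SV of this type spans a boundary.
BOUNDARY_NOTES = {
    "deletion": "possible fusion of neighboring TADs",
    "inversion": "possible exchange of regulatory sequences between TADs",
    "duplication": "possible neo-TAD formation",
}


def predict_tad_effect(sv_type, sv, boundaries, enhancers):
    """List the regulatory mechanisms an SV might act through (rough triage)."""
    notes = []
    if sv_type in BOUNDARY_NOTES and any(overlaps(sv, b) for b in boundaries):
        notes.append(BOUNDARY_NOTES[sv_type])
    if any(overlaps(sv, e) for e in enhancers):
        notes.append("overlaps annotated enhancer")
    return notes or ["no boundary or enhancer overlap"]
```

For instance, with a boundary at `Interval("chr17", 1000, 1100)`, a deletion covering positions 900 to 1200 would be flagged for possible TAD fusion. In practice, boundary positions, CTCF orientation, and tissue-specific contact maps would have to come from experimental datasets.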

Segregation Analysis
Once a candidate disease-causing variant is identified, segregation analysis should be performed, if possible, to confirm that the observed inheritance of the variant matches the family history. If a variant is segregating with the phenotype within the family, this could serve as supportive evidence for linkage of the identified variant to the disorder. However, the variant might still be in linkage disequilibrium with the true pathogenic variant, and the genetic locus should always be carefully screened for missed variants. Additionally, a careful clinical evaluation of all family members is essential to exclude mild symptoms of reportedly unaffected individuals as well as possible phenocopies, whose phenotype can be explained by other (nongenetic) factors. The latter is especially relevant for cases diagnosed with inherited HL, as both genetic and environmental factors are significant contributors to the development of HL [3].
Other factors that might complicate the interpretation of segregation analysis results are age-related or reduced penetrance, modifiers, carrier females in X-linked diseases, and multigenic inheritance. Several studies have indicated that modifying variants can have higher allele frequencies than fully penetrant alleles and, therefore, are not recognized by diagnostic pipelines [170,171]. Despite their high allele frequencies, it has been shown that these variants can still significantly modify Mendelian genotypes. For instance, the variants p.(Ser192Tyr) and p.(Arg402Gln) in TYR have an individual allele frequency of 36.4% and 27.3% in the gnomAD database (non-Finnish Europeans), respectively, while the p.[Ser192Tyr;Arg402Gln] allele has an allele frequency of 1.9%. Despite the relatively high population frequency, the pathogenicity of the p.[Ser192Tyr;Arg402Gln] allele has been suggested when present in a homozygous state or in a triallelic genotype with a known pathogenic TYR variant in trans [172,173]. Studies suggest that one in six genes implicated in RD is possibly associated with variable penetrance due to variability in expression levels [174,175]. Examples of strong evidence for variants with reduced penetrance, implicated in RD or HL, have been reported for ABCA4 [113,176], COCH [177], PRPF31 [178], and RIPOR2 [179].
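The basic segregation check, together with two of the complications discussed above (phenocopies and reduced penetrance), can be sketched in Python as follows. The record layout and the function name `segregation_conflicts` are hypothetical, and the sketch deliberately ignores X-linked and multigenic inheritance.

```python
def segregation_conflicts(family, model):
    """Return family members whose genotype does not match their phenotype.

    family: list of dicts with keys 'id', 'affected' (bool), and
            'alt_alleles' (0, 1, or 2 copies of the candidate variant).
    model:  'dominant' (one copy suffices) or 'recessive' (two copies needed).
    """
    if model not in ("dominant", "recessive"):
        raise ValueError("model must be 'dominant' or 'recessive'")
    required = 1 if model == "dominant" else 2

    conflicts = []
    for member in family:
        has_genotype = member["alt_alleles"] >= required
        if member["affected"] and not has_genotype:
            conflicts.append((member["id"], "affected without genotype: phenocopy?"))
        elif has_genotype and not member["affected"]:
            conflicts.append((member["id"], "genotype without phenotype: reduced penetrance?"))
    return conflicts
```

An empty list means that the variant segregates with the phenotype under the chosen model. A non-empty list does not exclude the variant by itself, but it points to the individuals who need clinical re-evaluation.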
Another complicating factor is uniparental disomy (UPD), where two homologous chromosomes are inherited from the same parent due to errors during meiosis. In 2020, Yauy et al. investigated the presence of UPD in exome sequencing data of 4912 trios [180]. The authors detected UPDs in 0.05-0.2% of these trios, amongst which was a chromosome 1 UPD (ABCA4) in a Stargardt disease case, suggesting minimal contribution to the genetic diagnostic yield [180]. Thus far, there are four reported Stargardt disease cases showing UPD in chromosome 1 [46,180-182]. Moreover, in 2013, Roosing et al. described maternal UPD of chromosome 6, which included a pathogenic TULP1 variant responsible for the cone dystrophy phenotype of the proband. For HL, several cases of UPD have been described as well, affecting chromosome 1 (USH2A) [183], chromosome 13 (GJB2) [182], and chromosome 18 (LOXHD1) [182,184].
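The principle behind detecting UPD in trio data can be illustrated with a small Python sketch that counts genotypes incompatible with biparental inheritance on one chromosome. This is not the method used in [180]; the function name, the input layout, and the interpretation of the counts are assumptions made for illustration, and genotyping errors or deletions can produce the same pattern at individual sites.

```python
def upd_evidence(sites):
    """Tally informative sites on one chromosome of a parent-child trio.

    sites: iterable of (child, mother, father) genotypes, each given as the
           number of alternate alleles (0, 1, or 2).
    """
    counts = {"maternal_upd": 0, "paternal_upd": 0, "biparental": 0}
    for child, mother, father in sites:
        if child == 1:
            # Opposite homozygous parents: the child received one allele from each.
            if {mother, father} == {0, 2}:
                counts["biparental"] += 1
        else:
            # Homozygous child: a parent homozygous for the other allele
            # cannot have contributed to this genotype.
            other = 2 - child
            if father == other and mother != other:
                counts["maternal_upd"] += 1
            elif mother == other and father != other:
                counts["paternal_upd"] += 1
    return counts
```

A chromosome with many sites supporting one parental origin and none supporting biparental inheritance would be a candidate for UPD and would warrant confirmation with a dedicated method.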

Functional Evaluation of Variants
Functional assays can provide an extra line of evidence that can aid in the discrimination between (likely) pathogenic variants, (likely) benign variants, or variants with unknown significance. For proteins with a well-characterized subcellular localization or function, in vitro approaches can be considered to assess the impact of the variant on protein localization or function. Examples of the latter are assessments of transporter function, enzymatic activity, or activity of metabolic pathways. In vivo experiments are ideal for studying the true biological context. However, as it is not always feasible to perform such studies, in vitro research can, instead, provide valuable information. Biochemical data obtained from patient-derived biopsies might be more informative. However, for both HL and RD, samples derived from the tissues of interest are usually not available. For these purposes, animal models could provide a valuable alternative. Over the years, several studies have proven the suitability of studying ear- or eye-related disease in nonhuman primates and mouse models [38,185]. The International Mouse Phenotyping Consortium (IMPC) aims to generate mouse knockout models for all known genes in the mouse genome [186]. Furthermore, the zebrafish has proven its suitability as an animal model. In this model, retinal and inner ear function can already be studied five days postfertilization [185,187]. Limitations in the usage of animal models include ethical, time, and financial considerations, in addition to the level of gene conservation.
Stem cell technology and the development of differentiation protocols over the past decades have enabled the in vitro generation of patient-derived cells, resembling retinal photoreceptors or inner ear hair cells [188,189]. These models can provide an alternative method of studying the tissue of interest. Research has shown that differentiated cells can resemble the patient's retina or inner ear. Several 2D- and 3D-differentiation protocols have been successfully applied to study both HL and RD. Differentiation approaches are rapidly being optimized, as the involved processes are still very time-consuming and expensive [188,189]. Moreover, variability and cell heterogeneity are important hurdles, and these should be overcome in order to fully replace animal model studies.

Development of New Technologies
Chromosomal abnormalities and SVs are among the main causes of genetic diseases, which are being addressed in clinical application using routine cytogenetic methods such as karyotyping, fluorescent in situ hybridization (FISH), comparative genomic hybridization (CGH), and SNP microarrays [190,191]. However, these methods have significant limitations in the identification of SVs. For example, karyotyping allows the identification of different chromosomal abnormalities with a 5-10 Mb resolution. Although microarrays and CGH arrays are able to identify the gain and loss of chromosomal material as small as 10 kb, these methods cannot detect balanced rearrangements or determine the exact location of the structural variation [192,193]. It is estimated that only 15-20% of chromosomal abnormalities can be detected by the application of these techniques, which indicates the great need for new technologies in the field of cytogenetics [194].
Although LRS techniques are rapidly developing and show a great ability to identify SVs, their routine application in clinical diagnostics still requires several improvements in terms of sequencing and variant interpretation; it also requires a cost reduction. In addition, despite the fact that these technologies can provide substantial read length, the reads can only be assembled to the scaffold level and not to the chromosome level [195]. Complementary approaches to identify SVs can be offered by cytogenetics [193]. One of these recent technologies is optical mapping (Bionano Genomics), which is a de novo assembly-based method that allows the visualization of the genomic structure in high resolution [196]. The approach is based on ultra-long dsDNA molecules that are fluorescently labeled at CTTAAG hexanucleotide motifs, which are found, on average, 15 times per 100 kb across the human genome. The distances and patterns of these labels can be compared to those in a reference genome. Therefore, copy number aberrations and other SVs, including insertions, inversions, and translocations, can be detected (Figure 5).
Figure 5. Optical mapping (Bionano Genomics). [...] The labeled DNA is linearized in order to take images of the label patterns in the DNA molecules, and, subsequently, the images are converted to the molecules. (C) These molecules are then utilized for genome assembly to generate consensus genome maps. The pattern of labels can be compared between the reference genome and the affected individuals to identify structural variants. The shorter or longer distance between two labels indicates deletion or insertion, respectively. Translocations can be identified by the mapping of a single region in the patient genome to two genomic regions in the reference. The inverted pattern of labels in a patient, compared to that in the reference genome, indicates the presence of an inversion.
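The comparison of label distances between a sample map and the reference can be sketched in Python as follows. The sketch assumes that labels have already been matched one-to-one between the two maps, which real alignment software has to establish first; the function name, the tolerance value, and the coordinates in the usage note are hypothetical.

```python
def call_size_changes(ref_labels, sample_labels, tolerance=500):
    """Flag label intervals whose length differs between sample and reference.

    ref_labels, sample_labels: sorted label positions in bp, already matched
    one-to-one. Returns tuples of (interval index, call, size difference).
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("label lists must be matched one-to-one")

    calls = []
    for i in range(len(ref_labels) - 1):
        ref_gap = ref_labels[i + 1] - ref_labels[i]
        sample_gap = sample_labels[i + 1] - sample_labels[i]
        difference = sample_gap - ref_gap
        if difference > tolerance:
            calls.append((i, "insertion", difference))  # labels farther apart
        elif difference < -tolerance:
            calls.append((i, "deletion", -difference))  # labels closer together
    return calls
```

With reference labels at 0, 7000, 15000, and 21000 bp and sample labels at 0, 7000, 27000, and 33000 bp, the second interval is 12 kb longer in the sample and is reported as an insertion. Inversions and translocations change the order or the mapping location of labels rather than their spacing and are therefore outside the scope of this sketch.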

Optical mapping has a much higher resolution compared to standard karyotyping and microarray technologies and, therefore, enables much more precise data analysis. As it is an imaging method and not a sequencing method, SNVs cannot be detected. However, for the analysis of SVs, optical mapping can be used in a complementary manner to sequencing techniques [193]. With the ability to map ultra-long dsDNA molecules at a low cost, optical mapping has facilitated SV detection, haplotype phasing, and genome assembly [195]. In a recent study, researchers utilized optical mapping to identify a 48-kb duplication at the LAMA1 locus that causes Poretti-Boltshauser syndrome (OMIM: 615960). Affected individuals present with ataxia, cognitive impairment, and language delay, as well as ocular phenotypes such as ocular motor apraxia, abnormal eye movement, and RD. WES and chromosome microarray prescreening methods failed to reveal the large SV in the studied family [197]. The authors reasoned that LRS technologies offer promising applications in comprehensive SV analysis; however, the costs and accuracy may represent a burden. Therefore, they suggested that a combination of different technologies, such as optical mapping and SRS, provides a more comprehensive understanding of SVs when considering cost, time, and throughput [197].
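The label-distance comparison that underlies SV detection in optical mapping can be illustrated with a minimal sketch. It is not the algorithm of any specific optical mapping software: it assumes that labels in the sample map have already been matched one-to-one to labels in the reference map, that positions are given in base pairs, and that a hypothetical size threshold (`min_size`) separates true events from measurement noise. The function names `label_intervals` and `call_indels` are illustrative only.

```python
from typing import List, Tuple


def label_intervals(positions: List[int]) -> List[int]:
    """Return the distances between consecutive label positions (in bp)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def call_indels(
    ref_labels: List[int],
    sample_labels: List[int],
    min_size: int = 500,  # hypothetical noise threshold, not a published value
) -> List[Tuple[int, str, int]]:
    """Compare inter-label distances of a sample map against the reference.

    A longer distance in the sample suggests an insertion, a shorter
    distance suggests a deletion. Returns (interval index, type, size).
    Assumes both label lists are already aligned one-to-one.
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("label lists must be matched one-to-one")

    calls = []
    ref_iv = label_intervals(ref_labels)
    sample_iv = label_intervals(sample_labels)
    for i, (r, s) in enumerate(zip(ref_iv, sample_iv)):
        delta = s - r
        if delta >= min_size:
            calls.append((i, "insertion", delta))
        elif delta <= -min_size:
            calls.append((i, "deletion", -delta))
    return calls


# Toy example: the second interval is 48 kb longer in the sample.
reference = [0, 10_000, 25_000, 40_000]
sample = [0, 10_000, 73_000, 88_000]
print(call_indels(reference, sample))
```

In this simplified view, any gain of sequence between two labels appears as an enlarged interval. Real analyses additionally have to align label patterns, handle missing or extra labels, and recognize inverted or split patterns to call inversions and translocations, none of which is covered here.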

Multiomic Approaches
Besides genome sequencing, other omic technologies, such as transcriptomics, proteomics, metabolomics, or epigenomics, hold the promise to further close the diagnostic gap for RD and HL. It is evident that for each identified disease-associated gene, the isoform landscape and levels of involved gene regulation are more complex than initially thought. A quantitative (gene expression levels) or qualitative (isoform structures, novel exons) analysis of the transcriptomic landscape is valuable in enhancing diagnostic yield, as shown by several studies [198,199]. In combination with genome sequencing, RNA sequencing can improve the interpretation of variants of unknown significance, although the inaccessibility of the relevant cell types for RD- and HL-associated genes is a major limitation.
LRS offers the potential for RNA analysis as well: for example, the Iso-Seq method of PacBio enables the sequencing of full transcripts, and nanopore sequencing offers direct sequencing of RNA molecules [64]. LRS techniques have already been shown to be successful in the identification of novel full-length transcripts. In a study performed by Ray et al., an abundant retina-specific CRB1 transcript (CRB1-B) was detected, which was not annotated in genome databases such as the UCSC genome browser [200,201]. The authors showed that the expression of the CRB1-B transcript in photoreceptors is significantly higher than that of the canonical CRB1 transcript (CRB1-A). The newly identified transcript includes unique exons that are not present in CRB1-A and, thereby, represent important candidate regions for potentially missed pathogenic variants [201]. In addition, developments in the single-cell RNA sequencing field allow the identification and characterization of tissue-specific isoforms and regulatory events. The Single Cell Portal (Broad Institute) offers a valuable resource of tissue-specific single-cell RNA sequencing datasets [202].
Epigenomics is an emerging and promising development in the field of medical genetics. Analysis of epigenomic signatures can aid in the understanding of the 3D organization of the genome. Since base modifications remain captured in native DNA molecules that are used for SMRT and nanopore sequencing, investigation of the methylome and DNA base modification is possible [64,67]. Ideally, multiomic layers (e.g., genomics, transcriptomics, and epigenomics) should be integrated (the so-called multiomics), which aids in an ultimate understanding of the genomic landscape and provides valuable insights for (candidate) disease-associated genes.

Conclusions and Discussion
Fifty years after the arrival of the Sanger sequencing technique, the sequencing technology landscape is still rapidly evolving. However, genetic diagnostic yield still varies between 40% and 70% for inherited HL and RD, indicating that there are still opportunities for further improvement [8,9,52]. Although novel disease-associated genes are being discovered, disease-gene identification curves are slowly reaching a plateau phase, suggesting more attention should be paid to currently missed or misinterpreted variants within known HL- or RD-associated genes that reside within complex (noncoding) regions of the genome. Recent developments of LRS techniques and optical mapping and improvements in WGS techniques offer valuable opportunities to investigate the noncoding landscape of the genome in more detail. Furthermore, the interpretation of SVs has been greatly advanced by developments in computational analysis and bioinformatics tools. Therefore, the emphasis will be on overcoming the limitations of sequencing and bioinformatic techniques in the near future. Additionally, evidence suggests that complex factors, such as modifiers, digenic inheritance, and variable penetrance, play an important role in disease-causing mechanisms in inherited HL and RD. The generation of larger, high-quality datasets will allow a better understanding of these events as well.
We foresee that, in the near future, the new technologies and improved analytical tools will reinforce the clinical diagnostic setting in order to close diagnostic gaps, as it is of utmost importance for both the affected individuals and the involved clinicians and researchers. It will help to provide guidance to affected families with regard to family planning, providing them with an optimal prognosis and counseling. In addition, with recent developments in the field of genetic therapies, the importance of genetic diagnostics can no longer be underestimated. We have come a long way from linkage analysis, starting in the early 90s, to the more recent LRS of single DNA molecules to unravel the genetic causes of HL and RD. Clinical diagnostics has significantly improved over these years, and the diagnostic yield is still increasing. We anticipate an extensive application of new technologies in the future, which will redirect traditional therapies towards precision or personalized medicine to improve treatments for HL and RD.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: No new data were created or analyzed in this study. Data sharing is not applicable to this article.