Next-Generation Molecular Investigations in Lysosomal Diseases: Clinical Integration of a Comprehensive Targeted Panel

Diagnosis of lysosomal disorders (LDs) may be hampered by their clinical heterogeneity, phenotypic overlap, and variable age at onset. Conventional biological diagnostic procedures are based on a series of sequential investigations and require multiple samplings. Early diagnosis may allow for timely treatment and prevent clinical complications. To improve LD diagnosis, we developed a capture-based next-generation sequencing (NGS) panel allowing the detection of single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number variants (CNVs) in 51 genes related to LDs. The design of the LD panel covered at least the coding regions, the promoter region, and the flanking intronic sequences of these 51 genes. The panel was validated by testing 21 well-characterized samples and evaluating analytical and diagnostic performance metrics. Bioinformatics pipelines were validated for SNVs, indels and CNVs. This capture-based NGS panel provides an average coverage depth of 474×, which allows the detection of SNVs and CNVs in one comprehensive assay. All the targeted regions were covered above the minimum required depth of 30×. To illustrate the clinical utility, five novel cases were sequenced using this panel, and the identified variants were confirmed using Sanger sequencing or quantitative multiplex PCR of short fluorescent fragments (QMPSF). The application of NGS as a first-line approach to analyze suspected LD cases may speed up the identification of alterations in LD-associated genes. NGS approaches, combined with bioinformatics analyses, are a useful and cost-effective tool for identifying the causative variations in LDs.


Introduction
The lysosome is an intracellular organelle characterized by its acidic pH, and its main function is the degradation of intracellular or extracellular macromolecules into monomers. This metabolic process is carried out by more than fifty lysosomal enzymes. Additionally, over a hundred structural proteins and carriers essential for lysosomal function have been identified [1]. "Lysosomal storage disorders" (LSDs) was the conventional term used to describe the group of inborn errors of metabolism (IEMs) related to the absence or failure of substrate degradation or transport and the subsequent accumulation of these substrates in the lysosome [2]. However, in recent years, the lysosome has come to be viewed as a dynamic structure with multiple roles in nutrient sensing, autophagy, apoptosis, and cellular response to environmental cues. It is also a signaling hub that interacts with other organelles [3]. In this context, the preferred term has shifted from LSDs to lysosomal disorders (LDs) to better reflect the complexity of these diseases. In LDs, the inheritance pattern is autosomal recessive except for three disorders (Fabry, Danon, and Hunter diseases), which are X-linked. Clinical presentations of LDs vary greatly, and age at onset ranges from the antenatal period to adulthood. However, in some cases, cardinal signs may steer clinicians towards a particular disorder, such as specific dysmorphic features, ocular or articular involvement, organomegaly, dysostosis multiplex, valvulopathy, neurological defects or psychomotor delay. Early diagnosis allows appropriate medical care, as many specific treatments have recently been developed, and thus reduces morbidity [4,5]. Currently, biological diagnosis relies on a three-phase process: (i) characterization of accumulated metabolites, (ii) enzyme activity assessment, and (iii) molecular investigations. Additionally, in some cases, molecular study as first-line exploration is mandatory to reach the diagnosis. For instance, in X-linked pathologies such as Fabry disease, the measurement of enzyme activity may fail to identify heterozygous females because of X-chromosome inactivation. Besides, in some autosomal disorders, such as most neuronal ceroid lipofuscinoses (NCLs), no biological tests are available and molecular approaches are the only diagnostic option.
The rise of "omics-based" approaches and the tremendous technological shift, in both multiscale biological information capture and data management, offer a remarkable opportunity to change the ways we screen, diagnose, treat, and monitor inherited metabolic diseases [5][6][7]. Next-generation sequencing (NGS) technologies represent an essential tool for rapid and effective diagnosis of these diseases and may be used in some complex situations prior to multiple and often sequential functional studies. Recent studies have highlighted the clinical utility of the NGS approach for LD genetic diagnosis [8][9][10][11]. Here we report on the design, validation and testing of an NGS panel, named LysoGene, for genes involved in LDs.

Patients
Twenty-one well-characterized LD patients were included for validation purposes (Supplementary Table S1). Twenty-seven disease-causing variations and 50 benign variations had previously been identified by Sanger sequencing and were used to validate the sequencing process and the bioinformatics pipeline for single nucleotide variants (SNVs) and small insertions/deletions (indels) (Supplementary Tables S1 and S2). To illustrate the clinical utility of this panel, five LD patients are reported.
Case 1: A female child presented at 3 months of age with severe organomegaly (hepatomegaly at 6 cm and splenomegaly at 9 cm), associated with severe malnutrition, without diarrhea. No dysmorphism was noted. The liver biopsy was in favor of a storage disease.
Case 2: This female child was born at term to a non-consanguineous couple, eutrophic, after a normal pregnancy, and with a good adaptation to extra-uterine life. At two and a half years of age, she presented with a speech delay and a flat tympanogram, and a transtympanic ventilation tube was inserted. At 3 years old, she was hospitalized for seizures, with predominantly right occipital spikes on the wake and sleep electroencephalogram (EEG). A second episode of seizures induced by hyperthermia occurred a few months later. She had a disturbed sleep pattern with repeated awakenings, agitation and crying, sensory dysregulation including severe agitation and intolerance to loud noises, and poor communication. Brain MRI showed a retrocerebellar arachnoid cyst and cerebellar atrophy. Based on these elements, late infantile neuronal ceroid lipofuscinosis (CLN2, CLN5, CLN6 or CLN7) was suspected.
Case 3: This was the third child of a couple, born prematurely at 35 weeks of gestation by caesarean section for abnormal fetal heart rhythm. She was hospitalized at 3 months of age for psychomotor regression with decreased visual fixation and tracking of objects and persons, as well as axial hypotonia. High blood pressure was diagnosed in the emergency department, and the child was put on a calcium channel blocker. The MRI and the EEG showed no anomalies. A cherry-red macula was found on ophthalmological examination. A LysoGene panel was requested.
Case 4: The patient was the second child of healthy non-consanguineous parents. The pregnancy was unremarkable, with a birth weight of 2830 g, a birth length of 47 cm and a head circumference of 34 cm. He was hospitalized in the neonatal intensive care unit for amniotic fluid aspiration associated with patent ductus arteriosus and suspicion of neonatal infection. He walked at around 12 months of age and achieved daytime and nighttime continence at 4 years of age. At two and a half years old, he was treated for bilateral serous otitis media revealed by a hoarse voice and comprehension difficulties. At three years old, he did not pronounce words properly and only formed simple sentences. He had a behavioral disorder with aggressiveness, concentration difficulties and disabling headaches. At 5 years old, his height and weight were at +1 SD and he presented with signs of storage such as a square face, skin thickening, and enlarged joints and bones. At the metabolic level, elevated urinary excretion of heparan sulfate and decreased activity of heparan-alpha-glucosaminide N-acetyltransferase were consistent with a diagnosis of Sanfilippo type C (mucopolysaccharidosis type IIIC). The HGSNAT gene was analyzed using Sanger sequencing and two pathogenic variants were identified in the heterozygous state: a splicing variant (NM_152419.2:c.234+1G>A; p.?) resulting in a modification of exon 2 splicing, and a missense variant (NM_152419.2:c.710C>A; p.(Pro237Gln)). Both variants are reported in the Human Gene Mutation Database (HGMD) and have been published [12]. However, allelic segregation analysis showed that both variants were inherited from the mother, who was clinically healthy. Of note, the DNA sample from the father was not available to us. We decided to investigate this case using the LysoGene panel to unveil the alteration inherited from the father.
Case 5: A 31-year-old patient presented with diffuse myalgia. He had had progressive exercise intolerance over the preceding 5 years. He also suffered from sleep apnea. The patient had been hospitalized several times and had undergone many explorations without any diagnosis being reached. The classical neuromuscular workup was normal, including electromyogram (EMG) and creatine phosphokinase (CPK).
Written informed consent was obtained from the parents when the patient was under 18 years of age, or from the adult patient, before performing any investigation related to their pathology.

NGS Sequencing
DNA extraction: for NGS analysis, blood genomic DNA was extracted using a silica-membrane-based DNA purification method (QIAamp DNA Blood Mini Kit, QIAGEN). NGS sequencing was performed in the IRIB-Rouen University Hospital Facility (Service Commun de Génomique).
Gene panel design: our approach aimed to capture and sequence 51 genes implicated in LDs (Table 1, Supplementary Table S3). Five additional genes were included for identity monitoring of patients (CCDC88C, NIPBL, MLH1, APC, PTEN). The design of the LysoGene panel covered the coding regions, the promoter region and the flanking intronic sequences for 43 genes. In addition, 3′ untranslated sequences were included for 2 genes (AGA and ARSA), and the entire gene sequences were covered for 6 genes (ARSB, CLN3, CLN8, IDS, SGSH, and NAGLU). In total, 708 regions were targeted, including 506 exonic regions.
Custom primers were designed using the SureDesign software (Agilent Technologies, Santa Clara, CA, USA).
Library preparation and sequencing: libraries were prepared with the SureSelectQXT enrichment kit (Agilent Technologies, Santa Clara, CA, USA), using enzymatic fragmentation followed by capture of the targeted sequences. Patients' libraries were pooled after the enrichment step. The protocol was either performed manually or automated on a Sciclone NGSx workstation (PerkinElmer, Waltham, MA, USA). Libraries were sequenced on a MiSeq or a NextSeq 500 platform (Illumina, San Diego, CA, USA) using 2 × 150 bp paired-end sequencing.
Bioinformatics pipelines: for the detection of SNVs, indels and copy number variants (CNVs), a double bioinformatics pipeline was used with complementary algorithms in order to optimize the disease-causing variant detection rate. For each sequencing run, PDF quality reports integrating the number of clusters/mm², the percentage of bases with a Q-score > 30, FastQC reports, the percentage of mapped reads, the on- and off-target percentages, the percentage of covered bases and the mean sequencing depth were automatically generated using the in-house tool PyQua (Python Qualitics).
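One of the run-level metrics listed above, the percentage of bases with a Q-score above 30, can be illustrated with a minimal sketch. It assumes Phred+33 encoded quality strings as found in FASTQ files; the function name `percent_q30` and the example strings are hypothetical and do not come from PyQua.

```python
def percent_q30(quality_strings, threshold=30, offset=33):
    """Return the percentage of bases whose Phred score exceeds `threshold`.

    Assumes Phred+33 ASCII encoding of the quality strings (an assumption
    of this sketch, not a detail reported in the text).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for char in qual:
            total += 1
            # Strictly greater than the threshold, matching "Q-score > 30".
            if ord(char) - offset > threshold:
                passing += 1
    if total == 0:
        raise ValueError("no bases provided")
    return 100.0 * passing / total


# Hypothetical quality strings: 'I' encodes Q40 and '5' encodes Q20.
example = ["IIII5", "III55"]
print(percent_q30(example))
```

The comparison is strict, so a base at exactly Q30 is not counted as passing.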
The control of the sample identity was performed using a multiplex SNaPshot analysis comparing five SNPs located within the captured regions of 5 genes unrelated to LDs included in the panel. To validate the panel in a diagnostic context, analytical accuracy, intra-assay and inter-assay reproducibility were assessed.
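The identity control described above amounts to comparing genotypes obtained by two independent methods at the same SNPs. The following sketch shows one way such a comparison could be written; the function name, the data layout, and the allele values are all hypothetical, and only the five gene names are taken from the panel design.

```python
def genotypes_match(ngs_calls, snapshot_calls):
    """Compare unordered genotypes at each SNP.

    Both arguments map a SNP label to a pair of alleles.
    Returns (all_match, list_of_discordant_labels).
    """
    discordant = []
    for snp_label, snapshot_gt in snapshot_calls.items():
        ngs_gt = ngs_calls.get(snp_label)
        # A missing NGS call is treated as discordant rather than ignored.
        if ngs_gt is None or sorted(ngs_gt) != sorted(snapshot_gt):
            discordant.append(snp_label)
    return (not discordant, discordant)


# Invented alleles for illustration; one SNP per identity-monitoring gene.
snapshot = {
    "CCDC88C": ("A", "G"),
    "NIPBL": ("C", "C"),
    "MLH1": ("T", "C"),
    "APC": ("G", "G"),
    "PTEN": ("A", "A"),
}
ngs = {
    "CCDC88C": ("G", "A"),  # same genotype, different allele order
    "NIPBL": ("C", "C"),
    "MLH1": ("C", "T"),
    "APC": ("G", "G"),
    "PTEN": ("A", "A"),
}
print(genotypes_match(ngs, snapshot))
```

Any discordant SNP would suggest a possible sample mix-up and would call for re-checking the sample before interpretation.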

Quality Metrics
The NGS assay provided an average read depth of 474×. This deep coverage allowed for simultaneous detection of SNVs and CNVs in one comprehensive analysis. All the targeted regions were covered above the minimum depth required of 30×.
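The two coverage criteria reported here, the mean depth and the 30× minimum per targeted region, can be checked with a short sketch. The region names and depth values below are hypothetical, and the function is an illustration rather than the tool used in the study.

```python
def coverage_summary(region_depths, min_depth=30):
    """Summarize per-region depth.

    `region_depths` maps a region name to its depth (for example the
    minimum or mean depth observed over that region).
    """
    if not region_depths:
        raise ValueError("no regions provided")
    mean_depth = sum(region_depths.values()) / len(region_depths)
    # Regions strictly below the minimum are reported for follow-up.
    below = sorted(
        name for name, depth in region_depths.items() if depth < min_depth
    )
    return {"mean_depth": mean_depth, "regions_below_min": below}


# Hypothetical depths for three targeted regions.
depths = {"region_1": 600, "region_2": 300, "region_3": 30}
print(coverage_summary(depths))
```

A run meets the stated criterion when the list of regions below the minimum is empty.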

Panel Performances for the Detection of SNVs and Indels
Accuracy: the concordance between the panel results and the reference data was 100% for all 77 variants. Thus, these variants were detected with 100% analytical sensitivity.
Intra- and inter-assay reproducibility: the ratios between the values obtained for all metrics measured in the samples used for intra- and inter-assay reproducibility tests were equal or close to 1 (Supplementary Tables S4 and S5), demonstrating the consistency of the results.
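As an illustration of the accuracy figure, the sketch below computes analytical sensitivity from the variant counts given in the text (27 disease-causing plus 50 benign variants). The lower confidence bound is an addition of this sketch and is not reported in the study; it uses the closed form of the exact (Clopper-Pearson) two-sided bound that applies when every expected variant is detected.

```python
def analytical_sensitivity(detected, expected):
    """Fraction of expected reference variants that the panel detected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return detected / expected


def lower_bound_all_detected(n, confidence=0.95):
    """Exact two-sided lower confidence bound when all n of n are detected.

    In this special case the Clopper-Pearson bound reduces to
    (alpha / 2) ** (1 / n).
    """
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Counts taken from the text: 27 disease-causing + 50 benign variants.
expected = 27 + 50
detected = 77
print(analytical_sensitivity(detected, expected))
print(round(lower_bound_all_detected(expected), 3))
```

The point estimate is 100%, while the lower bound gives a sense of how much uncertainty a validation set of this size leaves.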
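The reproducibility criterion, ratios equal or close to 1, can be expressed as a simple per-metric comparison between two runs. In this sketch the metric names, the example values, and the 10% tolerance are assumptions made for illustration; the study does not state a numerical tolerance.

```python
def reproducibility_ratios(run_a, run_b, tolerance=0.10):
    """Return {metric: (ratio, within_tolerance)} for comparable metrics.

    `run_a` and `run_b` map metric names to values from two replicates.
    """
    results = {}
    for metric in run_a:
        if metric not in run_b or run_b[metric] == 0:
            continue  # skip metrics that cannot be compared
        ratio = run_a[metric] / run_b[metric]
        results[metric] = (ratio, abs(ratio - 1.0) <= tolerance)
    return results


# Hypothetical metric values for two replicates of the same sample.
replicate_1 = {"mean_depth": 480.0, "pct_q30": 92.0, "pct_on_target": 70.0}
replicate_2 = {"mean_depth": 470.0, "pct_q30": 92.0, "pct_on_target": 68.0}
for metric, (ratio, ok) in reproducibility_ratios(replicate_1, replicate_2).items():
    print(metric, round(ratio, 3), ok)
```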

Panel Performances for the Detection of CNVs
For CNVs, the performances of the in-house bioinformatics tool, CANOES, for assessing the read depth from capture-based NGS data were evaluated. The validation of this workflow has been published recently and highlighted very high sensitivity and positive predictive value for NGS gene panels [27].
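The general idea behind read-depth-based CNV detection can be sketched as follows: the depth of each target in a sample is normalized and compared with the same target in reference samples, so that a ratio near 0.5 suggests a heterozygous deletion and a ratio near 1.5 a duplication. This is a deliberately simplified illustration and is not the CANOES algorithm; the function names, thresholds, and depth values are all hypothetical.

```python
from statistics import median


def normalize(depths):
    """Scale per-target depths so that they sum to 1 within a sample."""
    total = sum(depths.values())
    return {target: depth / total for target, depth in depths.items()}


def copy_ratios(sample_depths, reference_depths_list):
    """Ratio of the sample's normalized depth to the reference median."""
    sample_norm = normalize(sample_depths)
    refs_norm = [normalize(ref) for ref in reference_depths_list]
    ratios = {}
    for target, value in sample_norm.items():
        ref_median = median(ref[target] for ref in refs_norm)
        ratios[target] = value / ref_median
    return ratios


def classify(ratio, deletion_max=0.7, duplication_min=1.3):
    """Label a ratio using illustrative (not validated) thresholds."""
    if ratio <= deletion_max:
        return "deletion"
    if ratio >= duplication_min:
        return "duplication"
    return "normal"


# Hypothetical data: ten targets, one of them at half depth in the sample.
targets = [f"target_{i}" for i in range(1, 11)]
references = [{t: 400 for t in targets} for _ in range(3)]
sample = {t: 400 for t in targets}
sample["target_5"] = 200

calls = {t: classify(r) for t, r in copy_ratios(sample, references).items()}
print(calls)
```

Dedicated tools model the variability of read counts statistically rather than relying on fixed thresholds, which is why a validated workflow is needed for diagnostic use.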

Clinical Utility Assessment
To illustrate the clinical utility of this panel, we report 5 cases in which the NGS approach proved more efficient than traditional Sanger sequencing. All the variants identified through the NGS workflow were confirmed using Sanger sequencing (for SNVs and indels) or quantitative multiplex PCR of short fluorescent fragments (QMPSF, for CNVs).
Case 1: The LysoGene panel enabled the characterization of 2 pathogenic heterozygous variants in the NPC2 gene. The variant NM_006432.3:c.58G>T; p.(Glu20*) has been reported in HGMD and has been published [28]. The second variant, a frameshift, c.87del; p.(Val30Trpfs*5), is novel. The presence of these variants was consistent with the diagnosis of Niemann-Pick disease type C2. Sanger sequencing of NPC2 in the parents confirmed allelic segregation.
Case 2: The analysis of the neuronal ceroid lipofuscinosis sub-panel allowed the characterization of two pathogenic heterozygous variants in the TPP1 gene in this patient. Both variants, NM_000391.3:c.196C>T; p.(Gln66*) and c.622C>T; p.(Arg208*), have been reported in HGMD and previously published [29,30]. Allelic segregation was confirmed by the study of the parents' DNA.
Case 3: Given the clinical picture, priority was given to the analysis of genes involved in pathologies with macular cherry-red spots (Figure 1). Two pathogenic variants were identified in HEXB: NM_000521.3:c.1165dup; p.(Gln389Profs*22), which has never been described before, and c.1417+5G>A; p.?, predicted to abolish the splicing donor site [31]. Enzymatic activities of hexosaminidase A and total hexosaminidases were greatly reduced in leukocytes and plasma. All these results pointed to Sandhoff disease.
Case 4: NGS sequencing of the HGSNAT gene succeeded in retrieving the variants inherited from the mother (NM_152419.2:c.234+1G>A; p.? and c.710C>A; p.(Pro237Gln)) and enabled the identification of a heterozygous deletion of exon 15 (NM_152419.2:c.(1464+1_1465-1)_(1542+1_1543-1)del; p.?), which is carried by the paternal allele. This finding made it possible to confirm on a molecular basis the diagnosis of Sanfilippo type C in this patient.
Case 5: Rapid GAA gene sequencing using the LysoGene panel enabled the characterization of two pathogenic heterozygous variants: NM_000152.2:c.-32-13T>G; p.? in intron 1, which has previously been reported in the adult form of Pompe disease [32], and c.2238G>C; p.(Trp746Cys) in exon 16 [33]. Sanger sequencing of the parents' DNA confirmed allelic segregation. Metabolic workup showed a reduced acid maltase activity.

Discussion
Diagnostic difficulties in LDs arise from the wide clinical, biochemical and molecular heterogeneity observed in these pathologies and highlight the crucial need for multidisciplinary collaboration in the diagnosis and management of these diseases [34,35]. LDs, like other IEMs, are primarily due to monogenic alterations, but a large number of genetic and environmental factors modulate their phenotypic expression and underlie the wide range of clinical severity associated with LDs. This concept has been extended to connect IEMs to common diseases as part of a metabolic disease spectrum. All these pathologies necessarily involve several genes and form a continuum: in IEMs, the influence of one gene predominates, whereas in common diseases several gene alterations may contribute equally [36]. In addition, some LDs display phenotypic overlaps that often lead to misdiagnosis. Testing several hypotheses sequentially may delay the diagnosis or fail to reach it. Of note, some lysosomal hydrolases may have reduced in vitro activity in clinically healthy individuals, a situation referred to as pseudodeficiency. A set of variants known to cause pseudodeficiency has been characterized in the corresponding genes; these variants lead to in vitro instability of the enzyme, while the enzyme remains functionally active in vivo [37].
To streamline and speed up LD screening and diagnosis, a paradigm shift is urgently needed to move from hypothesis-driven to data-driven strategies. Omics approaches, along with bioinformatics tools, offer a great opportunity to establish a validated workflow enabling the assessment of a large panel of diseases. Subsequently, targeted technologies may be used to confirm the identified abnormalities.
Here, we describe the analytical validation of an NGS-based sequencing panel encompassing 51 genes implicated in LDs. The assay demonstrated a high sensitivity and reliability and was efficient in characterizing both variants involving a small number of nucleotides (SNVs/indels) and large-scale rearrangements (CNVs). By multiplexing patient samples and several genes on a single platform, the limitations related to Sanger sequencing were addressed. This approach allowed for both lowering the costs and enhancing the diagnostic effectiveness. Recent studies reported NGS-based analyses in LD genetic diagnosis [8][9][10][11]. CNV detection was reported in only one study that included 28 LD genes [11]. Of note, the present work enabled the analysis of CNVs, not reachable by Sanger sequencing, for all the included 51 LD-related genes. This markedly broadens the scope of this panel for LD genetic investigations.
To illustrate the clinical integration of our panel, we reported 5 LD patients for whom NGS analysis provided fast and accurate results.
The NGS panel allowed us to guide the diagnosis toward Niemann-Pick type C in Case 1, Sandhoff disease in Case 3 and Pompe disease in Case 5, although the clinical pictures were nonspecific. In Case 2, the clinical presentation was suggestive of a ceroid lipofuscinosis. A fast molecular diagnosis was critical, as a clinical trial for TPP1 deficiency based on intraventricular enzyme replacement therapy was ongoing. To be effective, this treatment had to be implemented before psychomotor regression [38]. NGS analysis helped in identifying pathogenic variants in the TPP1 gene, and the patient was successfully included in the ongoing clinical trial. The clinical utility of simultaneous CNV characterization is exemplified in Case 4. Indeed, the NGS workflow allowed the retrieval of the SNVs located on the maternal allele as well as the characterization of a CNV inherited from the father. Thus, the NGS approach enabled the confirmation of this diagnosis on a molecular basis.

Conclusions
Clinical heterogeneity, phenotypic overlap, and variable age at onset are still major hurdles for fast and effective diagnosis of LDs. Combining NGS-based technology capabilities with efficient bioinformatics workflows offers a promising opportunity to enhance LD characterization through high-throughput molecular profiling. Two main diagnostic situations stand out: (i) in typical clinical presentations, targeted biochemical profiling is the gold standard, with subsequent molecular confirmation; (ii) in challenging clinical situations, first-tier NGS-based molecular profiling seems to be more informative for solving the clinical puzzle. In addition, confirmation by conventional biochemical profiling is strongly recommended whenever possible.

Data Availability Statement:
All the data that support the findings are presented in the manuscript and the Supplementary Materials.