Identification of Clinical Variants beyond the Exome in Inborn Errors of Metabolism

Inborn errors of metabolism (IEM) constitute a huge group of rare diseases affecting 1 in every 1000 newborns. Next-generation sequencing has transformed the diagnosis of IEM, leading to its proposed use as a second-tier technology for confirming cases detected by clinical/biochemical studies or newborn screening. The diagnosis rate is, however, still not 100%. This paper reports the use of a personalized multi-omics (metabolomic, genomic and transcriptomic) pipeline plus functional genomics to aid in the genetic diagnosis of six unsolved cases, with a clinical and/or biochemical diagnosis of galactosemia, mucopolysaccharidosis type I (MPS I), maple syrup urine disease (MSUD), hyperphenylalaninemia (HPA), citrullinemia, or urea cycle deficiency. Eight novel variants in six genes were identified: six (four of them deep intronic) located in GALE, IDUA, PTS, ASS1 and OTC, all affecting the splicing process, and two located in the promoters of IDUA and PTS, thus affecting these genes’ expression. All the new variants were subjected to functional analysis to verify their pathogenic effects. This work underscores how the combination of different omics technologies and functional analysis can solve elusive cases in clinical practice.


Introduction
The diagnosis of Mendelian diseases has been transformed by next-generation sequencing (NGS). In the past, determining the genetic cause of a patient's condition was a laborious task; genome sequencing has made it much easier [1].

Exomic Studies
After clinical/biochemical diagnosis, exomic studies were performed on blood-extracted DNA via NGS, using either CES or WES. In P1, a previously described [19] likely pathogenic variant (c.284G>A) in the maternal allele of GALE was identified. In P2, a novel splicing variant (c.1524+2T>A) was detected in IDUA, predicted to be pathogenic according to the American College of Medical Genetics and Genomics (ACMG) criteria. Segregation studies confirmed that the variant was in the paternal allele. In P3, a previously described [20] likely pathogenic variant (c.827T>G) was detected in the maternal allele of DBT. No candidate variants were detected in P4 and P5. Mean coverage analysis suggested there were no deletions in any gene. After these studies, all patients remained with an inconclusive genetic diagnosis (Table 2).

Transcriptomic Studies
RNA-Seq was used to complete the diagnosis of P1, P2, P3, P4 and P5. An insufficient amount of RNA prevented from performing RNA-Seq on P6. The analysis was targeted to specific genes based on biochemical or enzymatic findings. For patients with a previously identified pathogenic variant, the expression of one allele was compared against the other. P1, P2 and P3 showed an AEI of the variants identified in GALE, IDUA or DBT, respectively ( Figure 1a). For P4 and P5, in which no candidate variants had been identified, the expression analyses of the genes associated with HPA and citrullinemia were analyzed, respectively. No expression changes were observed for the genes associated with HPA in P4 compared to healthy controls ( Figure 1b). For P5, a reduction in ASS1 mRNA expression was identified ( Figure 1b); this was confirmed by retro-transcriptase quantitative PCR (RT-qPCR), which revealed an 84% reduction in ASS1 transcripts compared to healthy controls ( Figure 1c).
Aberrant transcripts were also studied. RNA-Seq identified an insertion of 97 bp between exons 1 and 2 of PTS (one of the genes associated with HPA) in P4 in approximately 50% of reads ( Figure 1d). The inserted region was a fragment of the intron found between exons 1 and 2. Since the pseudo-exon changes the reading frame of PTS, this mRNA should be degraded by nonsense-mediated decay (NMD) and, therefore, should be under-detected compared to the normal allele. As the number of reads of both alleles was similar, normal PTS expression was assessed by RT-qPCR; a 66% reduction in normal transcripts was seen in P4 compared to healthy controls (Figure 1c). An expression defect in this allele is therefore likely.

Genomic Studies
In order to identify the genetic causes of the expression or splicing defects detected by RNA-Seq, a hierarchical pipeline strategy was followed in which the proximal promoter and the 3' UTR were first sequenced. The variants found were prioritized according to their minor allele frequency (MAF), their zygosity, and disease segregation. For P1, the variant c.-77G>C in the last nucleotide of exon 1 (non-coding) in GALE was prioritized; for P2, the variant c.-87T>C in IDUA was selected; for P4, the variant c.-82_-71delins-103_-86 detected in PTS was chosen. All these variants were present in heterozygosis. The variants found in P1 and P2 were in trans ( Figure S1) with the pathogenic changes previously detected in DNA studies. All these variants are novel and classified as variants of uncertain significance (VUS).
In P4, in which a pseudo-exon insertion was identified, the intronic sequences flanking the inserted region were amplified and sequenced. This allowed the identification and prioritization of the variants c.83+658C>G and c.83+758T>A in PTS in P4 (both in the maternal allele and in trans with c.-82_-71delins-103_-86) ( Figure S1). In P3 and P5, in which an AEI and reduced expression were detected, respectively, by RNA-Seq, WGS allowed the identification and prioritization of a deep intronic change in both patients. In P3, we identified the previously described [21,22] pathogenic variant c.1018-550A>G in DBT in heterozygosis and in trans with the variant identified by CES ( Figure S1). In P5, the genomic sequencing identified and prioritized the novel change c.598-757G>A in homozygosis in ASS1. Both variants presumably cause a pseudo-exon insertion. Reevaluation of RNA-Seq data showed an insertion between exons 8 and 9 of DBT in P3 (never detected before by specific cDNA analysis), and between the same exons of ASS1 in P5 (Figure 1d). Both pseudo-exons were mapped in a small number of reads (two for the pseudo-exon found in P3 and seven for the one identified in P5), explaining why they escaped earlier analysis.
In P6, a novel, de novo, deep intronic variant (c.541-277A>G) was identified in heterozygosis in OTC, classified by ACMG as a VUS. In silico studies suggested an increase in the strength of an existing donor splice site (5 GT; analyzed by Alamut Visual Plus Software v1.4).

Specific cDNA Analysis
After bioinformatics analysis, the effect of all novel variants suggestive of causing a splicing defect on cellular cDNA was next analyzed. RT-PCR was performed on fibroblastextracted RNA from P1, P4 and P5, and from liver biopsy-extracted RNA from P6. Specific primers for GALE, PTS, ASS1 and OTC were used, respectively. The pathogenic role of all four variants was then confirmed. The c.-77G>C in GALE led to an insertion of 110 nucleotides between exons 1 and 2 in P1 (not present in controls) ( Figure 2a). This insertion was not detected via the RNA-Seq data, probably due to the reduced presence of this transcript in P1, and the existence of several transcripts of GALE without this exon. In P4, the pseudo-exon insertion detected by RNA-Seq was confirmed by specific cDNA analysis, which revealed a pseudo-exon insertion of 97 bp between exons 1 and 2 of PTS that was not present in healthy controls (Figure 2b). In the re-evaluation of the RNA-Seq data using Integrative Genomics Viewer (IGV) software v2.8.13, a pseudo-exon insertion was observed between exons 8 and 9 of ASS1 in P5 (Figure 2c). It was also present in the control population but at a much lower level since variant c.598-757G>A increases the strength of the recognition of a pre-existing donor splice site. Finally, in P6, the amplification of a cDNA fragment containing exons 5 and 6 revealed a 57 bp pseudo-exon insertion flanked by the variant at its 3 end (Figure 2d). Although this variant was not inherited from the parents, the study of a single nucleotide polymorphism present in the pseudo-exon identified its linkage to the paternal allele ( Figure S1). The HUMARA test could not be performed on liver biopsy samples, but in the blood samples, it revealed a random inactivation of the X chromosome. 110 nucleotides between exons 1 and 2 in P1 (not present in controls) ( Figure 2a). This insertion was not detected via the RNA-Seq data, probably due to the reduced presence of this transcript in P1, and the existence of several transcripts of GALE without this exon. In P4, the pseudo-exon insertion detected by RNA-Seq was confirmed by specific cDNA analysis, which revealed a pseudo-exon insertion of 97 bp between exons 1 and 2 of PTS that was not present in healthy controls ( Figure 2b). In the re-evaluation of the RNA-Seq data using Integrative Genomics Viewer (IGV) software v2.8.13, a pseudo-exon insertion was observed between exons 8 and 9 of ASS1 in P5 ( Figure 2c). It was also present in the control population but at a much lower level since variant c.598-757G>A increases the strength of the recognition of a pre-existing donor splice site. Finally, in P6, the amplification of a cDNA fragment containing exons 5 and 6 revealed a 57 bp pseudo-exon insertion flanked by the variant at its 3′ end (Figure 2d). Although this variant was not inherited from the parents, the study of a single nucleotide polymorphism present in the pseudoexon identified its linkage to the paternal allele ( Figure S1). The HUMARA test could not be performed on liver biopsy samples, but in the blood samples, it revealed a random inactivation of the X chromosome. Together, these results suggest that the variants c.-77G>C, c.[83+658C>G;83+758T>A], c.598-757G>A and c.541-277A>G found in GALE, PTS, ASS1 or OTC, respectively, are responsible for the splicing defects found in these genes in P1, P4, P5 and P6, respectively.  Together, these results suggest that the variants c.-77G>C, c.[83+658C>G;83+758T>A], c.598-757G>A and c.541-277A>G found in GALE, PTS, ASS1 or OTC, respectively, are responsible for the splicing defects found in these genes in P1, P4, P5 and P6, respectively.

Minigene Analysis
To isolate the new splicing variants from their genomic context, and therefore confirm them to be responsible for the cDNA defects seen, c.-77G>C (GALE), c.83+658C>G and c.83+758T>A (PTS), c.598-757G>A (ASS1) and c.541-277A>G (OTC) minigene studies were performed. The results suggest that the variant found in GALE leads to aberrant splicing that produces a fragment without the cloned region (Figure 3a). The combination of both variants found in PTS, as well as the variants found in ASS1 and OTC, leads to an aberrant inclusion of the pseudo-exons previously found in the cDNA of P4, P5 and P6 (Figure 3b-d).

Minigene Analysis
To isolate the new splicing variants from their genomic context, and therefore confirm them to be responsible for the cDNA defects seen, c.-77G>C (GALE), c.83+658C>G and c.83+758T>A (PTS), c.598-757G>A (ASS1) and c.541-277A>G (OTC) minigene studies were performed. The results suggest that the variant found in GALE leads to aberrant splicing that produces a fragment without the cloned region (Figure 3a). The combination of both variants found in PTS, as well as the variants found in ASS1 and OTC, leads to an aberrant inclusion of the pseudo-exons previously found in the cDNA of P4, P5 and P6 (Figure 3b-d). These results thus suggest that the variants c.-77G>C, c.[83+658C>G; 83+758T>A], c.598-757G>A and c.541-277A>G found in GALE, PTS, ASS1 and OTC, respectively, affect the splicing process, strongly implying them to be pathogenic.

Gene Expression Analysis by Luciferase Reporter Assay
To study the potential effect of c.-87T>C and c.-82_-71delins-103_-86 on IDUA and PTS expression, and to rule out a potential effect of c.-77G>C on the promoter of GALE, a firefly luciferase-coding sequence was cloned under the promoter of the three genes. The amplified regions were selected by attending to different databases (see Materials and Methods). Two proximal promoter regions were identified for GALE and PTS, and one for IDUA (Figure 4a). In all cases, luciferase was expressed, but an 87% reduction was recorded in the context of the mutant IDUA promoter compared to the wild-type promoter, and a 59% reduction in the context of the mutant PTS promoter; no differences

Gene Expression Analysis by Luciferase Reporter Assay
To study the potential effect of c.-87T>C and c.-82_-71delins-103_-86 on IDUA and PTS expression, and to rule out a potential effect of c.-77G>C on the promoter of GALE, a firefly luciferase-coding sequence was cloned under the promoter of the three genes. The amplified regions were selected by attending to different databases (see Materials and Methods). Two proximal promoter regions were identified for GALE and PTS, and one for IDUA (Figure 4a). In all cases, luciferase was expressed, but an 87% reduction was recorded in the context of the mutant IDUA promoter compared to the wild-type promoter, and a 59% reduction in the context of the mutant PTS promoter; no differences were seen for the contexts of the mutant and wild-type GALE promoters (Figure 4b). These results indicate that the variants found in the 5 UTR of IDUA (c.-87T>C) and PTS (c.-82_-71delins-103_-86) in P2 and P4, respectively, are probably responsible for the expression defects seen. In contrast, the variant found in P1 in GALE (c.-77G>C) does not appear to affect the gene promoter.
Int. J. Mol. Sci. 2022, 23, x FOR PEER REVIEW 9 were seen for the contexts of the mutant and wild-type GALE promoters (Figure 4b). results indicate that the variants found in the 5′ UTR of IDUA (c.-87T>C) and PTS (c 71delins-103_-86) in P2 and P4, respectively, are probably responsible for the expre defects seen. In contrast, the variant found in P1 in GALE (c.-77G>C) does not app affect the gene promoter.

Discussion
NGS has had a significant impact on the diagnosis of rare diseases, especially ogeneous genetic diseases and those with unclear phenotypes [23]. Genetic confirm even in IEM, is essential for appropriate genetic counseling and in some cases for th plication of gene-or mutation-specific therapies. Notwithstanding, the diagnosis ra mains much lower (25-50%) than initially expected [18,[24][25][26][27], leaving many pa without genetic confirmation of their condition. This is partly the result of the incap of exome sequencing to identify non-coding variants [2]. The combination of other technologies and functional genomics might, however, help reduce the diagnostic g IEM, as previously described in other works [28].
In IEM, genetic diagnostic technologies can be used with more confidence sin omarkers are available for many disorders, helping to focus genetic analyses. The s tients in the present work had previously been diagnosed clinically and/or bioc cally-but not genetically-as having galactosemia, MPS I, MSUD, HPA, citrullinem OTC deficiency. By combining the DNA and RNA-Seq analyses of specific genes, came possible to identify variants outside of the exomic regions. Indeed, RNA-S lowed the identification of an allelic expression imbalance of GALE, IDUA and DBT P2 and P3, respectively. In addition, in P2, transcriptomic analysis confirmed the eff the novel variant identified in exome sequencing (c.1524+2T>A): an insertion of fou cleotides between exons 10 and 11 in IDUA due to the disruption of the cryptic splicing site of exon 10.
The presence of aberrant transcripts in PTS in P4 helped to direct subsequent yses. Although no expression defect was detected by RNA-Seq, the presence o pseudo-exon insertion in the same number of reads as the normal allele was suggest one. Indeed, a novel 5′ UTR variant was identified by targeted DNA studies, and its on gene expression was confirmed by specific functional analysis.
Unfortunately, RNA-Seq detection suffers from the degradation of out-of-frame scripts by NMD. This was apparent in P3 and P5, in which the novel isoforms were a

Discussion
NGS has had a significant impact on the diagnosis of rare diseases, especially heterogeneous genetic diseases and those with unclear phenotypes [23]. Genetic confirmation, even in IEM, is essential for appropriate genetic counseling and in some cases for the application of gene-or mutation-specific therapies. Notwithstanding, the diagnosis rate remains much lower (25-50%) than initially expected [18,[24][25][26][27], leaving many patients without genetic confirmation of their condition. This is partly the result of the incapacity of exome sequencing to identify non-coding variants [2]. The combination of other omics technologies and functional genomics might, however, help reduce the diagnostic gap in IEM, as previously described in other works [28].
In IEM, genetic diagnostic technologies can be used with more confidence since biomarkers are available for many disorders, helping to focus genetic analyses. The six patients in the present work had previously been diagnosed clinically and/or biochemicallybut not genetically-as having galactosemia, MPS I, MSUD, HPA, citrullinemia or OTC deficiency. By combining the DNA and RNA-Seq analyses of specific genes, it became possible to identify variants outside of the exomic regions. Indeed, RNA-Seq allowed the identification of an allelic expression imbalance of GALE, IDUA and DBT in P1, P2 and P3, respectively. In addition, in P2, transcriptomic analysis confirmed the effect of the novel variant identified in exome sequencing (c.1524+2T>A): an insertion of four nucleotides between exons 10 and 11 in IDUA due to the disruption of the cryptic donor splicing site of exon 10.
The presence of aberrant transcripts in PTS in P4 helped to direct subsequent analyses. Although no expression defect was detected by RNA-Seq, the presence of the pseudo-exon insertion in the same number of reads as the normal allele was suggestive of one. Indeed, a novel 5 UTR variant was identified by targeted DNA studies, and its effect on gene expression was confirmed by specific functional analysis.
Unfortunately, RNA-Seq detection suffers from the degradation of out-of-frame transcripts by NMD. This was apparent in P3 and P5, in which the novel isoforms were almost undetectable by RNA-Seq. However, gene expression analysis revealed a transcription defect in both cases. For these patients, WGS had to be used to identify the deep intronic changes responsible for their diseases. The use of NMD inhibitors might improve the detection of pseudo-exon inclusions. Taken together, these results suggest that, while a gene's expression may not seem altered, nor may any aberrant transcripts be detectable by RNA-Seq, the data collected should be re-analyzed since some alterations can escape such restrictive assessments.
The genomic sequencing of the 5 non-coding regions allowed the identification and prioritization of c.-77G>C in GALE, c.-87T>C in IDUA and c.-82_-71delins-103_-86 in PTS, as candidate variants behind the expression defects in P1, P2 and P4, respectively. Note that variant c.-77G>C was not detected in exome sequencing despite its localization in the first exon of GALE. Since this exon is non-coding, it is not covered by the targets designed for CES. The present results underscore the importance of sequencing all exons (coding or non-coding) in targeted exome sequencing.
When a novel variant is identified, functional studies are necessary to determine if it has a pathogenic effect. The tests needed must be designed ad hoc depending on the proposed effect of the variant. The present work identified variants affecting either expression or splicing. To study the former, a firefly luciferase reporter assay was used, and for the latter, minigene functional studies. These analyses confirmed the pathogenic effect of two variants affecting the expression of IDUA and PTS, and of five variants affecting the splicing process of GALE, PTS, ASS1 and OTC. The c.-77G>C variant found in GALE, predicted to affect splicing, was also found in the promoter region of the gene. The gene expression was not altered by this variant, but its presence in the non-coding exon 1 of GALE affected its splicing, leading to the retention of a part of intron 1. Since this insertion occurs before the AUG translation initiation codon, it should not change the reading frame. It may be that a miss-splicing of intron 1 causes a failure in Transcription-Export (TREX) binding. TREX is the complex responsible for mRNA export from the nucleus to the cytoplasm [29]. Thus, intron retention by GALE would keep the aberrant transcript in the nucleus, precluding its translation and reducing the amount of coded protein [30].
In conclusion, the present work identifies and confirms the pathogenic role of eight new variants in different genes. It also shows the importance of using techniques complementary to WES and demonstrates the diagnostic strength of multi-omics technologies used in combination with functional studies. Finally, in agreement with other studies that endorse the use of WGS over WES, the results highlight the importance of analyzing non-coding regions.

DNA Studies
High-purity DNA was extracted from peripheral blood or from patient-derived fibrob-

Minigene Studies
To examine the splicing pattern in vitro, the pSPL3 vector was used (Exon Trapping System, Gibco, BRL, Carlsbad, CA, USA). Gene fragments corresponding to the intronic sequence containing the pseudo-exon located in intron 1 of PTS, intron 8 of ASS1 and intron 5 of OTC in P4, P5 and P6 were amplified from patient DNA (Figure 3b-d) and cloned into the pGEMT-Easy vector (Promega, Madison, WA, USA); the alleles were isolated. The insertion was excised with EcoRI (Roche Applied Science, Indianapolis, IN, USA), purified using the QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) and subsequently cloned into the pSPL3 vector dephosphorylated with Thermosensitive Alkaline Phosphatase (Promega, Madison, WA, USA). Ligation was performed using the Rapid DNA Ligation Kit (Thermo Fisher Scientific, Waltham, MS, USA). Clones containing the desired normal and mutant insertions were identified by restriction enzyme analysis and automated DNA sequencing. A total of 2 µg of the wild-type or mutant minigene was then transfected into the human hepatic cell line Hep3B (supplied by Dr. S.R. de Córdoba) using JetPEI transfection reagent (Polyplus-Transfection, Illkirch, France) following the manufacturer's protocol. Cells were harvested 48 h post-transfection.
To assess the functional effect of the c.-77G>C change identified in the last nucleotide of the exon 1 of GALE, a hybrid minigene was constructed for transfection experiments. Since the variant is located in the first exon of GALE, and, therefore, no 3 splice-site is present, the minigene was constructed to contain the 3 splice-site of another gene, namely exon 2 of SPR [38]. The entire sequence of GALE exon 1 was amplified using specific primers, adding the restriction targets for PfoI and NheI. The obtained fragment was cloned into the pGEMT-Easy vector (Promega, Madison, WA, USA), excised with PfoI (New England Biolabs, Beverly, MA, USA) and NheI (Roche Applied Science, Indianapolis, IN, USA), purified using the QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) and then cloned into the pSPL3-SPR vector (previously generated at our laboratory) using the Rapid DNA Ligation Kit (Thermo Fisher Scientific, Waltham, MS, USA) (Figure 3a). The clone selection and transfection were performed as described for the PTS, ASS1 and OTC minigenes.
The selected regions were amplified using specific primers, adding the restriction targets for KpnI and BglII. Fragments were then cloned into the pGEMT-Easy vector (Promega, Madison, WA, USA) to separate the alleles. The insertion was excised with KpnI (New England Biolabs, Beverly, MA, USA) and BglII (Roche Applied Science, Indianapolis, IN, USA), purified using the QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) and then cloned into the pGL3-Basic vector (Promega, Madison, WA, USA) using the Rapid DNA Ligation Kit (Thermo Fisher Scientific, Waltham, MS, USA). Clones containing the desired normal and mutant insertions were identified by restriction enzyme analysis and automated DNA sequencing.
The human hepatic cell line Hep3B (supplied by Dr. S.R. de Córdoba) was then cotransfected with 1.8 µg of the wild-type or mutant construct in addition to 0.2 µg pRL vector (supplied by Dr. I. Ventoso) using JetPEI transfection reagent (Polyplus-Transfection, Illkirch, France) following the manufacturer's indications. Cells were harvested 48 h post-transfection.
Firefly and Renilla reniformis luciferase activities were assessed using the Dual-Luciferase Reporter Assay System (Promega, Madison, WA, USA) following the manufacturer's indica-tions, and detected using a FLUOstar OPTIMA microplate reader (BMG Labtech, Durham, NC, USA).

X-Chromosome Inactivation Studies (HUMARA Assay)
A fragment of the AR gene containing a highly polymorphic short tandem repeat was amplified using specific fluorescent primers and FastStart Taq DNA polymerase (Roche Applied Science, Indianapolis, IN, USA). The amplicon size of the patient, paternal and maternal samples was analyzed by capillary electrophoresis using a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA). The results were processed using Peak Scanner Software 2 v.2.0 (Applied Biosystems, Foster City, CA, USA). Patient blood DNA was digested using either HpaII (Thermo Fisher Scientific, Waltham, MS, USA), MspI (Thermo Fisher Scientific) or a mock enzyme, PCR amplification was repeated and the amplicon size was determined in the same manner as above. The area under the peaks of HpaII and the mock-treated samples were compared to obtain the methylation percentage of each allele.

Statistical Analysis
Statistical analyses were performed using GraphPad Prism 6 software (GraphPad Software, La Jolla, CA, USA) for Windows. Student's t-test was used for comparison between groups. The significance was set at p < 0.05.