Exome Survey and Candidate Gene Re-Sequencing Identifies Novel Exstrophy Candidate Genes and Implicates LZTR1 in Disease Formation

Background: The bladder exstrophy-epispadias complex (BEEC) is a spectrum of congenital abnormalities that involves the abdominal wall, the bony pelvis, the urinary tract, the external genitalia, and, in severe cases, the gastrointestinal tract as well. Methods: Herein, we performed an exome analysis of case-parent trios with cloacal exstrophy (CE), the most severe form of the BEEC. Furthermore, we surveyed the exome of a sib-pair presenting with classic bladder exstrophy (CBE) and epispadias (E) only. Moreover, we performed large-scale re-sequencing of CBE individuals for novel candidate genes that were derived from the current exome analysis, as well as for previously reported candidate genes within the CBE phenocritical region, 22q11.2. Results: The exome survey in the CE case-parent trios identified two candidate genes harboring de novo variants (NR1H2, GKAP1), four candidate genes with autosomal-recessive biallelic variants (AKR1B10, CLSTN3, NDST4, PLEKHB1) and one candidate gene with suggestive uniparental disomy (SVEP1). However, re-sequencing did not identify any additional variant carriers in these candidate genes. Analysis of the affected sib-pair revealed no candidate gene. Re-sequencing of the genes within the 22q11.2 CBE phenocritical region identified two highly conserved frameshift variants that led to early termination in two independent CBE males, in LZTR1 (c.978_985del, p.Ser327fster6) and in SLC7A4 (c.1087delC, p.Arg363fster68). Conclusions: According to previous studies, our study further implicates LZTR1 in CBE formation. Exome analysis-derived candidate genes from CE individuals may not represent a frequent indicator for other BEEC phenotypes and warrant molecular analysis before their involvement in disease formation can be assumed.


Introduction
The bladder exstrophy-epispadias complex (BEEC; OMIM %600057) characterizes a spectrum of human congenital anomalies comprising malformations of the urinary tract and the genitalia, pelvis, abdominal wall, and, occasionally, the spine and gastrointestinal tract [1,2]. The BEEC encompasses a vast severity spectrum incorporating different phenotypes, including epispadias (E) as the mildest phenotype, classic bladder exstrophy (CBE) as the intermediate and most common form, and cloacal exstrophy (CE) as the most severe phenotype [3]. CE is also referred to as OEIS (omphalocele, exstrophy, imperforate anus, and spinal defects) complex [4]. Epidemiological studies state the incidence rates at about 2.4:100,000 births for E, 1-2:50,000 births for CBE, and 0.5-1:200,000 births for CE [1], with an overall birth prevalence of 1:10,000 in children of European descent [3]. A male-to-female ratio ranging from 1.5:1 to 6:1 [5] is reported, with CBE being more frequent in males and CE being more common in females [6]. Additional anomalies of the urinary tract, such as an ectopic kidney, a horseshoe kidney, renal hypo-or agenesis, and ureteropelvic junction obstruction are present in one-third of all cases, mainly in the form of the CE phenotype [2,5]. Both sexes are affected by impaired sexual function and fertility issues [2,[7][8][9]. Fertility in men is decreased due to low ejaculation volumes and poor sperm quality [8,9].
Multiple findings suggest that genetic factors play an important role in BEEC etiology: (i) increased recurrence risk for the siblings of CBE individuals [5,10,11], (ii) increased recurrence risk for the offspring of affected individuals [11], (iii) a higher concordance rates among monozygotic compared to dizygotic twins [12], and (iv) the report of several multiplex families in the literature [13]. Previously, candidate genes for monogenic forms were identified through array-based molecular karyotyping [14] and exome analysis [3]. Using exome analysis, we recently identified SLC20A1 as a monoallelic candidate gene for CE [15]. To identify further candidate genes, we performed exome analysis in 14 CE case-parent trios and in one affected sib-pair (CBE and epispadias only). To prioritize the identified candidate genes, we re-sequenced 480 BEEC individuals for the prioritized candidate genes. Furthermore, we re-sequenced previously reported candidate genes within the CBE phenocritical region, 22q11.2 [14]. Based on our findings, we suggest that the gene LZTR1 is involved in CBE formation.

Individuals
Exome analysis was performed in 14 CE case-parent trios and 1 affected sib-pair presenting with CBE and epispadias only. The complete re-sequencing cohort comprised 480 BEEC individuals (310 males and 169 females; in 1 case, the gender was unknown). Ethical consent was obtained by the Ethics Committee of the Medical Faculty of the University of Bonn (Lfd.Nr.031/19). Written informed consent was provided by all participating families prior to the study.

DNA Preparation and Exome Sequencing
The DNA of individuals and their parents was extracted from saliva samples, using the Oragene DNA Kit (DNA Genotek Inc., Ottawa, ON, Canada), or from blood samples using the Chemagic Magnetic Separation Module I (Cheagen, Baesweiler, Germany). In the first step, exome sequencing was performed for 14 CE case-parent trios and 1 affected CBE and E sib-pair at the Next-Generation Sequencing Laboratory of the Cologne Center for Genomics (CCG), using the Agilent SureSelect Human All Exon V6 for 12 families, the NimbleGen SeqCap EZ Human Exome Library v 2.0 for 2 families, and the Agilent SureSelect All Exon V7 for 1 family. After validation (2200 TapeStation; Agilent Technologies, Santa Clara, CA, USA) and quantification (Qubit System; Invitrogen, Waltham, MA, USA), pools of libraries were generated and subsequently sequenced on the Illumina HiSeq 4000 sequencing instrument, using a paired-end 2 × 100 bp protocol. The mean coverage of the presented exome data was 70,933 reads. In total, 92.31% of the targeted bases were covered by at least 20×. Since we filtered the reads with a minimum coverage of 10×, all the prioritized variants were validated using Sanger sequencing. Data analysis and filtering of the mapped target sequences were accomplished using the "Varbank" exome and genome analysis pipeline, version 2.0 (https://varbank.ccg.uni-koeln.de, accessed on 1 May 2000).

Molecular Inversion Probe (MIP) Assay and Sanger Validation
In the second step, prioritized candidate genes were re-sequenced using a "Molecular Inversion Probe (MIP) Assay" for the entire BEEC cohort. We included the 7 candidate genes prioritized in our exome analysis, along with 7 candidate genes that were previously described in the 22q11.2 CBE phenocritical region [14,30,31]. To cover all 192 protein-coding-transcripts of these 14 candidate genes, 335 MIPs were designed with amplicon lengths of between 165 and 189 bp (see Table S1 in the Supplementary Materials), using an in-house version of the MIPgen tool [32]. Three balancing runs and one re-balancing run of the MIPs were performed using the MiSeq ® with Reagent Kit v2 (Illumina, San Diego, CA, USA). The final pooled MIP libraries were then sequenced with the NovaSeq 6000 ® SP XP-Workflow Reagent Kit v1.5 (300 cycles), using 2 × 125 bp. A Q30-Score of 82.93% was reached. The sequencing identified 1252 variants, of which only those variants that were covered by more than 10 reads were considered. To prioritize these variants, we applied the filter algorithm described above. Final validation of the remaining variants found in individuals and their parents was performed via Sanger sequencing with an ABI 3730 XL DNA analyzer (Life Technologies/Thermofisher, Schwerte, Germany) by Azenta Life Science.

Exome Analysis
Filtering of the CE case-parent exome data identified seven candidate genes. These genes comprised two candidate genes with autosomal-dominant monoallelic de novo variants (NR1H2, GKAP1), three candidate genes with autosomal-recessive biallelic compound heterozygous variants (CLSTN3, AKR1B10, NDST4), one candidate gene with an autosomalrecessive biallelic homozygous variant (PLEKHB1), and one candidate gene with suggestive uniparental disomy (SVEP1). All variants were validated by Sanger sequencing. Exome analysis of one affected sib-pair did not identify any plausible variants/candidate genes.
Individual 420_501 was found to carry a novel de novo heterozygous missense variant, c.737A>C (p.Gln246Pro), in exon 8 (Ensembl GRChr37/hg19 transcript ENST00000376371.7) of GKAP1. However, SpliceAI predicts a high chance of donor loss for this variant so it might not be a missense but, instead, a splicing variant.
The variant was predicted to be deleterious in SIFT and benign in PolyPhen-2, with a CADD13-PHRED score of 26.1. The aa is evolutionarily conserved down to Mm and Gg (Gln), while Xt and Dr show unalienable bases in the gap region. Our in-house murine transcriptome database of relevant uro-rectal tissue shows expression over the whole time period, with an upregulation in E15.5: log2FoldChange (15.5) of 0.99659903 and a p-value (15.5) of 3.1474 × 10 −8 , with a baseMean (15.5) of 949.642679. GKAP1 has not been associated with any human disease phenotype so far (Table 1). Individual 420_501 was found to also carry a novel de novo heterozygous missense variant c.718C>T (p.Arg240Cys) in exon 6 (Ensembl GRChr37/hg19 transcript ENST00000253727.10) of NR1H2. The variant was predicted to be deleterious in SIFT and probably damaging in PolyPhen-2, with a CADD13-PHRED score of 28.5. The aa is highly conserved down to Mm, Gg, Xt, and Dr. Our in-house murine transcriptome database did not show any expression of this variant during the embryonic stages of E10.5, E.12.5, and E15.5. NR1H2 has not been associated with any human disease phenotype so far (Table 1) We identified one autosomal-recessive candidate gene, PLEKHB1, for which individual 420_501 carried a homozygous missense variant, c.76G>A (p.Gly26Ser) in exon 2 of 8 (Ensembl GRChr37/hg19 transcript ENST00000354190.10) ( Table 2). It was predicted to be deleterious by SIFT and probably damaging by PolyPhen-2, with a CADD13_PHRED score of 28.4. The aa is conserved to Mm. Our in-house murine transcriptome database shows the continuous expression of this variant during the embryonic stages E10.5, E.12.5, and E15.5. The protein has only one annotated domain (aa21-128) and the variant lies at the beginning of this domain. PLEKHB1 has not been associated with any disease phenotype so far.

2.2)
Probably damaging (1) CADD13_PHRED v1. 6 28.4  AKR1B10 represents a human NADPH-dependent reductase belonging to the aldo-keto reductase (AKR) 1B subfamily. The enzyme is highly expressed in the epithelial cells of the stomach and intestine [33]. Interestingly, the disruption of NAD synthesis has previously been associated with congenital malformations in humans and mice [34]; however, AKR1B10 has not been associated with any human disease phenotype so far.   Table 4). The variant was predicted to be tolerated by SIFT and probably damaging by PolyPhen-2, with a CADD13_PHRED score of 23.8. The aa is evolutionarily conserved over Mm, Gg, Xt, and Dr (Thr). Our in-house murine transcriptome database shows the continuous expression of Svep1 during the embryonic stages E10.5, E.12.5, and E15.5. SVEP1 has not been associated with any human disease phenotype so far. A literature search revealed that the Svep1 homozygous mutant embryos display multiple defects, such as edema, but also show abnormal development of the kidney and pelvis at E15.5 and E18.5 [35].

Primary phenotype CE
All information about the transcript refers to the canonical transcript. 1 Exon: the number before the slash indicates the affected exon. The number after the slash indicates the number of exons in this gene.
Overall, none of the variants found in our exome analysis can be classified as pathogenic, according to the ACMG classification criteria [36]. Hence, all variants that were prioritized and validated with Sanger sequencing should currently be interpreted as a variant of uncertain (or unknown) significance (VUS).

MIP Assay
To investigate the overall contribution of the above-mentioned candidate genes to BEEC, we re-sequenced those genes identified through exome analysis (NR1H2, GKAP1, CLSTN3, AKR1B10, NDST4, PLEKHB1, and SVEP1) and genes from the CBE phenocritical region, 22q11.2 (LZTR1, SLC7A4, AIFM3, SNAP29, THAP7, P2RX6, CRKL) in a cohort of 480 BEEC individuals. As outlined earlier, none of the individuals included in the MIP assay were included in our prior exome analysis.
Unfortunately, we did not identify any additional putative disease variants in the above-mentioned exome analysis-derived candidate genes. However, we identified two putative disease-causing variants in LZTR1 and SLC7A4 (Table 5). Sanger sequencing validated both variants. Due to a lack of paternal DNA and additional information such as family history, it remains unclear whether the variants did or did not occur de novo. Neither mother carried the respective variant. Both frameshift variants led to an early termination (Figure 1). Individual 136_501 presented with CBE and carried a heterozygous frameshift variant, c.978_985del (p.Ser327ter6), in exon 9 (Ensembl GRChr37/hg19 transcript ENST00000215739.8) of LZTR1. The protein consists of six Kelch repeats and two BTB domains. The variant leads to a frameshift that affects the Kelch 5 (295aa-341aa) repeat and leads to an early termination. Our in-house murine transcriptome database shows continuous expression throughout the embryonic stages E10.5, E.12.5, and E15.5.
Dominant variants in LZTR1 have been associated with Noonan syndrome 10 (OMIM #616564) and Noonan syndrome 2 (OMIM #605275) and susceptibility to Schwannomato-sis (OMIM #615670). However, more interestingly, the deletion of LZTR1 has been associated with the formation of congenital anomalies of the kidneys and urinary tract (CAKUT) [14,30,31,37,38]. LZTR1 (c.978_985del, p.Ser327fster6) fulfills the ACMG criteria of pathogenicity (PVS1, PM2) [36]. However, this frameshift has previously been associated with Schwannomatosis and cardiovascular phenotypes (VCV001709123.4) and not with CBE, leaving some uncertainty about the involvement of this variant in CBE formation. has not been associated with any human disease phenotype so far. SLC7A4 (c.1087delC, p.Arg363fster68) fulfills the ACMG criteria of a VUS in the current context (PM2) [36]. In particular, the lack of functional data and the lack of proof of a de novo occurrence of this variant suggests this variant to be VUS.

Discussion
Here, we used exome analysis with CE case-parent trios and large-scale re-sequencing for the identification of novel candidate genes and putative disease variants in the identified candidate genes. The exome survey in the CE case-parent trios identified two candidate genes harboring de novo variants (NR1H2 and GKAP1), four candidate genes with autosomal-recessive biallelic variants (AKR1B10, CLSTN3, NDST4, and PLEKHB1), and one candidate gene with suggestive uniparental disomy (SVEP1). However, re-sequencing did not identify any additional variant carriers in these candidate genes.
Hitherto, 22q11.2 microduplication has been the only genetic risk factor that has been found to be significantly enriched among CBE individuals [1,14,30,31]. This finding prompted us to re-sequence CRKL, LZTR1, THAP7, SLC7A4, AIFM3, SNAP29, and P2RX6, which reside in the 22q11.2 phenocritical region. We thereby discovered two possible disease-causing frameshift variants that lead to early termination in LZTR1 and SLC7A4.
In a male CBE individual, we identified a frameshift variant in LZTR1 (c.978_985del, p.Ser327fster6). LZTR1 is thought to be involved in a variety of inherited and acquired human disorders [39] and has been highlighted as a candidate gene for urogenital malformations [30,31,40,41]. LZTR1 belongs to the BTB-Kelch superfamily, which play important roles during fundamental cellular processes, such as the regulation of gene expression, cell morphology, and migration [31], which are highly conserved during evolution [30]. Ubiquitous expression in mice (E9.5) has been shown in a previous study [30] and our in-house murine transcriptome database underlines these findings. This further reinforces LZTR1 as a potential candidate gene for the BEEC phenotype.
Most recently, Lundin et al. found a novel variant (p.Ser698Phe) in LZTR1 in one BEEC individual. Functional evaluation of the LZTR1 p.Ser698Phe variant in live NIH 3T3 cells showed that the concentration and cytoplasmic mobility differ between Lztr1-wt and Lztr1-mut, indicating the potential functional effect of LZTR1-Mut [31].
This information supports our finding and suggests LZTR1 to be a strong candidate gene for CBE formation, warranting the functional characterization of all those LZTR1 variants that have been described in association with CBE.
In a second male individual with CBE, we identified a frameshift variant in SLC7A4 (c.1087delC, p.Arg363fster68). SLC7A4 belongs to a family of cationic amino acid transporters. All exons of SLC7A1, SLC7A2, and SLC7A4 are of similar or equal length; analysis of exon 3 of SLC7A4 shows corresponding exons in SLC7A1 and SLC7A2, which suggests that it may encode an important functional or regulatory domain [42]. Contrary to earlier reported findings of Slc7a4 expression at E9.5 (the transcript was present but there was no expression) [30], the expression of Slc7a4 in our in-house transcriptome database of uro-rectal tissue was continuous and upregulated during E15.5. While gains or losses in the chromosomal region 22q11.2, encompassing SLC7A4 (OMIM *603752), have been associated with CAKUT or BEEC, no single base variant in SLC7A4 has been associated with the human CAKUT or BEEC disease phenotypes so far.

Conclusions
An exome survey of case-parent trios with CE has identified novel candidate genes. These novel candidate genes require further investigation via larger re-sequencing analysis or functional in vitro or in vivo studies to support their involvement in the development of this disease before they can be annotated as BEEC-associated candidate genes. The re-sequencing of all genes residing in the CBE phenocritical region 22q11.2 has provided further support for LZTR1 being implicated in CBE formation and suggests SLC7A4 as a potential novel CBE candidate gene.

Data Availability Statement:
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions. Ranking score rather than prediction or default cut-off; higher scores more likely to be deleterious. Scaled score: 10 = top 10% of all reference genome SNVs 20 = top 1% SNVs 30 = 0.1% of most deleterious possible substitutions in the human genome